Exploring Methods for Generating and Evaluating Skill Targeted Reading Comprehension Questions
-
- Author / Creator
- von der Ohe, Spencer McIntosh
-
It takes skilled teachers a significant amount of time and effort to create
high-quality reading comprehension questions, often making it impractical to
target a particular reader’s weaknesses. Recently, language models have been
proposed as a tool to help teachers fill this gap, allowing them to
generate questions targeting specific skill types.
In this thesis, we propose SoftSkillQG, a new soft-prompt-based language
model for generating skill-targeted reading comprehension questions that does
not require any manual effort to target new skills. We compare SoftSkillQG
against a variety of strong baselines and show that it outperforms existing
techniques on four out of five question quality metrics for the SBRCS dataset
and on human evaluation of Context Specificity for the QuAIL dataset. However,
on the QuAIL dataset, T5 WTA, a previously proposed method using
manually created prompts, outperforms SoftSkillQG on perplexity
and on these same five metrics.
We investigate why SoftSkillQG performs poorly relative to T5 WTA, a
method using manually created “hard” prompts, on the QuAIL dataset by
examining the effects of both dataset size and prompt initialization on SoftSkillQG’s performance.
We show that dataset size may be affecting performance, but augmenting
training with silver data from the SQuAD dataset did not improve
performance. On the other hand, initializing the prompt of SoftSkillQG using
the same prompt as T5 WTA yielded nearly the same perplexity on the QuAIL
dataset.
Finally, we perform a first-of-its-kind analysis using the human annotations
from our previous experiments to compare five different methods for evaluating
sets of generated questions. We find that MS-Jaccard4 best captures the
diversity of a set of questions; Best Reference Evaluation aligns most
closely with human judgement of Answerability; Cartesian Product evaluation
aligns most closely with Context-Specificity; and Fréchet BERT Distance
aligns most closely with Fluency.
- Graduation date
- Spring 2024
-
- Type of Item
- Thesis
-
- Degree
- Master of Science
-
- License
- This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.