Item Difficulty and Response Time Prediction with Large Language Models: An Empirical Analysis of USMLE Items

Bulut, O.; Gorgun, G.; Tan, B.

doi:10.7939/r3-0xjn-2446

Communities and Collections

ERA General Collection / Research Materials (ERA General)

Abstract
This paper summarizes our methodology and results for the BEA 2024 Shared Task. This competition focused on predicting item difficulty and response time for retired multiple-choice items from the United States Medical Licensing Examination® (USMLE®). We extracted linguistic features from the item stem and response options using multiple methods, including the BiomedBERT model, FastText embeddings, and Coh-Metrix. The extracted features were combined with additional features available in item metadata (e.g., item type) to predict item difficulty and average response time. The results showed that the BiomedBERT model was the most effective in predicting item difficulty, while the fine-tuned model based on FastText word embeddings was the best model for predicting response time.
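The feature-combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: random vectors stand in for the BiomedBERT, FastText, or Coh-Metrix features, the item-type values ("text", "image") are hypothetical, ridge regression is swapped in as a simple stand-in for the predictive model, and RMSE is assumed as the evaluation metric.

```python
import numpy as np


def one_hot(values):
    """One-hot encode a categorical metadata field (e.g., item type)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, value in enumerate(values):
        encoded[row, index[value]] = 1.0
    return encoded


def combine_features(text_features, item_types):
    """Concatenate text-derived features with encoded item metadata."""
    return np.hstack([text_features, one_hot(item_types)])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(X, w, b):
    return X @ w + b


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in data (assumption): 40 items, 8-dimensional text features.
rng = np.random.default_rng(0)
n_items, dim = 40, 8
text_features = rng.normal(size=(n_items, dim))
item_types = ["text" if i % 2 == 0 else "image" for i in range(n_items)]

X = combine_features(text_features, item_types)
true_w = rng.normal(size=X.shape[1])
difficulty = X @ true_w + rng.normal(scale=0.1, size=n_items)

# Fit on the first 30 items, evaluate on the remaining 10.
w, b = fit_ridge(X[:30], difficulty[:30], alpha=1.0)
print("held-out RMSE:", rmse(difficulty[30:], predict(X[30:], w, b)))
```

The same pattern would apply to average response time by replacing the target vector; only the feature source and the regressor would change in a real setup.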
Date created

2024-06-20
Subjects / Keywords
- NLP
- LLM
- education
- USMLE
- item difficulty
- response time
Type of Item

Conference/Workshop Presentation
DOI

https://doi.org/10.7939/r3-0xjn-2446
License

Attribution-NonCommercial 4.0 International

Language
- English
Link to related item

https://aclanthology.org/2024.bea-1.44/