A Generative Transformer-Based Approach to Automated Essay Scoring: Evaluating GPT-2’s Performance with Pre- and Post- Data Augmentation

  • Author / Creator
    Gunduz, Aysegul
  • Recent advancements in artificial intelligence and language modeling have revolutionized educational technology, with particular attention to automated essay scoring (AES) systems. The potential of GPT-based model architectures, including successive versions of the ChatGPT tool, has become an important research topic. My research investigates the performance of the GPT-2 small model in AES and examines whether back translation between English and Turkish, used as a data augmentation technique, improves its performance on the Hewlett-sponsored ASAP dataset (https://www.kaggle.com/c/asap-aes). The evaluation uses Cohen's kappa and Quadratic Weighted Kappa (QWK) to measure agreement, with accuracy, precision, sensitivity, and the F1 score providing further insight into classification performance. Findings indicate QWK values between 0.60 and 0.80 across most ASAP essay sets, with Essay Set 5 reaching a peak QWK of 0.77. Back translation produced a significant improvement in the model's performance, especially on Essay Set 8, where the QWK score increased by 33%. The study highlights the limited capacity of the GPT-2 small model and emphasizes the importance of future research with more advanced GPT versions. It also underscores that balanced class distributions are important for achieving high QWK scores, and recommends balanced essay sets in future research to enhance AES performance.
    Keywords: artificial intelligence, language modeling, automated essay scoring (AES), GPT-based models, ChatGPT, GPT-2, data augmentation, back translation, ASAP
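For readers unfamiliar with the agreement metric named in the abstract, the following is a minimal sketch of how Quadratic Weighted Kappa is commonly computed between two sets of integer scores (for example, human ratings and model predictions). It is not taken from the thesis: the function name `quadratic_weighted_kappa` and its parameters are illustrative assumptions, and the thesis may have used a library implementation instead.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Illustrative QWK between two equal-length lists of integer scores.

    Hypothetical helper, not the thesis's code. Scores are assumed to lie
    in the inclusive range [min_score, max_score].
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n_categories = max_score - min_score + 1
    if n_categories < 2:
        raise ValueError("at least two score categories are required")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Quadratic penalty: grows with the squared score distance.
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            weighted_observed += weight * observed[(i, j)]
            # Expected count if the two raters were independent.
            weighted_expected += weight * hist_a[i] * hist_b[j] / n

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use one identical score")
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    human = [1, 2, 3]
    model = [1, 2, 2]
    print(quadratic_weighted_kappa(human, model, min_score=1, max_score=3))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. Because the expected-disagreement term depends on the marginal score distributions, heavily imbalanced score classes can depress the statistic, which is consistent with the abstract's recommendation to use balanced essay sets.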

  • Subjects / Keywords
  • Graduation date
    Fall 2024
  • Type of Item
    Thesis
  • Degree
    Master of Education
  • DOI
    https://doi.org/10.7939/r3-xj4c-h371
  • License
    This thesis is made available by the University of Alberta Library with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.