Development and Validation of an Automated Essay Scoring Framework by Integrating Deep Features of English Language

  • Author / Creator
    Latifi, Syed Muhammad Fahad
  • Automated scoring methods have become an important topic in the assessment of 21st-century skills. Recent developments in computational linguistics and natural language processing have given rise to more rationale-based methods for extracting and modeling language features. The language features from Coh-Metrix are grounded in theoretical and empirical foundations from psycholinguistics, discourse processing, corpus linguistics, and computing science. The primary purpose of this research was to study the effectiveness of Coh-Metrix features for the development and validation of a three-stage automated essay scoring (AES) framework, using essay samples collected in a standardized testing situation. A second purpose of this study was to evaluate: 1) the scoring concordance and discrepancy between the AES framework and the gold standard, 2) feature informedness as a function of dimensionality reduction, 3) two distinct machine learning methods, and 4) scoring performance relative to human raters and the current state of the art in AES. This study was conducted using methods and processes from data science; the foundational methodology, however, comes from the fields of machine learning and natural language processing. Moreover, the human raters were considered the "gold standard," and hence the validation process relied primarily on comparing the scores produced by the AES framework with the scores produced by the human raters. The findings from this study clearly suggest the value and effectiveness of Coh-Metrix features for the development of an automated scoring framework. The measures of concordance confirm that the features used to develop the scoring models reliably captured the construct of writing quality, and no systematic pattern of discrepancy was found in the machine scoring.
However, the studied features had varying degrees of informedness across essay types, and the ensemble-based machine learning method consistently performed better. On aggregate, the AES framework was found superior to the studied state of the art in machine scoring. Finally, the limitations of this study are described and directions for future research are discussed.
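    The abstract does not name the specific concordance statistics used, but a standard measure of machine-human agreement in AES research is quadratic weighted kappa (QWK). The sketch below illustrates how such a concordance measure is computed; the score range and rating data are hypothetical, not taken from the thesis.

    ```python
    # Quadratic weighted kappa (QWK): a common concordance measure between
    # machine and human essay ratings. Illustrative sketch only; the thesis's
    # exact metrics are not specified in the abstract.

    def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
        n_levels = max_score - min_score + 1
        n = len(rater_a)

        # Observed agreement matrix: counts of (rater_a, rater_b) score pairs
        observed = [[0.0] * n_levels for _ in range(n_levels)]
        for a, b in zip(rater_a, rater_b):
            observed[a - min_score][b - min_score] += 1

        # Marginal score distributions for each rater
        hist_a = [sum(row) for row in observed]
        hist_b = [sum(observed[i][j] for i in range(n_levels))
                  for j in range(n_levels)]

        # Quadratic disagreement weights, normalized to [0, 1]
        numerator = 0.0
        denominator = 0.0
        for i in range(n_levels):
            for j in range(n_levels):
                weight = ((i - j) ** 2) / ((n_levels - 1) ** 2)
                expected = hist_a[i] * hist_b[j] / n  # chance agreement
                numerator += weight * observed[i][j]
                denominator += weight * expected

        return 1.0 - numerator / denominator

    # Hypothetical human and machine scores on a 1-6 rubric
    human = [4, 3, 5, 2, 4, 6, 3, 4]
    machine = [4, 3, 4, 2, 5, 6, 3, 4]
    print(round(quadratic_weighted_kappa(human, machine, 1, 6), 3))  # → 0.908
    ```

    QWK reaches 1.0 for perfect agreement and penalizes large score discrepancies more heavily than near-misses, which makes it well suited to ordinal essay rubrics.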

  • Subjects / Keywords
  • Graduation date
    Fall 2016
  • Type of Item
  • Degree
    Doctor of Philosophy
  • DOI
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.
  • Language
  • Institution
    University of Alberta
  • Degree level
  • Department
  • Specialization
    • Measurement, Evaluation and Cognition
  • Supervisor / co-supervisor and their department(s)
  • Examining committee members and their departments
    • Gierl, Mark (Educational Psychology)
    • Lai, Hollis (Faculty of Medicine)
    • Cormier, Damien (Educational Psychology)
    • Bulut, Okan (Educational Psychology)