Using Automated Procedures to Score Written Essays in Persian: An Application of the Multilingual BERT System

  • Author / Creator
    Firoozi, Tahereh
  • Abstract
    The automated scoring of student essays is now recognized as a significant development in both the research and practice of educational testing. The majority of published studies on automated essay scoring (AES) focus on outcomes in English. Studies on multilingual AES, that is, AES for languages other than English, are by comparison practically non-existent. The purpose of this study is to develop, describe, and evaluate the first AES system for scoring essays in the Persian language using multilingual BERT. Multilingual BERT is a transformer-based encoder model for language representation that uses an attention mechanism to learn the contextual relations between words and sentences in a text. The Persian-language version of BERT was used to grade 2,000 holistically scored essays written by non-native language learners in Iran on a scale ranging from 1 (Elementary) to 5 (Advanced). The performance of the BERT AES model was compared with a baseline model that included only a word embedding layer (Word2Vec). The models were evaluated using four measures: the quadratic weighted kappa (QWK), the kappa coefficient, model accuracy, and error analysis. The BERT AES model performed with high classification consistency (QWK = 0.84 vs. baseline QWK = 0.75; κ = 0.93 vs. baseline κ = 0.82). The accuracy results show that the BERT AES model correctly scored about 73% of the essays. Of the essays that the BERT AES system assigned to a given level, more than 70% at each level except Advanced were given the same score by the human raters (i.e., true positives). Of the essays that the system did not assign to a given level, more than 70% at each level except Advanced were likewise scored outside that level by the human raters (i.e., true negatives). Error analysis showed that each level overlapped somewhat with its adjacent levels, with the Upper-Intermediate and Advanced levels showing the most overlap. These results demonstrate that the BERT AES model can predict, with a high degree of accuracy, the essay scores produced by the raters in this study. The one area where performance was comparatively weak was the Advanced level, owing to its smaller number of essays (n = 238). Data augmentation offers a way to address the text data sparsity problem that arises with low-resource languages such as Persian. To improve model performance, sentence-level data augmentation (SLDA) was implemented by adding 20% more data to each score level. This approach improved the classification performance of the BERT AES model (QWK pre-SLDA = 0.84 vs. QWK post-SLDA = 0.96; κ pre-SLDA = 0.88 vs. κ post-SLDA = 0.96), demonstrating the benefits of text augmentation. The architecture and methods described in this study can be readily adapted to score essays written in other non-English languages, supporting the application and widespread use of multilingual AES.
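The abstract describes fine-tuning a multilingual BERT encoder to classify Persian essays into five holistic score levels. The sketch below shows what such a setup can look like in Python; the checkpoint name (bert-base-multilingual-cased), the hyperparameters, and the toy training step are illustrative assumptions, since the abstract does not specify them.

```python
# Minimal sketch: multilingual BERT as a 5-way essay classifier
# (score levels 1 = Elementary ... 5 = Advanced). Assumed checkpoint and settings.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # assumed; the thesis may use a different checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=5)

def encode(essays, scores):
    """Tokenize Persian essays and map 1-5 holistic scores to 0-4 class labels."""
    batch = tokenizer(essays, truncation=True, padding=True,
                      max_length=512, return_tensors="pt")
    batch["labels"] = torch.tensor([s - 1 for s in scores])
    return batch

# Toy training step on a two-essay batch (placeholder Persian text and scores).
essays = ["متن انشا شماره یک ...", "متن انشا شماره دو ..."]
scores = [2, 5]
batch = encode(essays, scores)

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch)        # cross-entropy loss over the 5 score levels
outputs.loss.backward()
optimizer.step()

# Prediction: argmax over the logits, shifted back to the 1-5 scale.
model.eval()
with torch.no_grad():
    predicted_levels = model(**batch).logits.argmax(dim=-1) + 1
print(predicted_levels.tolist())
```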
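The four evaluation measures named in the abstract (QWK, the kappa coefficient, accuracy, and a level-by-level error analysis) are standard agreement statistics. The snippet below computes them with scikit-learn on made-up scores, purely to illustrate the measures; it does not reproduce the study's data or results.

```python
# Illustration of the evaluation measures: quadratic weighted kappa (QWK),
# Cohen's kappa, accuracy, and a confusion matrix for error analysis.
from sklearn.metrics import cohen_kappa_score, accuracy_score, confusion_matrix

human = [1, 2, 2, 3, 4, 5, 3, 4]   # human rater scores (1 = Elementary ... 5 = Advanced)
model = [1, 2, 3, 3, 4, 4, 3, 4]   # model predictions (illustrative only)

qwk = cohen_kappa_score(human, model, weights="quadratic")
kappa = cohen_kappa_score(human, model)
acc = accuracy_score(human, model)
cm = confusion_matrix(human, model, labels=[1, 2, 3, 4, 5])

print(f"QWK = {qwk:.2f}, kappa = {kappa:.2f}, accuracy = {acc:.2f}")
print(cm)   # off-diagonal mass shows overlap between adjacent score levels
```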
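The abstract states that sentence-level data augmentation added 20% more data to each score level but does not spell out the augmentation operation. The sketch below shows one common realization, recombining sentences drawn from essays of the same score level into synthetic essays; the procedure, function names, and parameters are assumptions, not the author's documented method.

```python
# Hedged sketch of sentence-level data augmentation (SLDA) that grows each
# score level by roughly 20% using same-level sentence recombination (assumed).
import random
from collections import defaultdict

def augment_by_level(essays, scores, ratio=0.2, sentences_per_essay=8, seed=0):
    """Return synthetic (essay, score) pairs built from same-level sentences."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for text, score in zip(essays, scores):
        # naive sentence split on the full stop; a proper Persian sentence
        # splitter would be used in practice
        by_level[score].extend(s.strip() for s in text.split(".") if s.strip())

    synthetic = []
    for level, sentences in by_level.items():
        n_new = max(1, int(ratio * scores.count(level)))   # ~20% more essays per level
        for _ in range(n_new):
            picked = rng.sample(sentences, k=min(sentences_per_essay, len(sentences)))
            synthetic.append((". ".join(picked) + ".", level))
    return synthetic

# Usage: extend the training set with the synthetic pairs before re-fine-tuning
# the BERT AES model.
# augmented = augment_by_level(train_essays, train_scores)
```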

  • Subjects / Keywords
  • Graduation date
    Spring 2023
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/r3-3eqn-b237
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.