Comparing the correctness of classical test theory and item response theory in evaluating the consistency and accuracy of student proficiency classifications

Gundula, Augustine M

doi:10.7939/R39M17

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Author / Creator

Gundula, Augustine M
Abstract

The purposes of this study were 1) to compare the values of decision consistency (DC) and decision accuracy (DA) yielded by three commonly used estimation procedures: the Livingston-Lewis (LL) and compound multinomial (CM) procedures, both based on classical test theory, and Lee's IRT procedure, based on item response theory; and 2) to determine how accurate and precise these procedures are. Two population data sources were used: the Junior Reading (N = 128,103) and Mathematics (N = 127,639) assessments administered by the Education Quality and Accountability Office (EQAO) and the three entrance examinations administered by the University of Malawi (U of M; N = 6,191). To determine the degree of bias and the level of precision for both DC and DA, 100 replicated random samples corresponding to four sample sizes (n = 1,500, 3,000, 4,500, 6,000) for the EQAO populations and two sample sizes (n = 1,500, 3,000) for the U of M population were selected.

At the population level, there was an interaction between the three procedures and the four cut-scores. While the differences between the values of DC and the values of DA among the three procedures tended to be small for one or both extreme cut-scores, the differences tended to be larger when the cut-score was closer to the population mean. The IRT procedure tended to provide the highest values for both DC and DA, followed in turn by the CM and LL procedures.

At the sample level, the estimates of DC and DA yielded by the three estimation procedures were unbiased and precise. Consequently, the findings at the population level are applicable at the sample level. Therefore, based on the findings of the present study, the compound multinomial procedure should be used to determine DC and DA when classical test theory is used to analyze a test and its items, and the IRT procedure should be used when item response theory is used to analyze a test and its items.
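
For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of what DC, DA, bias, and precision mean. It is not the LL, CM, or Lee IRT procedure, all of which estimate DC and DA from a single test administration. Instead it assumes a hypothetical simulated population in which true scores and two parallel-form scores are known, so that DC and DA can be counted directly. The score distribution, error size, cut-score, population size, and all function names are illustrative assumptions, not values or code from the thesis.

```python
import random
import statistics


def decision_consistency(form_a, form_b, cut):
    """Share of examinees classified the same way on two parallel forms."""
    same = sum((a >= cut) == (b >= cut) for a, b in zip(form_a, form_b))
    return same / len(form_a)


def decision_accuracy(true_scores, observed, cut):
    """Share of examinees whose observed classification matches the
    classification implied by their true score."""
    same = sum((t >= cut) == (x >= cut) for t, x in zip(true_scores, observed))
    return same / len(true_scores)


def bias_and_precision(population_value, sample_estimates):
    """Bias: mean of the sample estimates minus the population value.
    Precision: standard deviation of the estimates across replications."""
    bias = statistics.mean(sample_estimates) - population_value
    spread = statistics.stdev(sample_estimates)
    return bias, spread


# Hypothetical simulated population (assumption: NOT the EQAO or U of M data).
rng = random.Random(0)
N = 20_000
CUT = 50.0
true = [rng.gauss(50, 10) for _ in range(N)]
obs_a = [t + rng.gauss(0, 4) for t in true]  # observed scores, form A
obs_b = [t + rng.gauss(0, 4) for t in true]  # observed scores, parallel form B

pop_dc = decision_consistency(obs_a, obs_b, CUT)
pop_da = decision_accuracy(true, obs_a, CUT)

# 100 replicated random samples of n = 1,500, mirroring one cell of the design.
dc_estimates = []
for _ in range(100):
    idx = rng.sample(range(N), 1500)
    dc_estimates.append(
        decision_consistency([obs_a[i] for i in idx], [obs_b[i] for i in idx], CUT)
    )

dc_bias, dc_spread = bias_and_precision(pop_dc, dc_estimates)
print(f"population DC = {pop_dc:.3f}, population DA = {pop_da:.3f}")
print(f"DC bias = {dc_bias:+.4f}, DC spread across replications = {dc_spread:.4f}")
```

In this sketch a bias near zero and a small spread across replications correspond to what the abstract calls unbiased and precise estimates. Moving `CUT` away from the population mean is one way to explore how the cut-score location changes DC and DA in the simulated data.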
Subjects / Keywords
- Evaluating decision consistency and accuracy
Graduation date

Fall 2012
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/R39M17
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Educational Psychology
Specialization
- Measurement, Evaluation and Cognition
Supervisor / co-supervisor and their department(s)
- Buck, George (Educational Psychology)
- Rogers, Todd (Educational Psychology)
Examining committee members and their departments
- Whelton, William (Educational Psychology)
- Plake, Barbara (University of Nebraska-Lincoln)
- Bouffard, Marcel (Phys Ed and Rec Faculty)
- Petersen, Stewart (Phys Ed and Rec Faculty)