Comparison of vertical scaling methods in the context of NCLB

Gotzmann, Andrea Julie

doi:10.7939/R3764M

Communities and Collections

Graduate and Postdoctoral Studies (GPS), Faculty of / Theses and Dissertations

Vertical scaling is the process of establishing a numerical test score scale across several age or grade levels. Given that the current literature does not indicate which of the different vertical scaling procedures works “best” for all situations, this study evaluated the performance of four vertical scaling procedures (concurrent calibration, fixed common item parameters, test characteristic curve, and hybrid characteristic curve) across two content areas (Reading and Mathematics), two score distribution types (normal and negatively skewed), and two sample sizes (1,500 and 3,000). Five outcome measures were used to evaluate the results: decision accuracy, decision consistency, conditional standard errors at each of two cut-scores, root-mean-squared differences of the scale scores between scaling procedures, and correlations between the scaling procedures’ final item parameters. The data used in this study were from a U.S. large-scale testing program in Reading and Mathematics for grades 3 through 8. These data were used to simulate the score distribution types and sample sizes considered, with 100 replicates for these combinations.

The largest differences among the four vertical scaling procedures for Reading were found at the lower and upper grade levels, particularly for decision accuracy. Differences were also found between the normal and skewed distributions for decision accuracy, where a different pattern of results was found: the accuracy results decreased markedly as grades increased for the skewed distribution. For Mathematics, the largest differences across all outcome measures occurred across grade levels rather than across vertical scaling procedures. Sample size did not seem to have an effect for either Reading or Mathematics.

Practitioners should ensure that decision accuracy and consistency values are high across all grade levels, and that the chosen scaling procedure does not produce undesirable results. If a state program allows different procedures for different content areas, then the hybrid characteristic curve procedure would be most appropriate for Reading and the test characteristic curve procedure most appropriate for Mathematics. However, if the procedure must be the same, then the hybrid characteristic curve procedure could be used for both Reading and Mathematics. Measurement specialists can use these results to guide their implementation of vertical scaling for their state assessment programs.
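
Two of the outcome measures named in the abstract can be illustrated with a short sketch. This is not the thesis's implementation, which the abstract does not describe. It assumes that the root-mean-squared difference is computed over paired scale scores from two scaling procedures, and that decision accuracy is the proportion of examinees placed in the same performance category by their true and estimated scores, with scores at or above a cut-score falling in the higher category. The function names and the example values are hypothetical.

```python
import math
from bisect import bisect_right


def rmsd(scores_a, scores_b):
    """Root-mean-squared difference between two paired lists of scale scores."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(scores_a, scores_b)]
    return math.sqrt(sum(squared) / len(squared))


def classify(score, cut_scores):
    """Category index: 0 below the first cut, 1 between the cuts, and so on.

    Assumption: a score equal to a cut-score falls in the higher category.
    """
    return bisect_right(sorted(cut_scores), score)


def decision_accuracy(true_scores, estimated_scores, cut_scores):
    """Proportion of examinees whose true and estimated categories agree."""
    if len(true_scores) != len(estimated_scores) or not true_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    agree = sum(
        classify(t, cut_scores) == classify(e, cut_scores)
        for t, e in zip(true_scores, estimated_scores)
    )
    return agree / len(true_scores)


if __name__ == "__main__":
    # Hypothetical scale scores and two hypothetical cut-scores.
    true_scores = [10, 20, 30, 40]
    estimated = [12, 18, 35, 41]
    print(rmsd(true_scores, estimated))
    print(decision_accuracy(true_scores, estimated, cut_scores=[15, 35]))
```

In a simulation such as the one the abstract describes, true scores are known by construction, which is what makes an accuracy measure of this kind computable. Decision consistency would instead compare classifications across two parallel administrations.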
Graduation date

Fall 2011
Type of Item

Thesis
Degree

Doctor of Philosophy
DOI

https://doi.org/10.7939/R3764M
License

This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.

Language

English
Institution

University of Alberta
Degree level

Doctoral
Department
- Department of Educational Psychology
Supervisor / co-supervisor and their department(s)
- Gierl, Mark J. (Educational Psychology)
- Rogers, W. Todd (Educational Psychology)
Examining committee members and their departments
- Hayduk, Les (Sociology)
- Abbott, Marilyn (Educational Psychology)
- Childs, Ruth (Human Development and Applied Psychology)
- Cui, Ying (Educational Psychology)