Methods for determining whether subscore reporting is warranted in large-scale achievement assessments

  • Author / Creator
    Babenko, Oksana Illivna
  • Officials of large-scale assessment programs often want to report subscale scores in addition to the total test score. However, reported scores must be reliable, and there must also be evidence that the subscales reveal real differences in student performance, before subscale scores can be reported. This study examined three methods for determining whether subscore reporting is warranted in large-scale achievement assessments. Two are correlational: correlations corrected for attenuation (r′) and the proportional reduction in mean squared error (PRMSE; Haberman, 2005; Sinharay et al., 2007). The third is the agreement method (Kelley, 1923). Whereas the correlational methods consider student performances on pairs of measures in terms of ranked positions, the agreement method takes into account the actual differences between students’ standard scores on the pairs of measures being compared. The study considered an English Reading assessment (N = 128,089) and a Mathematics assessment (N = 127,596). With one possible exception, the correlational methods indicated that the subscales did not differ from one another or from the total test on either assessment. In contrast, Kelley’s agreement method indicated that one to five percent of students had differences between their English Reading subscale scores that were greater than the difference expected by chance. However, with two exceptions, the results of the agreement method for the Mathematics assessment were uninterpretable. Consistent with Sinharay et al. (2007), it was concluded that three conditions must be met for the detection methods to work. One is substantive: a multidimensional construct for which scores are wanted for each dimension. The other two are statistical: high subscale reliabilities and low intercorrelations among the subscales.
The results for replicated random samples (n = 250, 500, 1,000, 2,000, and 5,000) revealed that the statistics for the three detection methods were accurate and precise estimators of the corresponding population parameters.
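The correction for attenuation (r′) named in the abstract is conventionally computed with Spearman's classical formula. That formula divides the observed correlation between two subscores by the square root of the product of their reliabilities. The Python sketch below illustrates the formula only and is not code from the thesis. The function name `disattenuated_correlation` and the example values are hypothetical.

```python
import math


def disattenuated_correlation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation: r' = r_xy / sqrt(rel_x * rel_y).

    r_xy  -- observed correlation between two subscores
    rel_x -- reliability of the first subscore (e.g., coefficient alpha)
    rel_y -- reliability of the second subscore
    """
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical example: observed r = 0.72, subscore reliabilities 0.80 and 0.90.
r_prime = disattenuated_correlation(0.72, 0.80, 0.90)
print(round(r_prime, 3))  # prints 0.849
```

An estimate close to 1 (or, through sampling error, slightly above it) suggests that the two subscales measure essentially the same construct. In that case, separate scores would add little information beyond the total.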
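Haberman's (2005) PRMSE criterion compares two ways of approximating a student's true subscore. The first uses the observed subscore, whose PRMSE equals the subscore reliability. The second uses the observed total score. A subscore is usually judged to have added value when the first PRMSE exceeds the second. The Python sketch below is a minimal illustration under classical test theory assumptions, and is not the thesis's code or data. It assumes that the total score is the sum of the subscores and that measurement errors are uncorrelated across subscores. The function name `prmse_pair` and the covariance matrices are hypothetical.

```python
def prmse_pair(cov, rel, s):
    """Return (PRMSE_s, PRMSE_x) for subscore index s.

    cov -- covariance matrix of the observed subscores (list of lists)
    rel -- reliability of each subscore
    s   -- index of the subscore being evaluated

    Assumes the total score X is the sum of the subscores and that
    measurement errors are uncorrelated across subscores.
    """
    k = len(cov)
    var_true_s = rel[s] * cov[s][s]  # true-score variance of subscore s
    # Cov(true s, X) = Var(true s) + sum of Cov(S_s, S_j) over j != s
    cov_true_x = var_true_s + sum(cov[s][j] for j in range(k) if j != s)
    var_x = sum(cov[i][j] for i in range(k) for j in range(k))  # Var(X)
    prmse_s = rel[s]  # PRMSE of the observed subscore equals its reliability
    prmse_x = cov_true_x ** 2 / (var_true_s * var_x)  # squared corr of true s with X
    return prmse_s, prmse_x


# Hypothetical two-subscore test with unit variances and reliabilities of 0.80.
for r in (0.50, 0.78):
    p_s, p_x = prmse_pair([[1.0, r], [r, 1.0]], [0.80, 0.80], 0)
    verdict = "added value" if p_s > p_x else "no added value"
    print(f"r = {r}: PRMSE_s = {p_s:.3f}, PRMSE_x = {p_x:.3f} -> {verdict}")
```

The two cases mirror the statistical conditions stated in the abstract. With a moderate intercorrelation (0.50), the subscore predicts its own true score better than the total does. With a high intercorrelation (0.78), the total score is the better predictor.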
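The agreement method works with the differences between each student's standard scores on two measures rather than with correlations. The exact computations used by Kelley (1923) and in the thesis are not given in this record. The Python sketch below therefore shows only one common classical-test-theory version of the idea. With standardized scores and uncorrelated errors, the standard error of a difference is sqrt(2 - r_xx - r_yy), and a student's difference is flagged when it exceeds a chosen critical multiple of that standard error. The function names `standardize` and `proportion_beyond_chance`, the default critical value of 1.96, and the toy data are assumptions for illustration.

```python
import math
import statistics


def standardize(scores):
    """Convert raw scores to z-scores using the population standard deviation."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    return [(x - mean) / sd for x in scores]


def proportion_beyond_chance(x, y, rel_x, rel_y, z_crit=1.96):
    """Proportion of students whose standard-score difference exceeds chance.

    Assumes uncorrelated measurement errors, so the error variance of a
    difference between two z-scores is (1 - rel_x) + (1 - rel_y).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    se_diff = math.sqrt(2.0 - rel_x - rel_y)  # standard error of z_x - z_y
    zx, zy = standardize(x), standardize(y)
    flagged = sum(1 for a, b in zip(zx, zy) if abs(a - b) > z_crit * se_diff)
    return flagged / len(x)


# Deliberately extreme toy case: five students ranked in opposite orders.
subscale_a = [1, 2, 3, 4, 5]
subscale_b = [5, 4, 3, 2, 1]
print(proportion_beyond_chance(subscale_a, subscale_b, rel_x=0.90, rel_y=0.90))  # prints 0.8
```

The flagged proportion depends on the chosen `z_crit` and on the subscale reliabilities. The decision rule used in the thesis may differ from this sketch.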

  • Graduation date
    Fall 2011
  • Type of Item
    Thesis
  • Degree
    Doctor of Philosophy
  • DOI
    https://doi.org/10.7939/R3QP9P
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.