The Comparability of Standardized Paper-and-Pencil and Computer-based Mathematics Tests in Alberta

  • Author(s) / Creator(s)
  • This mixed-methods study examines the relationship between the mode of test administration and student test scores for a standardized mathematics exam administered in the province of Alberta. To determine how the results of the paper-and-pencil version of the 2013 Mathematics 9 Provincial Achievement Exam compare with the results of the identical computer-based version, the overall mean test scores, individual item difficulty values and item discrimination values from each version of the exam were compared. Using a significance level of .05, a two-tailed t-test for independent measures determined that the mean test score on the digital version of the standardized exam is significantly higher than the mean test score for students who completed the paper-and-pencil version. A Z-score analysis comparing the proportions of test-takers who answered each question correctly on each version of the exam determined that 21 items have significantly different item difficulties between the two test modes. Fisher’s r-to-z transformation calculations identified 2 items that have significantly different item discrimination values between the test modes. One item was identified as having performance differences in both the Z-score and Fisher’s r-to-z calculations. The number of questions identified as having significant differences between the paper-and-pencil test and the computer-based exam indicates that there may be a relationship between the mode of test administration and the difficulty of individual items. The ability of an item to discriminate between students with different ability levels in mathematics does not appear to be affected by the mode of test administration. However, because demographic information about test participants was not collected as part of this project, definitive conclusions about the relationship between the mode of administration and the mean test scores, item difficulty values and item discrimination values cannot be drawn.
To identify patterns in the items that exhibit differences in item statistics, questions were categorized by content domain, cognitive domain, the structural components that may affect how test-takers view the item, and the mathematical processes required to answer the question. The identified items represent all four content domains almost equally, but a substantial portion of the items in the moderate complexity category (in the cognitive domain) exhibit performance differences. Items that require multiple arithmetic calculations, contain complex diagrams, or contain diagrams with missing measurements were identified as having differences in item statistics more frequently than items that involve geometric manipulations or graphing on the Cartesian plane. However, more research is needed to better understand the relationship between the mode of administration and the performance of questions, especially if items contain longer reading passages or combine multiple mathematical procedures. The results of the study also indicate that future comparative research needs to examine whether test-takers modify the strategies they use to solve questions on digital mathematics assessments, and whether such a change in problem-solving strategies affects overall test scores or individual item statistics.
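
The two item-level comparisons named in the abstract can be sketched in a few lines of Python. This is an illustration only, not the study's actual analysis: the function names, the use of a pooled standard error for the proportion test, the treatment of item discrimination as a correlation coefficient, and every count, correlation and sample size below are assumptions, since the abstract reports none of these details.

```python
from math import atanh, sqrt
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-tailed p-value for a standard normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def two_proportion_z(correct_a: int, n_a: int, correct_b: int, n_b: int):
    """Pooled two-proportion z-test on the share of correct answers to one item.

    Item difficulty is taken here as the proportion of test-takers who
    answered the item correctly in each administration mode.
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    return z, two_sided_p(z)


def fisher_r_to_z(r_a: float, n_a: int, r_b: float, n_b: int):
    """Compare two independent correlations via Fisher's r-to-z transformation.

    atanh(r) is the Fisher transformation; the difference of the two
    transformed values has standard error sqrt(1/(n_a - 3) + 1/(n_b - 3)).
    """
    se = sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    z = (atanh(r_a) - atanh(r_b)) / se
    return z, two_sided_p(z)


ALPHA = 0.05  # significance level stated in the abstract

# Hypothetical item: 640 of 1000 correct on the computer-based version,
# 600 of 1000 correct on the paper-and-pencil version.
z_diff, p_diff = two_proportion_z(640, 1000, 600, 1000)
print(f"difficulty:     z = {z_diff:.3f}, p = {p_diff:.3f}, flagged = {p_diff < ALPHA}")

# Hypothetical discrimination values (item-total correlations) for the same item.
z_disc, p_disc = fisher_r_to_z(0.45, 1000, 0.40, 1000)
print(f"discrimination: z = {z_disc:.3f}, p = {p_disc:.3f}, flagged = {p_disc < ALPHA}")
```

Running both checks on every item and counting the flagged ones would produce tallies of the kind the abstract reports, although the study's exact procedure may differ from this sketch.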

  • Date created
  • Subjects / Keywords
  • Type of Item
    Research Material
  • DOI
  • License
    Attribution-NonCommercial 4.0 International