Comparing Students' Scores on Same Test: Large-Scale Assessments | Summaries Mathematics

Comparability of Individual Students’ Scores on the “Same Test”

Charles DePascale and Brian Gong, National Center for the Improvement of Educational Assessment

CONTENTS

INTRODUCTION

Interpreting an Individual Test Score: Comparability and Validity

Purposes for Comparing Test Scores

The Same Test

APPROACHES TO PRODUCING COMPARABLE TEST SCORES

Design Approaches to Producing Comparable Test Scores

Psychometric Approaches to Producing Comparable Test Scores

THREATS TO COMPARABILITY

Opportunity to Learn

Familiarity with Item Formats and Tools Used on the Assessment

Differences in Intended Uses of Test Results

Differences Among Assessment Contractors

CONCLUSION

REFERENCES

INTRODUCTION

In large-scale assessments, individual student test scores on the same test are expected to be comparable, but meeting this goal is challenging. The challenge is exacerbated in large-scale K–12 testing because the term “same test” covers a range of situations in which students may take different sets of items under different conditions. This chapter addresses how to evaluate whether comparability across conditions is sufficient to support a particular inference or test use. Common threats to comparability arise from a lack of attention to design decisions and psychometric procedures. There are also external threats that might affect the accuracy and/or interpretation of students’ scores. Students’ opportunity to learn (OTL) the content assessed and their familiarity with the item formats and tools used on the assessment are two types of comparability threats that relate primarily to students’ prior experiences. The process of establishing the comparability of individual students’ scores
