
6
6.1 CROSS-COUNTRY ITEM STATISTICS
In order to assess the statistical properties of the items before proceeding with item
response theory (IRT) scaling (see Chapter 7), TIMSS computed a series of statistics for
every item in every country. These basic item statistics (see Figure 6.1 for an example
item) were produced by the IEA Data Processing Center. For each item, the basic
display presents the number of students that responded in each country, the difficulty
level (the percentage of students that answered the item correctly), and the discrimination
index (the point-biserial correlation between success on the item and a total score).1 For
multiple-choice items the display presents the percentage of students that chose each
option, including the percentage that omitted or did not reach the item, and the
point-biserial correlation between each option and the total score. For free-response items
(which could have more than one score level), the display presents the difficulty and
discrimination of each score level.
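As a rough illustration of the two basic statistics described above, the following Python sketch computes a difficulty level and a point-biserial discrimination index for a single item scored 0 or 1. The function names and the sample data are hypothetical, and the code is not the procedure used by the IEA Data Processing Center; it only assumes that the total score is a per-student percentage correct, as described in the footnote on the discrimination index.

```python
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Percentage of students who answered the item correctly.

    item_scores: list of 0/1 values, one per student (assumed coding).
    """
    return 100.0 * mean(item_scores)


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total score.

    Returns NaN when either variable has no variance, because the
    correlation is undefined in that case.
    """
    mx, my = mean(item_scores), mean(total_scores)
    products = [(x - mx) * (y - my) for x, y in zip(item_scores, total_scores)]
    covariance = mean(products)
    sx, sy = pstdev(item_scores), pstdev(total_scores)
    if sx == 0 or sy == 0:
        return float("nan")
    return covariance / (sx * sy)


# Hypothetical data: five students, one item, total score in percent correct.
item = [1, 1, 0, 1, 0]
total = [90, 80, 40, 70, 30]
print(item_difficulty(item))
print(point_biserial(item, total))
```

The same correlation could be applied to an indicator for each multiple-choice option or each free-response score level (1 if the student chose that option or reached that level, 0 otherwise) to obtain option-level or level-specific values.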
As a prelude to the main IRT scaling, the display presents some statistics from a
preliminary Rasch analysis, including the Rasch item difficulty for each item, the standard
error of this difficulty estimate, and an index of the goodness-of-fit of the item to the
Rasch model (Wu, 1997).
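For orientation, the dichotomous Rasch model in its usual textbook form is shown below. The notation is an assumption made here rather than taken from this report: \theta_n denotes the proficiency of student n and b_i the Rasch difficulty of item i. The goodness-of-fit index cited to Wu (1997) is not reproduced.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

Under this form, a student whose proficiency equals the item difficulty has a probability of 0.5 of answering the item correctly.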
The item-analysis display presents the difficulty level of each item separately for male
and female students, and, because the TIMSS IRT scaling spans two grades at
Population 1 and Population 2, separately for lower- and upper-grade students. As a
guide to the overall statistical properties of the item, it also presents the international
item difficulty (the mean of the item difficulties across countries) and the international
item discrimination (the mean of the item discriminations).
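The international values described above can be sketched as a simple average over countries. This minimal Python example assumes an unweighted mean in which each country counts equally; the country labels and values are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def international_mean(by_country):
    """Unweighted mean of a per-country item statistic.

    by_country: dict mapping a country label to that country's value
    (item difficulty in percent, or item discrimination).
    """
    return mean(by_country.values())


# Hypothetical per-country difficulties (percent correct) for one item.
difficulty_by_country = {"Country A": 60.0, "Country B": 50.0, "Country C": 70.0}
print(international_mean(difficulty_by_country))
```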
As an aid to reviewers, the item-analysis display includes a series of “flags” signaling
the presence of one or more conditions that might indicate a problem with an item. The
following conditions are flagged:
• Item difficulty exceeds 95 percent in the sample as a whole
• Item difficulty is less than 25 percent for 4-option multiple-choice items in
the sample as a whole (20 percent for 5-option items)
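The two difficulty conditions listed so far can be expressed as a small check. The sketch below is illustrative only: the function name, flag labels, and argument names are hypothetical, and it covers just the conditions stated above, not any others the display may flag.

```python
def difficulty_flags(percent_correct, n_options=None):
    """Return labels for the difficulty conditions an item triggers.

    percent_correct: item difficulty in the sample as a whole (0 to 100).
    n_options: number of options for a multiple-choice item (4 or 5),
               or None for items where the lower threshold does not apply.
    """
    flags = []
    if percent_correct > 95.0:
        flags.append("difficulty_above_95")
    # Lower thresholds stated in the text: 25 percent for 4-option items,
    # 20 percent for 5-option items.
    lower_threshold = {4: 25.0, 5: 20.0}.get(n_options)
    if lower_threshold is not None and percent_correct < lower_threshold:
        flags.append("difficulty_below_%d_for_%d_options" % (lower_threshold, n_options))
    return flags


print(difficulty_flags(97.2, n_options=4))
print(difficulty_flags(22.0, n_options=4))
print(difficulty_flags(22.0, n_options=5))
```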
1 For the purpose of computing the discrimination index, the total score was the percentage of items a student
answered correctly in mathematics or science.
Item Analysis and Review
Ina V.S. Mullis
Michael O. Martin
Boston College