Polytomous IRT Models - Basic Statistics for Behavioral Sciences - Lecture Notes, Study notes of Statistics for Psychologists

Polytomous IRT Models, Dichotomous IRT, Difference Models, Graded Response Model, Operating Characteristic Curve, Category Response Curve, Direct Models, Sufficient Statistics are learning points of this lecture.

Typology: Study notes

2011/2012

Uploaded on 11/21/2012

ashakiran
Ch. 5. Polytomous IRT Models (Embretson & Reise)
I. Introduction
A. CTT: The main concern is how a test works. CTT computes test-level indices: reliability and validity.
B. Dichotomous IRT: The main issue is how each item works. Dichotomous IRT computes item parameters (a-, b-, and c-parameters) and a person parameter ($\theta$).
C. Polytomous IRT: The main issue is how each response alternative works. Polytomous IRT computes slope and threshold parameters ($\alpha_i$, $\beta_{ij}$) and a trait parameter ($\theta$).
II. Different models
A. Indirect (Difference) models
1. Graded-Response Model (GRM; Samejima, 1969): Assumes ordered categorical responses. A generalization of the 2PL model.
a) Operating Characteristic Curve (OCC)
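$$P^*_{ix}(\theta) = \frac{\exp[\alpha_i(\theta - \beta_{ij})]}{1 + \exp[\alpha_i(\theta - \beta_{ij})]}$$
where
x = j = 1, 2, . . ., $m_i$ is the category score,
$\alpha_i$ = common item slope parameter, and
$\beta_{ij}$ = category threshold parameter.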
b) Category Response Curve (CRC)
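$$P_{ix}(\theta) = P^*_{ix}(\theta) - P^*_{i(x+1)}(\theta).$$
The probability of responding in or above the lowest category is $P^*_{i0}(\theta) = 1.0$, and the probability of responding above the highest category is $P^*_{i5}(\theta) = 0.0$.
Thus, with five response categories (x = 0, . . ., 4),
$$P_{i0}(\theta) = 1.0 - P^*_{i1}(\theta)$$
$$P_{i1}(\theta) = P^*_{i1}(\theta) - P^*_{i2}(\theta)$$
$$P_{i2}(\theta) = P^*_{i2}(\theta) - P^*_{i3}(\theta)$$
$$P_{i3}(\theta) = P^*_{i3}(\theta) - P^*_{i4}(\theta)$$
$$P_{i4}(\theta) = P^*_{i4}(\theta) - P^*_{i5}(\theta)$$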
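The GRM computation described above (logistic operating characteristic curves at each category boundary, then differences of adjacent curves) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `grm_category_probs` and the parameter values are hypothetical, not taken from the lecture.

```python
import math

def grm_category_probs(theta, alpha, betas):
    """Return GRM category probabilities P_i0..P_im for one item.

    theta : trait level
    alpha : common item slope
    betas : ordered category thresholds beta_i1..beta_im
    """
    # Operating characteristic curves P*_ij for j = 1..m.
    # 1 / (1 + exp(-z)) is algebraically the same as exp(z) / (1 + exp(z)).
    p_star = [1.0 / (1.0 + math.exp(-alpha * (theta - b))) for b in betas]
    # Boundary conventions: P*_i0 = 1.0 and P*_i(m+1) = 0.0.
    p_star = [1.0] + p_star + [0.0]
    # Each category response curve is the difference of adjacent OCCs.
    return [p_star[x] - p_star[x + 1] for x in range(len(betas) + 1)]

# Hypothetical five-category item (scores 0..4), hence four thresholds.
probs = grm_category_probs(theta=0.0, alpha=1.5, betas=[-2.0, -0.5, 0.5, 2.0])
print([round(p, 3) for p in probs])
```

Because the thresholds are ordered, every difference is positive and the five probabilities sum to 1.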
2. Modified Graded Response Model (M-GRM; Muraki, 1990): A restricted case of the GRM in which the GRM threshold parameter ($\beta_{ij}$) is split into a location parameter for each item ($b_i$) and a set of category threshold parameters ($c_j$) shared by the entire scale.
a) OCC
$$P^*_{ix}(\theta) = \frac{\exp[\alpha_i(\theta - (b_i - c_j))]}{1 + \exp[\alpha_i(\theta - (b_i - c_j))]}$$

b) CRC

$$P_{ix}(\theta) = P^*_{ix}(\theta) - P^*_{i(x+1)}(\theta)$$

c) In the GRM, one set of category threshold parameters ($\beta_{ij}$) is estimated for each item, while in the M-GRM one set of category threshold parameters ($c_j$) is estimated for the entire scale and one location parameter ($b_i$) is estimated for each item.

d) Developed to describe responses to a Likert-format attitude scale.
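The difference between the two parameterizations can be illustrated with a short Python sketch. It assumes that each M-GRM boundary is located at $b_i - c_j$ on the trait scale, so that items differ only in slope and location while sharing one list of category thresholds. The function name and all numeric values are hypothetical.

```python
import math

def mgrm_category_probs(theta, alpha, b, c):
    """M-GRM category probabilities for one item.

    alpha : item slope
    b     : item location parameter
    c     : scale-wide category thresholds c_1..c_m (shared by all items)
    Assumption: boundary j of the item is located at b - c_j.
    """
    boundaries = [b - cj for cj in c]
    p_star = [1.0 / (1.0 + math.exp(-alpha * (theta - beta))) for beta in boundaries]
    p_star = [1.0] + p_star + [0.0]  # P*_i0 = 1 and P*_i(m+1) = 0
    return [p_star[x] - p_star[x + 1] for x in range(len(c) + 1)]

# One hypothetical threshold set for the whole scale (decreasing, so that
# the boundaries b - c_j increase with j).
scale_thresholds = [1.5, 0.5, -0.5, -1.5]

# Two hypothetical items: each contributes only a slope and a location.
item_1 = mgrm_category_probs(theta=0.3, alpha=1.2, b=0.0, c=scale_thresholds)
item_2 = mgrm_category_probs(theta=0.3, alpha=0.8, b=1.0, c=scale_thresholds)
print([round(p, 3) for p in item_1])
print([round(p, 3) for p in item_2])
```

With m thresholds and n items, this restricted form needs m + 2n parameters, whereas giving every item its own threshold set needs n(m + 1).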

B. Direct (Divided-by-total) models

1. Partial Credit Model (PCM; Masters, 1982): Developed for item responses from achievement tests. The CRC is computed directly by dividing an exponential function by the sum of the exponential functions. A generalization of the Rasch model.

a) CRC

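$$P_{ix}(\theta) = \frac{\exp\left[\sum_{j=0}^{x}(\theta - \delta_{ij})\right]}{\sum_{r=0}^{m_i}\exp\left[\sum_{j=0}^{r}(\theta - \delta_{ij})\right]}$$

where $\sum_{j=0}^{0}(\theta - \delta_{ij}) = 0$.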

b) $\delta_{ij}$: Step difficulty parameter associated with a category score of j.

(1) The higher the value, the more difficult a particular step is relative to other steps within an item.

(2) It is the point on the trait scale at which two consecutive category response curves intersect.

(3) The $\delta_{ij}$ do not have to be ordered, unlike the thresholds in the GRM.
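A minimal Python sketch of the divide-by-total computation may make the role of the step difficulties concrete. It assumes the usual PCM form in which the numerator for score x is the exponential of the summed (theta minus step difficulty) terms up to x, with the score-0 term fixed at 0. The function name and the step values are hypothetical.

```python
import math

def pcm_category_probs(theta, deltas):
    """PCM category probabilities for scores 0..m.

    deltas : step difficulties delta_i1..delta_im (need not be ordered)
    """
    # Running sums of (theta - delta_ij); the term for score 0 is defined as 0.
    cumulative = [0.0]
    for d in deltas:
        cumulative.append(cumulative[-1] + (theta - d))
    numerators = [math.exp(s) for s in cumulative]
    total = sum(numerators)  # the "divide-by-total" denominator
    return [n / total for n in numerators]

# Hypothetical item whose second and third steps are out of order.
deltas = [-1.0, 0.8, 0.2]

# At theta equal to the second step difficulty, the curves for
# scores 1 and 2 intersect, so the two probabilities coincide.
at_second_step = pcm_category_probs(theta=0.8, deltas=deltas)
print(round(at_second_step[1], 4), round(at_second_step[2], 4))
```

The unordered steps cause no computational problem, which is the contrast with the ordered thresholds of the GRM.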

c) Sufficient statistics

(1) The raw scale score is a sufficient statistic for estimating examinee trait level.

(2) Examinees with the same raw score on a set of items are estimated to have equivalent positions on the latent trait.
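The sufficiency property can be checked numerically. The sketch below, which assumes the standard PCM form and uses hypothetical item parameters, compares two response patterns with the same raw score: the ratio of their likelihoods is the same at every trait level, so the data carry no information about the trait beyond the raw score.

```python
import math

def pcm_prob(theta, deltas, x):
    """Probability of score x on a PCM item with step difficulties deltas."""
    cumulative = [0.0]
    for d in deltas:
        cumulative.append(cumulative[-1] + (theta - d))
    total = sum(math.exp(s) for s in cumulative)
    return math.exp(cumulative[x]) / total

def pattern_likelihood(theta, items, pattern):
    """Likelihood of a response pattern, assuming local independence."""
    likelihood = 1.0
    for deltas, x in zip(items, pattern):
        likelihood *= pcm_prob(theta, deltas, x)
    return likelihood

# Three hypothetical three-category items (scores 0..2).
items = [[-1.0, 0.5], [0.0, 1.0], [-0.5, 0.3]]
pattern_a = (2, 0, 1)  # raw score 3
pattern_b = (1, 1, 1)  # raw score 3

# The likelihood ratio of the two patterns does not change with theta.
ratios = [
    pattern_likelihood(t, items, pattern_a) / pattern_likelihood(t, items, pattern_b)
    for t in (-2.0, 0.0, 2.0)
]
print([round(r, 6) for r in ratios])
```

The ratio depends only on the step difficulties of the steps passed in each pattern, not on the trait level.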

d) In order to estimate the parameters, the constraint $\sum_x \alpha_{ix} = \sum_x c_{ix} = 0$ must be imposed or, in some cases, the parameters for the lowest response category are set to zero.

e) One $\alpha_{ix}$ and one $c_{ix}$ parameter must be estimated for each of the ($m_i$ + 1) response categories within an item.

III. Number of subjects for polytomous IRT

A. In a Monte Carlo study, Reise and Yu (1990) showed that the GRM could be estimated with as few as 250 examinees, but they recommended 500.

B. Some of the threshold parameters were not well estimated with 350 examinees in the current example.

Multidimensional Item Response Theory (MIRT, Reckase, 2009)

I. Situation

A. The unidimensionality assumption is violated either by an internal factor (items measure multiple factors) or by several external factors (such as cognitive condition, personality, and test-taking attitudes) that influence test performance.

B. IRT models should include these factors in analysis.

II. Two related fields (Factor Analysis, FA, and Unidimensional IRT, UIRT)

A. Similarities

  1. Both try to reproduce the structure of latent variables.
  2. Both assume a standard normal distribution, N(0, 1), for latent traits.
  3. Both allow rotation of axes for latent traits.
  4. They are mathematically related (Takane & de Leeuw, 1987).

B. Dissimilarities

  1. FA treats the a- and c-parameters as nuisance variables, while in UIRT they are important parameters to be estimated.
  2. FA starts with a correlation matrix, which ignores differences in means and SDs, while UIRT starts with item scores.
  3. With FA, all item information is lost.
  4. FA is a data reduction technique, while UIRT is a technique for developing a model of the interaction between person and item.

III. Mathematical Models

A. Unidimensional IRT (UIRT)

$$P(U_i = 1 \mid \theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-Da_i(\theta - b_i)}}$$

where

$P(U_i = 1 \mid \theta)$ = the probability of getting item i correct for a given $\theta$,

$\theta$ = latent trait (ability or proficiency),

$b_i$ = item difficulty parameter,

$a_i$ = item discrimination parameter,

$c_i$ = pseudo-chance parameter, and

D = 1.701, a scaling factor to fit the normal ogive model (NOM).
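As a quick illustration of how the parameters defined above combine, here is a minimal Python sketch of the three-parameter logistic form (pseudo-chance level plus a scaled logistic curve). The function name `three_pl` and the example parameter values are hypothetical; the default scaling constant is the value given in these notes.

```python
import math

def three_pl(theta, a, b, c, D=1.701):
    """Probability of a correct response under the 3PL model.

    theta : latent trait
    a, b, c : discrimination, difficulty, and pseudo-chance parameters
    D : scaling factor (default is the value stated in these notes)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Hypothetical item: a = 1.2, b = 0.5, c = 0.2.
# At theta == b the logistic part equals 0.5, so P = c + (1 - c) / 2.
print(three_pl(theta=0.5, a=1.2, b=0.5, c=0.2))
# Far below the difficulty, the probability approaches the pseudo-chance level c.
print(round(three_pl(theta=-10.0, a=1.2, b=0.5, c=0.2), 4))
```

Setting c = 0 reduces the function to the 2PL form, which is the dichotomous special case that the graded response model generalizes.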

B. Multidimensional IRT (MIRT)

$$P(U_i = 1 \mid \theta_j) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-Da_{ij}(\theta_j - b_{ij})}}$$

where

$P(U_i = 1 \mid \theta_j)$ = the probability of getting item i correct for latent trait j, $\theta_j$,

$\theta_j$ = latent trait j (ability or proficiency),

$b_{ij}$ = item difficulty parameter for latent trait j,

$a_{ij}$ = item discrimination parameter for latent trait j,

$c_i$ = pseudo-chance parameter, and

D = 1.701, a scaling factor to fit the normal ogive model (NOM).

C. Factor Analysis (FA)

$$x = \Lambda_X \xi + \delta$$

where

x = a (q x 1) vector of observed exogenous indicators,

$\Lambda_X$ = a (q x n) matrix of structural coefficients,

$\xi$ = an (n x 1) vector of exogenous concepts,

$\delta$ = a (q x 1) vector of errors in the measurement model,

q = the number of x-variables, and

n = the number of $\xi$-variables.
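To make the dimensions concrete, the following Python sketch evaluates a measurement model of the form x = (loading matrix) times (factor vector) plus (error vector) for q = 3 indicators and n = 2 factors. All numbers are hypothetical and the helper name `measurement_model` is an assumption made for illustration only.

```python
def measurement_model(loadings, factors, errors):
    """Compute the q observed indicators from n latent factors.

    loadings : q x n matrix (list of rows) of structural coefficients
    factors  : list of n latent factor values
    errors   : list of q measurement errors
    """
    return [
        sum(l * f for l, f in zip(row, factors)) + e
        for row, e in zip(loadings, errors)
    ]

# Hypothetical values: 3 indicators (q = 3) and 2 factors (n = 2).
loadings = [[0.8, 0.0],
            [0.5, 0.5],
            [0.0, 0.9]]
factors = [1.0, -1.0]
errors = [0.1, 0.0, -0.2]

x = measurement_model(loadings, factors, errors)
print([round(v, 3) for v in x])
```

The second indicator loads equally on both factors, so opposite factor values cancel; this is the sense in which an item can measure more than one dimension.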

D. Mathematical relationship between FA and UIRT

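$$a_i = \frac{\lambda_i}{\sqrt{1 - \lambda_i' \Phi \lambda_i}}$$

where

$\lambda_i$ = factor loading vector for item i, and

$\Phi$ = covariance matrix of factors.

$$b_i = \frac{\tau_i}{\sqrt{1 - \lambda_i' \Phi \lambda_i}}$$

where $\tau_i$ = the threshold of item i.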

  1. Research (Bolt & Lall, 2003) shows that the compensatory model (COM) fits better than the partially compensatory model (PCOM) for data from an English usage test (a = 1 for both models). The COM also fits PCOM-generated data as well as the PCOM does, but the PCOM does not fit COM-generated data.

V. Polytomous MIRT Models (all compensatory)

A. GRM for polytomous MIRT (indirect model)

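  1. Operating Characteristic Curve (OCC)

$$P^*(u_{ij} \ge k \mid \theta_j) = \frac{e^{a_i'\theta_j + d_{ik}}}{1 + e^{a_i'\theta_j + d_{ik}}}$$

  2. Category Response Curve (CRC)

$$P(u_{ij} = k \mid \theta_j) = P^*(u_{ij} \ge k \mid \theta_j) - P^*(u_{ij} \ge k + 1 \mid \theta_j)$$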
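The compensatory character of the multidimensional graded response model can be shown with a small Python sketch. It assumes a logistic boundary curve whose exponent is the inner product of the item's discrimination vector and the person's trait vector plus a category intercept, with category probabilities obtained as differences of adjacent boundary curves. The function name, the intercept symbol `d`, and all values are hypothetical.

```python
import math

def mirt_grm_probs(theta, a, d):
    """Category probabilities for a multidimensional GRM item.

    theta : trait vector for one person
    a     : discrimination vector for the item
    d     : category intercepts d_i1..d_iK, in decreasing order
    """
    # Compensatory combination: a single weighted sum of the traits.
    z = sum(a_l * t_l for a_l, t_l in zip(a, theta))
    p_star = [1.0 / (1.0 + math.exp(-(z + d_k))) for d_k in d]
    p_star = [1.0] + p_star + [0.0]  # boundary conventions
    return [p_star[k] - p_star[k + 1] for k in range(len(d) + 1)]

a = (1.0, 1.0)
d = [1.0, 0.0, -1.0]

# A high trait on one dimension compensates for a low trait on the other:
# both persons have the same weighted sum, hence the same probabilities.
person_1 = mirt_grm_probs((1.0, 0.0), a, d)
person_2 = mirt_grm_probs((0.0, 1.0), a, d)
print([round(p, 3) for p in person_1])
print([round(p, 3) for p in person_2])
```

Only the weighted sum of traits enters the curve, so any two trait vectors with the same sum are indistinguishable for this item.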

B. Partial Credit Model for polytomous MIRT (direct model)

$$P(u_{ij} = k \mid \theta_j) = \frac{\exp\left[\sum_{l=1}^{m}(\theta_{jl} - b_{ilk})W_{ilk}\right]}{\sum_{r=0}^{K_i}\exp\left[\sum_{l=1}^{m}(\theta_{jl} - b_{ilr})W_{ilr}\right]}$$

where

$b_{ilk}$ = item difficulty parameter for item i, dimension l, and category k,

$W_{ilk}$ = predefined scoring weight for item i, dimension l, and category k,

k = the category score.
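A short Python sketch shows how the predefined scoring weights enter the multidimensional partial credit model. It assumes the divide-by-total form in which the exponent for category k is the weighted sum over dimensions of (trait minus difficulty). The function name, the weight pattern, and the difficulty values are hypothetical.

```python
import math

def mirt_pcm_probs(theta, b, w):
    """Category probabilities for a multidimensional PCM item.

    theta : trait vector (one value per dimension)
    b     : b[k][l], difficulty for category k on dimension l
    w     : w[k][l], predefined scoring weight for category k on dimension l
    """
    exponents = [
        sum((t - b_kl) * w_kl for t, b_kl, w_kl in zip(theta, b_k, w_k))
        for b_k, w_k in zip(b, w)
    ]
    top = max(exponents)  # subtract the maximum for numerical stability
    numerators = [math.exp(e - top) for e in exponents]
    total = sum(numerators)
    return [n / total for n in numerators]

# Hypothetical item with 3 categories on 2 dimensions.
# Score 0 uses neither skill, score 1 uses skill 1, score 2 uses both.
weights = [[0, 0], [1, 0], [1, 1]]
difficulties = [[0.0, 0.0], [-0.5, 0.0], [0.2, 0.4]]

probs = mirt_pcm_probs(theta=(0.0, 0.0), b=difficulties, w=weights)
print([round(p, 3) for p in probs])
```

Because the weights are fixed in advance rather than estimated, they encode a hypothesis about which dimensions each score level requires.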

C. Generalized Partial Credit Model for polytomous MIRT (direct model)

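$$P(u_{ij} = k \mid \theta_j) = \frac{\exp\left[k\,a_i'\theta_j - \sum_{u=1}^{k}\beta_{iu}\right]}{\sum_{v=0}^{K_i}\exp\left[v\,a_i'\theta_j - \sum_{u=1}^{v}\beta_{iu}\right]}$$

where

$a_i$ = vector of item discrimination parameters for item i,

$\beta_{iu}$ = the threshold parameter for item i and category u, and

k = the category score.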
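The multidimensional generalized partial credit computation can be sketched in Python as follows. The sketch assumes the exponent for category k is k times the inner product of the discrimination and trait vectors minus the sum of the thresholds up to k, with an empty sum (zero) for k = 0. The function name and parameter values are hypothetical.

```python
import math

def mirt_gpcm_probs(theta, a, betas):
    """Category probabilities (scores 0..K) for a multidimensional GPCM item.

    theta : trait vector
    a     : item discrimination vector
    betas : threshold parameters beta_i1..beta_iK
    """
    z = sum(a_l * t_l for a_l, t_l in zip(a, theta))
    exponents = [0.0]  # category 0: empty threshold sum and k = 0
    running = 0.0
    for k, beta in enumerate(betas, start=1):
        running += beta
        exponents.append(k * z - running)
    top = max(exponents)  # subtract the maximum for numerical stability
    numerators = [math.exp(e - top) for e in exponents]
    total = sum(numerators)
    return [n / total for n in numerators]

# Hypothetical item with two thresholds on two dimensions.
a = (1.0, 0.5)
betas = [-0.5, 1.0]

# Here a'theta = 1.0, which equals the second threshold, so the
# probabilities of scores 1 and 2 coincide.
probs = mirt_gpcm_probs(theta=(0.5, 1.0), a=a, betas=betas)
print([round(p, 3) for p in probs])
```

In this form the ratio of adjacent category probabilities is exp(a'theta minus the threshold), so each threshold marks where two neighboring categories are equally likely along the composite a'theta.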

VI. Future Directions of MIRT

A. More models will be proposed for the interaction between persons and items (e.g., assessment of writing). Using a vector of a-parameters for each score boundary, rather than a single a-parameter vector for an item, will become more feasible.

B. Students with different scores on dichotomous responses may use different solution strategies, which motivates introducing latent classes in addition to latent skills.