Validity, Types of Validity, Content Validity, Criterion Related Validity, Concurrent Validity, Predictive Validity, Construct Validity, Attenuation Theory, Dichotomous Prediction are learning points of this lecture.
Typology: Study notes
Ch. 5: Validity
I. Definition: The extent to which a test measures what it is supposed to measure.
II. Types of validity
A. Content validity: assessing the content of a test using non-mathematical methods to check whether the behaviors sampled by the test are a representative sample of the trait being measured.
B. Criterion-related validity: correlation between test scores (X) and criterion scores (Y).
III. Topics concerning criterion-related validity
A. Attenuation theory
$$ \rho_{T_X T_Y} = \frac{\rho_{XY}}{\sqrt{\rho_{XX'}\,\rho_{YY'}}} $$
where $\rho_{T_X T_Y}$: validity of test X for criterion Y assuming no measurement error in either measure, $\rho_{XY}$: observed validity of test X for criterion Y, $\rho_{XX'}$: reliability of test X, and $\rho_{YY'}$: reliability of criterion Y.
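A minimal Python sketch of the correction for attenuation, assuming both reliabilities have already been estimated and lie in (0, 1]. The function name and the numbers in the usage line are hypothetical.

```python
import math


def disattenuated_validity(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimate the correlation between true scores on X and Y (correction for attenuation)."""
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: observed validity .40, test reliability .80, criterion reliability .50.
print(round(disattenuated_validity(0.40, 0.80, 0.50), 3))  # 0.632
```

Because the reliabilities sit in the denominator, sample estimates can push the corrected value above 1; the sketch does not clip it.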
B. Confidence interval for a criterion (Y)
$$ \hat{Y}_i = r_{XY}\frac{s_Y}{s_X}(X_i - \bar{X}) + \bar{Y} $$
If the regression assumptions are met,
$$ CI = \hat{Y}_i \pm z_{\alpha/2}\, s_{Y.X}, \qquad s_{Y.X} = s_Y\sqrt{1 - r_{XY}^2} $$
C. Dichotomous prediction
Total correct predictions
B. Item difficulty
$$ p_i = \frac{\text{number of persons who answer item } i \text{ correctly}}{\text{total number of persons taking the item}} $$
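An item with a p-value of 0 or 1 is useless: every examinee answers it the same way, so its variance is zero and it cannot differentiate among examinees.
If $p_i = .5$, the item-score variance $p_i q_i$ (with $q_i = 1 - p_i$) is maximized; but if the inter-item correlations are all 1, the test will classify examinees into only two groups.
Items whose $p_i$ values range from .3 to .7, with an average $p_i$ of .5, are ideal.
C. Item discrimination index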
$$ d_i = \frac{U_i}{n_{iU}} - \frac{L_i}{n_{iL}} $$
where $U_i$: number of people in the upper group who answer item $i$ correctly, $n_{iU}$: number of people in the upper group, $L_i$: number of people in the lower group who answer item $i$ correctly, and $n_{iL}$: number of people in the lower group.
In many cases the upper and lower groups are the same size ($n_i$), so $d_i = (U_i - L_i)/n_i$.
$d_i$ is the difference between the proportion of high-scoring examinees who get the item correct and the proportion of low-scoring examinees who get the item correct.
The upper and lower groups each contain between 10% and 33% of the examinees. If the test scores are normally distributed, 27% is optimal.
The range of $d_i$ is -1 to +1. If $d_i$ is negative, the item should be discarded.
D. Item-total correlation (point-biserial or biserial)
$$ r_{iX} = \frac{\bar{X}_i - \bar{X}}{s_X}\sqrt{\frac{p_i}{1 - p_i}} $$
where $\bar{X}_i$: the mean test score of those who answer item $i$ correctly, $\bar{X}$ and $s_X$: the mean and SD of all examinees, and $p_i$: the item difficulty index.
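The three item-analysis indices above (difficulty, discrimination, point-biserial) can be computed directly from a 0/1 response matrix. A minimal Python sketch, assuming rows are examinees and columns are items, the total score is the unweighted sum (so the item-total correlation is not corrected for the item's own contribution), and tied total scores are broken by input order when forming the upper and lower groups. The function names and the toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def item_difficulty(responses):
    """p_i: proportion of examinees answering each item correctly (rows = examinees, 0/1 scores)."""
    n = len(responses)
    return [sum(row[i] for row in responses) / n for i in range(len(responses[0]))]


def discrimination_index(responses, item, fraction=0.27):
    """d_i = p(upper group) - p(lower group), with equal groups cut from the total-score ranking."""
    totals = [sum(row) for row in responses]
    order = sorted(range(len(responses)), key=lambda j: totals[j])  # ties keep input order
    g = max(1, round(fraction * len(responses)))
    lower, upper = order[:g], order[-g:]
    p_upper = sum(responses[j][item] for j in upper) / g
    p_lower = sum(responses[j][item] for j in lower) / g
    return p_upper - p_lower


def point_biserial(responses, item):
    """r_iX = (mean total of correct responders - overall mean) / s_X * sqrt(p / (1 - p))."""
    totals = [sum(row) for row in responses]
    p = sum(row[item] for row in responses) / len(responses)
    if p in (0.0, 1.0):
        raise ValueError("item has no variance")
    mean_correct = mean(t for t, row in zip(totals, responses) if row[item] == 1)
    # pstdev uses divisor N (SD of all examinees), which makes this equal the Pearson r
    return (mean_correct - mean(totals)) / pstdev(totals) * math.sqrt(p / (1 - p))


# Toy 0/1 response matrix: 6 examinees x 3 items (hypothetical data).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
print(item_difficulty(responses))
print(discrimination_index(responses, 0, fraction=1 / 3))  # 0.5
print(round(point_biserial(responses, 0), 3))
```

With only six examinees the groups are tiny; the 27% default is meant for realistic sample sizes.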
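$$ \bar{X} = \sum_{i=1}^{k} p_i $$
$$ r_{XX'} = \frac{k}{k-1}\left[1 - \frac{\sum_{i=1}^{k} s_i^2}{\left(\sum_{i=1}^{k} s_i r_{iX}\right)^2}\right] $$
$$ r_{XY} = \frac{\sum_{i=1}^{k} s_i r_{iY}}{\sum_{i=1}^{k} s_i r_{iX}} $$
where $k$: number of items, $s_i = \sqrt{p_i q_i}$: SD of item $i$, $r_{iX}$: item-total correlation of item $i$, and $r_{iY}$: correlation of item $i$ with criterion Y.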
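These item-level identities can be checked numerically. A minimal Python sketch, assuming 0/1 items with 0 < p_i < 1, population (divisor N) SDs throughout, and a non-constant criterion; it also uses the identity that the total-score SD equals the sum of $s_i r_{iX}$. The function names are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


def stats_from_items(responses, criterion):
    """Test mean, SD, coefficient alpha, and validity computed only from item statistics."""
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    items = [[row[i] for row in responses] for i in range(k)]
    p = [sum(col) / len(col) for col in items]
    s = [math.sqrt(pi * (1 - pi)) for pi in p]  # item SD, sqrt(p_i q_i)
    r_ix = [pearson(col, totals) for col in items]
    r_iy = [pearson(col, criterion) for col in items]
    sum_s_rix = sum(si * r for si, r in zip(s, r_ix))  # equals s_X
    return {
        "mean_X": sum(p),
        "sd_X": sum_s_rix,
        "alpha": k / (k - 1) * (1 - sum(si ** 2 for si in s) / sum_s_rix ** 2),
        "r_XY": sum(si * r for si, r in zip(s, r_iy)) / sum_s_rix,
    }


# Hypothetical 6 x 3 response matrix and criterion scores.
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 0]]
criterion = [10, 8, 9, 5, 6, 3]
print(stats_from_items(responses, criterion))
```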
$$ s_X = \sum_{i=1}^{k} s_i r_{iX} $$
F. Item-Characteristic Curves
Ch. 2: Concepts, Assumptions, and Models
I. Concepts and Features
A. An examinee's performance on a test is a monotonically increasing function of a set of latent traits or abilities.
B. The item characteristic curve (ICC) describes the functional relationship between an examinee's performance on a test item and the latent trait.
C. The IRT model was adapted from psychophysics (threshold) and biology (lethal dose).
D. IRT models are falsifiable models.
E. IRT models offer the possibility of computing invariant item and person parameters.
II. Assumptions
A. Unidimensionality of the latent space
III. IRT models
A. Two mathematical models
B. Normal Ogive Models (Lord, 1952)
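a) $$ P_i(\theta) = \int_{-\infty}^{\theta - b_i} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz $$
where $P_i(\theta)$: the probability of getting item $i$ correct for a given $\theta$; $\theta$: latent trait (ability or proficiency); $b_i$: item difficulty parameter; and $z$: the standard normal variable of integration.
b) The probability of getting item $i$ correct is a function of ability ($\theta$) and the item difficulty parameter ($b_i$) only.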
a) $$ P_i(\theta) = \int_{-\infty}^{a_i(\theta - b_i)} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz $$
where $a_i$: item discrimination parameter.
b) $P_i(\theta)$ is a function of $a_i$ and $b_i$ in addition to $\theta$.
a) $$ P_i(\theta) = c_i + (1 - c_i)\int_{-\infty}^{a_i(\theta - b_i)} \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\, dz $$
where $c_i$: pseudo-chance parameter.
b) $P_i(\theta)$ is a function of $a_i$, $b_i$, $c_i$, and $\theta$.
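All three normal ogive forms are special cases of one function. A minimal Python sketch, assuming the standard normal CDF is computed from `math.erf`; the function names and the item values in the usage loop are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_normal_ogive(theta: float, b: float, a: float = 1.0, c: float = 0.0) -> float:
    """Normal ogive IRF: c + (1 - c) * Phi(a * (theta - b)).

    Defaults a = 1, c = 0 give the one-parameter form; c = 0 gives the two-parameter form.
    """
    return c + (1.0 - c) * normal_cdf(a * (theta - b))


# Hypothetical item: a = 1.2, b = 0.5, c = 0.2
for theta in (-2.0, 0.5, 2.0):
    print(theta, round(p_normal_ogive(theta, b=0.5, a=1.2, c=0.2), 3))
```

At $\theta = b$ the curve passes through $(1 + c)/2$, which is .5 when $c = 0$.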
C. Logistic Models (Birnbaum, 1968)
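a) $$ P_i(\theta) = \frac{e^{D(\theta - b_i)}}{1 + e^{D(\theta - b_i)}} = \frac{1}{1 + e^{-D(\theta - b_i)}} = \left[1 + e^{-D(\theta - b_i)}\right]^{-1} $$
*Original Rasch model: $P_i(\theta^*) = \theta^* / (\theta^* + b_i^*)$ (see later).
b) $D = 1$ here; $D$ is a scaling factor, and $D = 1.7$ brings the logistic curve close to the N.O.M. (normal ogive model).
c) $P_i(\theta)$ is a function of $b_i$.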
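a) $$ P_i(\theta) = \frac{e^{Da_i(\theta - b_i)}}{1 + e^{Da_i(\theta - b_i)}} = \frac{1}{1 + e^{-Da_i(\theta - b_i)}} $$
b) $P_i(\theta)$ is a function of $a_i$ and $b_i$ in addition to $\theta$.
c) $D = 1.7$, a scaling factor that brings the logistic curve close to the N.O.M.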
a) $$ P_i(\theta) = c_i + (1 - c_i)\frac{e^{Da_i(\theta - b_i)}}{1 + e^{Da_i(\theta - b_i)}} = c_i + \frac{1 - c_i}{1 + e^{-Da_i(\theta - b_i)}} $$
b) $P_i(\theta)$ is a function of $a_i$, $b_i$, $c_i$, and $\theta$.
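The logistic family can be sketched the same way, and a grid search shows how closely $D = 1.7$ makes the 2-PL logistic track the normal ogive. A minimal Python sketch; the function names and the grid (-6 to 6 in steps of .01) are assumptions for illustration.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_logistic(theta, b, a=1.0, c=0.0, D=1.7):
    """Logistic IRF: c + (1 - c) / (1 + exp(-D * a * (theta - b))).

    a = 1, c = 0 gives the 1-PL form; c = 0 gives the 2-PL form.
    """
    z = D * a * (theta - b)
    if z >= 0:
        logistic = 1.0 / (1.0 + math.exp(-z))
    else:  # rewritten to avoid overflow in exp(-z) for very negative z
        ez = math.exp(z)
        logistic = ez / (1.0 + ez)
    return c + (1.0 - c) * logistic


# Largest gap between the logistic curve with D = 1.7 and the normal ogive on a grid.
grid = [i / 100 for i in range(-600, 601)]
max_gap = max(abs(p_logistic(x, 0.0) - normal_cdf(x)) for x in grid)
print(round(max_gap, 4))
```

Passing `D=1.0` gives the unscaled 1-PL form used with the Rasch model.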
e.g. For the response pattern u' = [1, 1, 0], the likelihood is $L = P_1 P_2 Q_3$. If $P_1 = .50$, $P_2 = .40$, and $P_3 = .30$, then $L = (.50)(.40)(.70) = .14$.
II. Ability Estimation
A. For a given set of item responses, the main job of a psychometrician is to estimate the examinee's true ability using the likelihood function of the response pattern.
B. There are several estimation methods, depending on the algorithm applied (MLE, BME, EAP, WLE).
C. MLE
$$ l' = \frac{\partial \ln L}{\partial \theta} = \sum_{i=1}^{n} \frac{P_i'(\theta)}{P_i Q_i}(u_i - P_i) $$
where
$$ P_i'(\theta) = \frac{D a_i (P_i - c_i) Q_i}{1 - c_i} $$
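The likelihood example and the MLE equation can be sketched in Python. This assumes known 3-PL item parameters and solves $l' = 0$ by Fisher scoring (a Newton-type iteration that divides the score by the information); the iteration choice, starting value, clamping bound, function names, and item values are assumptions, not necessarily the algorithm used in the course.

```python
import math


def p3pl(theta, a, b, c, D=1.7):
    """3-PL logistic IRF (set c = 0 for the 2-PL)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def likelihood(u, probs):
    """L = product of P_i for correct (u_i = 1) and Q_i = 1 - P_i for incorrect responses."""
    L = 1.0
    for ui, p in zip(u, probs):
        L *= p if ui == 1 else 1.0 - p
    return L


def score_and_info(theta, u, items, D=1.7):
    """Return d lnL / d theta and the test information at theta for known (a, b, c) items."""
    score = info = 0.0
    for ui, (a, b, c) in zip(u, items):
        P = p3pl(theta, a, b, c, D)
        Q = 1.0 - P
        dP = D * a * (P - c) * Q / (1.0 - c)  # P_i'(theta)
        score += dP * (ui - P) / (P * Q)
        info += dP ** 2 / (P * Q)
    return score, info


def mle_theta(u, items, D=1.7, theta=0.0, max_iter=100, tol=1e-8, bound=6.0):
    """Fisher-scoring solution of d lnL / d theta = 0."""
    if all(u) or not any(u):
        raise ValueError("no finite MLE for an all-correct or all-incorrect pattern")
    for _ in range(max_iter):
        score, info = score_and_info(theta, u, items, D)
        step = score / info
        theta = max(-bound, min(bound, theta + step))  # keep iterates on a finite range
        if abs(step) < tol:
            break
    return theta


print(round(likelihood([1, 1, 0], [0.50, 0.40, 0.30]), 2))  # 0.14
# Hypothetical 2-PL items (c = 0), given as (a, b, c):
items = [(1.0, -1.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
theta_hat = mle_theta([1, 1, 0], items)
print(round(theta_hat, 3))
```

With equal discriminations and $c = 0$, the estimate satisfies $\sum_i P_i(\hat\theta) = $ number-right score, which is a quick sanity check.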
*Bayes’ Rule
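$$ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid \bar{A})P(\bar{A})} \quad \text{(binary)} $$
$$ P(A_i \mid B) = \frac{P(B \mid A_i)P(A_i)}{\sum_j P(B \mid A_j)P(A_j)} \quad \text{(polytomous)} $$
$$ P(A \mid B) = \frac{P(B \mid A)P(A)}{\int P(B \mid A)P(A)\, dA} \quad \text{(continuous)} $$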
Example: What is the probability that your house is on fire when the fire alarm goes off? A: your house is on fire. B: the fire alarm goes off. $P(A)$: the prior probability, i.e., the marginal probability that a randomly chosen house is on fire (.0001, one out of 10,000 houses). $P(\bar{A}) = 1 - P(A) = .9999$. $P(B \mid A)$ = hit rate (.98). $P(B \mid \bar{A})$ = false-alarm rate (.01).
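The fire-alarm question can be worked numerically with the binary form of Bayes' rule. A minimal Python sketch; the function names are hypothetical, and the polytomous helper simply normalizes prior times likelihood over mutually exclusive, exhaustive events.

```python
def posterior_binary(prior, hit_rate, false_alarm_rate):
    """P(A | B) for a binary event A, by Bayes' rule."""
    joint_a = hit_rate * prior
    return joint_a / (joint_a + false_alarm_rate * (1.0 - prior))


def posterior_discrete(priors, likelihoods):
    """Polytomous form: P(A_i | B) for mutually exclusive, exhaustive events A_i."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


p_fire = posterior_binary(prior=0.0001, hit_rate=0.98, false_alarm_rate=0.01)
print(round(p_fire, 4))  # 0.0097
```

Even with a .98 hit rate, the posterior is below 1%, because false alarms from the 9,999 houses not on fire far outnumber true alarms.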
D. Bayesian Methods
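$$ P(A_i \mid B) = \frac{P(B \mid A_i)P(A_i)}{\sum_j P(B \mid A_j)P(A_j)} \quad \text{(polytomous)} $$
$$ P(A \mid B) = \frac{P(B \mid A)P(A)}{\int P(B \mid A)P(A)\, dA} \quad \text{(continuous)} $$
where $P(A \mid B)$ = posterior probability of A given B, and $P(A)$ = prior probability of A (the marginal probability of A).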
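$$ P(\theta \mid \mathbf{u}) = \frac{L(\mathbf{u} \mid \theta)\,\pi(\theta)}{\int L(\mathbf{u} \mid \theta)\,\pi(\theta)\, d\theta} $$
where
$$ L(\mathbf{u} \mid \theta) = \prod_{i=1}^{n} P_i^{u_i} Q_i^{1 - u_i} $$
is the conditional probability of $\mathbf{u}$ given $\theta$, $\pi(\theta)$ = the prior distribution of $\theta$ [e.g., N(0, 1)], and $P(\theta \mid \mathbf{u})$ = posterior probability of $\theta$ given $\mathbf{u}$.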
$$ E(X) = \sum_{i=1}^{n} x_i\, p(x_i) \quad \text{(discrete)}, \qquad E(X) = \int x\, p(x)\, dx \quad \text{(continuous)} $$
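Combining the posterior above with the expectation formula gives the EAP estimate, $\hat\theta_{EAP} = \int \theta\, P(\theta \mid \mathbf{u})\, d\theta$. A minimal Python sketch, assuming 2-PL items with known parameters, an N(0, 1) prior, and a simple equally spaced grid from -4 to 4 in place of formal quadrature; the function names and item values are hypothetical.

```python
import math


def p2pl(theta, a, b, D=1.7):
    """2-PL logistic IRF."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def eap_theta(u, items, n_points=81, lo=-4.0, hi=4.0, D=1.7):
    """EAP estimate: posterior mean of theta under an N(0, 1) prior.

    The integrals are approximated on an equally spaced grid; the grid step and
    the prior's normalizing constant cancel in the ratio.
    """
    step = (hi - lo) / (n_points - 1)
    num = den = 0.0
    for k in range(n_points):
        theta = lo + k * step
        weight = math.exp(-0.5 * theta ** 2)  # unnormalized N(0, 1) prior density
        for ui, (a, b) in zip(u, items):
            p = p2pl(theta, a, b, D)
            weight *= p if ui == 1 else 1.0 - p  # times L(u | theta)
        num += theta * weight
        den += weight
    return num / den


items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]  # hypothetical 2-PL (a, b) values
print(round(eap_theta([1, 1, 1], items), 3))  # finite even for an all-correct pattern
```

Unlike the MLE, the EAP estimate exists for all-correct and all-incorrect patterns because the prior keeps the posterior proper; the price is shrinkage toward the prior mean.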
III. Item parameter estimation
A. Likelihood function
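$$ L = L(\mathbf{u} \mid \boldsymbol{\theta}, a, b, c) = \prod_{i=1}^{n} P_i^{u_i} Q_i^{1 - u_i} $$
where the product runs over the $n$ examinees' responses $u_i$ to the item, with their abilities $\boldsymbol{\theta}$ treated as known.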
B. Log-likelihood function: $l = \ln L = \sum_i [u_i \ln P_i + (1 - u_i)\ln Q_i]$.
C. Compute the first derivative of $l$ with respect to each item parameter.
D. Set each derivative to zero and solve the three equations in three unknowns ($a$, $b$, $c$) simultaneously.
E. The multivariate Newton-Raphson method is used for each item.
F. Local independence is not required, since each item is estimated separately; instead, the independence of examinees' responses is required.
IV. Joint estimation of the item and ability parameters
A. Joint MLE
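The item step of section III (which joint MLE alternates with ability estimation) can be sketched for the simpler 2-PL case, with $c = 0$ so there are two unknowns instead of three, treating abilities as known. This is a swapped-in simplification, not the 3-PL procedure in the notes: it runs Newton-Raphson on the slope-intercept form $\text{logit}\,P = \alpha + \beta\theta$ with $\beta = Da$ and $\alpha = -Dab$, then converts back. The function name, starting values, and simulated item are hypothetical.

```python
import math
import random


def calibrate_2pl(thetas, u, D=1.7, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE of (a, b) for one 2-PL item with known abilities."""
    alpha, beta = 0.0, 1.0  # starting values (an assumption)
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, x in zip(thetas, u):
            P = 1.0 / (1.0 + math.exp(-(alpha + beta * t)))
            w = P * (1.0 - P)
            g0 += x - P          # d l / d alpha
            g1 += (x - P) * t    # d l / d beta
            h00 += w             # entries of the negative Hessian
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        # Newton step: solve (negative Hessian) * delta = gradient
        d_alpha = (h11 * g0 - h01 * g1) / det
        d_beta = (h00 * g1 - h01 * g0) / det
        alpha += d_alpha
        beta += d_beta
        if max(abs(d_alpha), abs(d_beta)) < tol:
            break
    return beta / D, -alpha / beta  # (a, b)


# Simulated responses to one hypothetical item (a = 1.2, b = 0.5) from known abilities.
random.seed(1)
thetas = [random.gauss(0.0, 1.0) for _ in range(2000)]
u = [1 if random.random() < 1.0 / (1.0 + math.exp(-1.7 * 1.2 * (t - 0.5))) else 0 for t in thetas]
print(calibrate_2pl(thetas, u))  # close to (1.2, 0.5), up to sampling error
```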
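V. Item information function and standard error
A. $$ I(\theta, u_i) = \frac{[P_i'(\theta)]^2}{P_i Q_i} = \frac{D^2 a_i^2 (1 - c_i)}{\left[c_i + e^{Da_i(\theta - b_i)}\right]\left[1 + e^{-Da_i(\theta - b_i)}\right]^2} $$
$$ I(\theta) = \frac{1}{\mathrm{Var}(\hat{\theta} \mid \theta)} $$
i.e., the information is the inverse of the variance of $\hat{\theta}$ given $\theta$.
B. $$ SE(\hat{\theta}) = \sqrt{\mathrm{Var}(\hat{\theta} \mid \theta)} = \frac{1}{\sqrt{I(\theta)}} $$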
Ch. 4: Model-Data Fit
I. Introduction
A. IRT has different models (1-, 2-, and 3-PL models; unidimensional and multidimensional models).
B. If a model does not fit the data, IRT loses its advantages over CTT.
C. Three methods of checking model-data fit.
II. Assumption checking
A. Unidimensionality checking
III. Invariance checking
A. Checking the ability parameter ($\theta$)
Ch. 5: Ability Scale
I. Introduction
A. In CTT, X (the number-right score) is an unbiased estimate of a person's true score T: $E(X) = T$.
B. In IRT, a person's ability, $\theta$, is monotonically related to the person's true score; the relationship is non-linear but strictly increasing.
C. General procedure in estimating the true ability of a person.
can be reported.
II. Nature of ability scale
A. In CTT,
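III. Transformation of $\theta$
A. Linear transformation of $\theta$, $b$, and $a$. Let $\theta^* = \alpha\theta + \beta$, $b^* = \alpha b + \beta$, and $a^* = a/\alpha$; then the 3-PL model is
$$ \begin{aligned} P(\theta^*) &= c + \frac{1 - c}{1 + e^{-D(a/\alpha)[(\alpha\theta + \beta) - (\alpha b + \beta)]}} \\ &= c + \frac{1 - c}{1 + e^{-D(a/\alpha)(\alpha\theta - \alpha b)}} \\ &= c + \frac{1 - c}{1 + e^{-D(a/\alpha)\,\alpha(\theta - b)}} \\ &= c + \frac{1 - c}{1 + e^{-Da(\theta - b)}} = P(\theta). \end{aligned} $$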
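The invariance derivation can be confirmed numerically. A minimal Python sketch, assuming a hypothetical 3-PL item and an arbitrary reporting scale ($\alpha = 15$, $\beta = 100$); `c` is left unchanged by the transformation, and the function names are hypothetical.

```python
import math


def p3pl(theta, a, b, c, D=1.7):
    """3-PL logistic IRF."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def rescale(theta, a, b, alpha, beta):
    """Linear rescaling: theta* = alpha*theta + beta, b* = alpha*b + beta, a* = a/alpha."""
    return alpha * theta + beta, a / alpha, alpha * b + beta


# Hypothetical item and reporting scale.
a, b, c = 1.3, -0.4, 0.2
for theta in (-2.0, 0.0, 1.5):
    t_star, a_star, b_star = rescale(theta, a, b, alpha=15.0, beta=100.0)
    print(round(p3pl(theta, a, b, c), 6), round(p3pl(t_star, a_star, b_star, c), 6))
```

Any $\alpha > 0$ and $\beta$ leave the probabilities unchanged, which is why the metric has to be fixed by a convention (for example, setting the ability mean to 0 and SD to 1).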
$P(\theta)$ is invariant under this linear transformation of $\theta$, $b$, and $a$: the scale is indeterminate (Woodcock-Johnson's scale, p. 80).
B. Non-linear transformation of $\theta$ and $b$: a partial ratio-scale interpretation is available only for the Rasch model.
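$$ \begin{aligned} P(\theta) &= \frac{e^{D(\theta - b)}}{1 + e^{D(\theta - b)}} = \frac{e^{D\theta - Db}}{1 + e^{D\theta - Db}} = \frac{e^{D\theta}/e^{Db}}{1 + e^{D\theta}/e^{Db}} \\ &= \frac{e^{D\theta}/e^{Db}}{(e^{Db} + e^{D\theta})/e^{Db}} = \frac{e^{D\theta}}{e^{Db} + e^{D\theta}} \end{aligned} $$
Letting $\theta^* = e^{D\theta}$ and $b^* = e^{Db}$,
$$ P(\theta^*) = \frac{\theta^*}{\theta^* + b^*} \quad \text{(the first Rasch model)} $$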
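If $\theta^* = b^*$, then $P(\theta^*) = .5$.
$$ Q(\theta^*) = 1 - P(\theta^*) = 1 - \frac{\theta^*}{\theta^* + b^*} = \frac{b^*}{\theta^* + b^*} $$
$$ O = \frac{P(\theta^*)}{Q(\theta^*)} = \frac{\theta^*}{b^*} $$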
Given $O_{p1} = \theta_1^*/b^*$ and $O_{p2} = \theta_2^*/b^*$,
$$ \frac{O_{p1}}{O_{p2}} = \frac{\theta_1^*}{\theta_2^*} $$
If $\theta_1^* = 2\theta_2^*$, then examinee 1 has twice the odds of examinee 2 of answering the item correctly (ratio-scale property).
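Similarly, for two items with $O_{i1} = \theta^*/b_1^*$ and $O_{i2} = \theta^*/b_2^*$,
$$ \frac{O_{i1}}{O_{i2}} = \frac{b_2^*}{b_1^*} $$
If $b_2^* = 2b_1^*$, then an examinee has twice the odds of answering item 1 correctly as of answering item 2 correctly (ratio-scale property).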
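$$ \frac{O_{p1}}{O_{p2}} = \frac{e^{D\theta_1}}{e^{D\theta_2}} = e^{D(\theta_1 - \theta_2)}, \qquad \ln\frac{O_{p1}}{O_{p2}} = D(\theta_1 - \theta_2) $$
In the 1-PL model, D is omitted (D = 1.0), so
$$ \ln\frac{O_{p1}}{O_{p2}} = \theta_1 - \theta_2 $$
In the same way,
$$ \ln\frac{O_{i1}}{O_{i2}} = b_2 - b_1 $$
$$ P_i(\theta) = \frac{e^{(\theta - b)}}{1 + e^{(\theta - b)}}, \qquad Q_i(\theta) = \frac{1}{1 + e^{(\theta - b)}} $$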
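The Rasch odds results can be verified numerically. A minimal Python sketch with $D = 1$, where `exp()` maps $\theta$ and $b$ onto the $\theta^*$ and $b^*$ scales; the ability and difficulty values and the function names are hypothetical.

```python
import math


def rasch_p(theta, b):
    """1-PL / Rasch IRF with D = 1."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def odds(p):
    return p / (1.0 - p)


def person_odds_ratio(theta1, theta2, b):
    """O_p1 / O_p2 for two examinees answering the same item."""
    return odds(rasch_p(theta1, b)) / odds(rasch_p(theta2, b))


def item_odds_ratio(theta, b1, b2):
    """O_i1 / O_i2 for one examinee answering two items."""
    return odds(rasch_p(theta, b1)) / odds(rasch_p(theta, b2))


# Examinee 1 sits ln(2) logits above examinee 2, so theta1* = 2 * theta2*.
theta1, theta2 = 0.5 + math.log(2.0), 0.5
for b in (-1.0, 0.0, 2.0):
    print(round(person_odds_ratio(theta1, theta2, b), 6))  # 2.0 for every item
```

The ratio does not depend on which item is used, which is the sense in which the Rasch metric supports ratio statements.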
Ch. 6: Item and Test Information
I. Item information function
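A. $$ I_i(\theta) = \frac{[P_i'(\theta)]^2}{\mathrm{Var}(u_i \mid \theta)} = \frac{[P_i'(\theta)]^2}{P_i(\theta)Q_i(\theta)} = \frac{D^2 a_i^2\, Q_i(\theta)\,[P_i(\theta) - c_i]^2}{P_i(\theta)\,(1 - c_i)^2} $$
where $P_i'(\theta)$ = the first derivative of $P_i(\theta)$, $P_i(\theta)$ = the item response function, and $Q_i(\theta) = 1 - P_i(\theta)$.
B. Lord (1980) and Birnbaum (1968) showed that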
$$ I_i(\theta) = \frac{2.89\, a_i^2 (1 - c_i)}{\left[c_i + e^{1.7a_i(\theta - b_i)}\right]\left[1 + e^{-1.7a_i(\theta - b_i)}\right]^2} $$
C. An item provides its maximum information at $\theta = \theta_{\max}$, where
$$ \theta_{\max} = b_i + \frac{1}{1.7a_i}\ln\left[.5\left(1 + \sqrt{1 + 8c_i}\right)\right] $$
Assuming a=1,
A. $$ I(\theta) = \sum_{i=1}^{n} I_i(\theta) = \sum_{i=1}^{n} \frac{[P_i'(\theta)]^2}{P_i Q_i} $$
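B. The contribution of each item to the test information is independent of the contributions of the other items, a unique feature of IRT.
C. Standard error of estimation
a) $SE(\hat{\theta}) = 1/\sqrt{I(\theta)}$ with MLE.
b) $SE(\hat{\theta})$ varies with $\theta$, $a$, $b$, and $c$.
c) The sampling distribution of $\hat{\theta}$ is normal when the test is long.
d) $SE(\hat{\theta}) \le .20$ or $.25$ ($I > 25$), or $n \ge 40$.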
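The item-information formulas, $\theta_{\max}$, test information, and SE can be tied together in one sketch. A minimal Python version, assuming 3-PL items with $D = 1.7$; it computes the Lord/Birnbaum closed form and, for comparison, the general $[P_i']^2/(P_iQ_i)$ definition. The function names and item values are hypothetical.

```python
import math

D = 1.7


def p3pl(theta, a, b, c):
    """3-PL logistic IRF."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_info(theta, a, b, c):
    """Closed-form 3-PL item information."""
    e = math.exp(D * a * (theta - b))
    return D ** 2 * a ** 2 * (1.0 - c) / ((c + e) * (1.0 + 1.0 / e) ** 2)


def item_info_from_derivative(theta, a, b, c):
    """Same quantity from the general definition [P'(theta)]^2 / (P Q)."""
    p = p3pl(theta, a, b, c)
    q = 1.0 - p
    dp = D * a * (p - c) * q / (1.0 - c)
    return dp ** 2 / (p * q)


def theta_max(a, b, c):
    """Ability at which a 3-PL item gives maximum information."""
    return b + math.log(0.5 * (1.0 + math.sqrt(1.0 + 8.0 * c))) / (D * a)


def total_information(theta, items):
    """Test information is the sum of item informations."""
    return sum(item_info(theta, *item) for item in items)


def se_theta(theta, items):
    """SE(theta-hat) = 1 / sqrt(I(theta))."""
    return 1.0 / math.sqrt(total_information(theta, items))


# Hypothetical 3-PL items given as (a, b, c).
items = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (0.8, 1.0, 0.25)]
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(total_information(theta, items), 3), round(se_theta(theta, items), 3))
```

With $c = 0$, $\theta_{\max}$ reduces to $b_i$ and the peak information is $D^2a_i^2/4$; a positive $c$ shifts the peak slightly above $b_i$ and lowers it.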
III. Relative efficiency
A. $$ RE(\theta) = \frac{I_A(\theta)}{I_B(\theta)} $$
B. $RE(\theta)$ can be interpreted on a ratio scale, assuming the items have comparable statistical quality.
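Relative efficiency is a pointwise ratio of test information functions. A small Python sketch, assuming 2-PL items ($c = 0$) so item information reduces to $D^2a^2PQ$; the item values and function names are hypothetical. Doubling form B gives RE = 2 at every $\theta$, which illustrates the ratio reading: form A is as informative as form B would be at twice its length.

```python
import math


def info_2pl(theta, a, b, D=1.7):
    """2-PL item information D^2 a^2 P Q (the 3-PL formula with c = 0)."""
    p = 1.0 / (1.0 + math.exp(-D * a * (theta - b)))
    return (D * a) ** 2 * p * (1.0 - p)


def relative_efficiency(theta, items_a, items_b):
    """RE(theta) = I_A(theta) / I_B(theta) for two tests built from 2-PL (a, b) items."""
    info_a = sum(info_2pl(theta, a, b) for a, b in items_a)
    info_b = sum(info_2pl(theta, a, b) for a, b in items_b)
    return info_a / info_b


form_b = [(1.0, -1.0), (1.2, 0.0), (0.9, 1.0)]  # hypothetical items
form_a = form_b * 2  # form B doubled in length
print([round(relative_efficiency(t, form_a, form_b), 6) for t in (-2.0, 0.0, 2.0)])  # [2.0, 2.0, 2.0]
```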
Ch. 7: Test Construction
I. Introduction
A. Test construction is the process of selecting items from an item bank for a specified purpose.
B. In CTT, item and test indices are group-dependent: they vary across examinee groups.
II. Procedure (Lord, 1977)
A. Develop a target information function for the test (p. 104, f.7.1).
(e.g., .50, .33, .25).
B. Select items from the item bank to match the target information function, paying special attention to the tails of the ability distribution.
C. After each item is added to the test, calculate the test information function.
D. Continue the item selection procedure until the test information function is similar to the target information function.
E. We may set a criterion for the number of items for the test as an alternate for the
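Steps B through D of the procedure can be automated with a greedy selection rule. This is a swapped-in heuristic for illustration, not necessarily the method in the notes: at each step it adds the bank item that most reduces the squared shortfall below the target at a few $\theta$ points, stopping when the target is met everywhere or an item limit (as in step E) is reached. The bank, target values, $\theta$ points, and function names are hypothetical.

```python
import math


def info_3pl(theta, a, b, c, D=1.7):
    """Closed-form 3-PL item information."""
    e = math.exp(D * a * (theta - b))
    return D ** 2 * a ** 2 * (1.0 - c) / ((c + e) * (1.0 + 1.0 / e) ** 2)


def shortfall(info, target):
    """Squared amount by which the current information falls below the target."""
    return sum(max(t - i, 0.0) ** 2 for t, i in zip(target, info))


def assemble_test(bank, thetas, target, max_items):
    """Greedy sketch: add the item that most reduces the shortfall until the target
    is met at every theta point or max_items is reached."""
    selected, remaining = [], list(range(len(bank)))
    current = [0.0] * len(thetas)
    while remaining and len(selected) < max_items and shortfall(current, target) > 0.0:
        def after_adding(j):
            return [ci + info_3pl(th, *bank[j]) for ci, th in zip(current, thetas)]
        best = min(remaining, key=lambda j: shortfall(after_adding(j), target))
        selected.append(best)
        remaining.remove(best)
        current = after_adding(best)
    return selected, current


# Hypothetical bank: 21 items with a = 1.0, c = 0.2, and b from -2.0 to 2.0 in steps of 0.2.
bank = [(1.0, -2.0 + 0.2 * k, 0.2) for k in range(21)]
thetas = [-1.0, 0.0, 1.0]
target = [2.0, 2.0, 2.0]
chosen, info = assemble_test(bank, thetas, target, max_items=20)
print(len(chosen), [round(x, 2) for x in info])
```

A greedy pass is simple but not guaranteed to find the shortest test; operational assembly often uses integer programming instead.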