challenges affecting LAC assessment projects
Purposes
Adequate levels of inter-rater reliability ensure
‣ accuracy
‣ consistency
in the assessment.
Inadequate levels indicate
‣ scale inadequacy
‣ a need for additional rater training
Inter-rater reliability: a numerical estimate of the degree of agreement among raters.
The basic model for calculating inter-rater reliability is percent agreement in the two-rater model.
1. Count the artifacts on which the two raters' ratings agree.
2. Count the total number of artifacts rated.
3. Divide the first count by the second and convert the fraction to a percentage.
[Example table: Artifacts | Rater 1 | Rater 2 | Agreement]
Percent Agreement = 60%
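The three steps can be sketched in Python as below. This is a minimal illustration, assuming each rater's scores are stored as a list in the same artifact order; the function name `percent_agreement` and the ratings are hypothetical (the slide's example table is not reproduced), chosen so that 6 of 10 artifacts match.

```python
from typing import Sequence


def percent_agreement(rater1: Sequence[int], rater2: Sequence[int]) -> float:
    """Two-rater percent agreement: share of artifacts given identical ratings."""
    if len(rater1) != len(rater2):
        raise ValueError("both raters must score the same artifacts")
    if not rater1:
        raise ValueError("no ratings supplied")
    # Step 1: count artifacts on which the two ratings match exactly.
    agreements = sum(1 for a, b in zip(rater1, rater2) if a == b)
    # Step 2: total number of artifacts (rating pairs).
    total = len(rater1)
    # Step 3: convert the fraction to a percentage.
    return 100.0 * agreements / total


# Hypothetical ratings on a 1-4 scale for ten artifacts (not the slide's data).
r1 = [3, 2, 4, 1, 3, 2, 4, 3, 2, 1]
r2 = [3, 2, 4, 1, 3, 3, 3, 2, 1, 1]
print(percent_agreement(r1, r2))  # 60.0
```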
What does this mean? How do we interpret the numbers?
benchmarking inter-rater reliability
Rules-of-Thumb for Percent Agreement
Number of Rating Categories | High Agreement | Minimal Agreement | Qualifications
4 or fewer | 90% | 75% | No ratings more than one level apart
5-7 | 75% | not specified | Approximately 90% of ratings identical or adjacent
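To check a set of ratings against these rules of thumb, the quantities they mention can be computed directly. This is a minimal sketch, assuming integer ratings on an ordinal scale so that "levels apart" is the absolute difference between two scores; `agreement_profile` and the sample data are hypothetical. It reports the numbers and leaves the comparison with the thresholds to the reader.

```python
from typing import Sequence


def agreement_profile(rater1: Sequence[int], rater2: Sequence[int]) -> dict:
    """Summarize two raters' scores in the terms used by the rules of thumb."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need equal-length, non-empty rating lists")
    # Distance in scale levels between the two ratings of each artifact.
    gaps = [abs(a - b) for a, b in zip(rater1, rater2)]
    n = len(gaps)
    return {
        # Percent of artifacts with identical ratings (exact agreement).
        "exact_pct": 100.0 * sum(g == 0 for g in gaps) / n,
        # Percent of artifacts whose ratings are identical or one level apart.
        "identical_or_adjacent_pct": 100.0 * sum(g <= 1 for g in gaps) / n,
        # Largest disagreement ("no ratings more than one level apart").
        "max_gap": max(gaps),
    }


# Hypothetical ratings on a 1-4 scale.
r1 = [3, 2, 4, 1, 3, 2, 4, 3, 2, 1]
r2 = [3, 2, 4, 1, 3, 3, 3, 2, 1, 1]
print(agreement_profile(r1, r2))
```

For these sample ratings it reports 60% exact agreement, every pair identical or adjacent, and a largest gap of one level; on a four-category scale the 60% still falls short of the 75% minimal threshold.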
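generalizing the percent agreement calculation
[Example table: Artifacts | Rater 1 | Rater 2 | Rater 3 | Agreement]
Calculating the generalized (more than two raters) percent agreement statistic is less intuitive than for the two-rater case.
- Many assume that since 2 of 3 ratings are identical, the percent agreement for this artifact is 2/3, or 66.7%.
- In fact, agreement is counted over pairs of raters: of the three pairs (R1-R2, R1-R3, R2-R3), only one agrees, so the percent agreement for this artifact is 1/3, or 33.3%.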
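The pairwise reasoning for a single artifact can be sketched as follows. It assumes, consistent with the overall figure reported for this example, that the generalized statistic counts agreement over every pair of raters; `artifact_pairwise_agreement` is a hypothetical helper name.

```python
from itertools import combinations
from typing import Sequence


def artifact_pairwise_agreement(ratings: Sequence[int]) -> float:
    """Share of rater pairs that gave this artifact the same rating."""
    # Every unordered pair of raters: 3 raters -> 3 pairs, 4 raters -> 6 pairs.
    pairs = list(combinations(ratings, 2))
    if not pairs:
        raise ValueError("need at least two ratings")
    return sum(a == b for a, b in pairs) / len(pairs)


# Three raters, two of whom agree: only 1 of the 3 pairs matches (1/3, not 2/3).
print(artifact_pairwise_agreement([3, 3, 2]))
```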
Percent Agreement = 66.7% (12 of the 18 rater pairs agree), even though only 3 of the 18 ratings differ.
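Pooling the pair counts over all artifacts gives the overall statistic. The sketch below uses hypothetical ratings (6 artifacts, 3 raters, 3 differing ratings spread over three artifacts) that reproduce the 12-of-18-pairs result; the function name `pairwise_percent_agreement` is illustrative.

```python
from itertools import combinations
from typing import Sequence


def pairwise_percent_agreement(table: Sequence[Sequence[int]]) -> float:
    """Agreeing rater pairs / all rater pairs, pooled over artifacts, as a percent."""
    agree = total = 0
    for ratings in table:  # one row per artifact, one column per rater
        for a, b in combinations(ratings, 2):
            total += 1
            agree += a == b
    if total == 0:
        raise ValueError("need at least one artifact with two or more ratings")
    return 100.0 * agree / total


# Hypothetical: 6 artifacts x 3 raters = 18 ratings, 3 of which differ.
ratings = [
    [3, 3, 3],
    [2, 2, 2],
    [4, 4, 4],
    [3, 3, 2],
    [2, 2, 1],
    [1, 1, 2],
]
print(round(pairwise_percent_agreement(ratings), 1))  # 66.7
```

Because every artifact here has the same number of raters, pooling the pairs gives the same result as averaging the per-artifact agreement values.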
This result falls below every percentage threshold in the rules of thumb, so it is an inadequate level of agreement.
Absolute agreement is an unforgiving standard; a common solution is to count adjacent ratings as being in agreement.
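Counting adjacent ratings as agreement amounts to allowing a tolerance of one scale level when comparing ratings. A minimal sketch, assuming integer ratings; the `tolerance` parameter, the function name, and the data are hypothetical.

```python
from typing import Sequence


def percent_agreement_within(
    rater1: Sequence[int], rater2: Sequence[int], tolerance: int = 0
) -> float:
    """Percent of artifacts whose two ratings differ by at most `tolerance` levels."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need equal-length, non-empty rating lists")
    # tolerance=0 is exact agreement; tolerance=1 also counts adjacent ratings.
    hits = sum(abs(a - b) <= tolerance for a, b in zip(rater1, rater2))
    return 100.0 * hits / len(rater1)


r1 = [1, 2, 3, 4, 2]
r2 = [1, 3, 1, 4, 2]
print(percent_agreement_within(r1, r2))               # exact: 60.0
print(percent_agreement_within(r1, r2, tolerance=1))  # adjacent counted: 80.0
```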
problems with the percent agreement statistic
This adjustment can be extremely problematic when benchmarks (the just-barely-passing standard) have been identified. In this case, the raters disagree completely about each artifact's pass/fail status, yet the adjusted statistic reports 'perfect agreement'.
[Example table: Artifacts | Rater 1 | Rater 2 | Agreement]
Counting adjacent ratings as in agreement turns this percent agreement of 0% into a percent agreement of 100%.
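The pass/fail problem can be reproduced with a small hypothetical example: a 1-4 scale on which 3 is assumed to be the just-barely-passing benchmark, and two raters who always land one level apart on opposite sides of it. The `PASSING` constant, the `pct` helper, and the ratings are assumptions for illustration.

```python
from typing import Sequence

PASSING = 3  # hypothetical benchmark: 3 is the just-barely-passing score


def pct(matches: Sequence[bool]) -> float:
    """Percent of True values in a list of per-artifact comparisons."""
    return 100.0 * sum(matches) / len(matches)


r1 = [2, 3, 2, 3, 2]
r2 = [3, 2, 3, 2, 3]
pairs = list(zip(r1, r2))

exact = pct([a == b for a, b in pairs])                 # identical ratings
adjacent = pct([abs(a - b) <= 1 for a, b in pairs])     # adjacent counted as agreement
# Do the two raters reach the same pass/fail decision?
pass_fail = pct([(a >= PASSING) == (b >= PASSING) for a, b in pairs])
print(exact, adjacent, pass_fail)  # 0.0 100.0 0.0
```

The adjacent-agreement figure reports perfect agreement while the raters never agree on whether an artifact passes.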
The percent agreement statistic does not take chance agreement into account, so it overestimates inter-rater reliability.
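The size of chance agreement can be estimated from each rater's category frequencies: if the raters scored independently, the expected agreement is the sum over categories of the product of their proportions. The sketch below computes that, plus Cohen's kappa, one widely used chance-corrected statistic that these slides do not cover. The data and function names are hypothetical: both raters use the top category most of the time, so the 70% raw agreement is actually below the 74% expected by chance alone.

```python
from collections import Counter
from typing import Sequence


def observed_agreement(rater1: Sequence[int], rater2: Sequence[int]) -> float:
    """Proportion of artifacts with identical ratings (percent agreement / 100)."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def chance_agreement(rater1: Sequence[int], rater2: Sequence[int]) -> float:
    """Agreement expected if the raters chose categories independently,
    each at their own observed category frequencies."""
    n = len(rater1)
    c1, c2 = Counter(rater1), Counter(rater2)
    return sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))


def cohens_kappa(rater1: Sequence[int], rater2: Sequence[int]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = observed_agreement(rater1, rater2)
    p_e = chance_agreement(rater1, rater2)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


r1 = [3, 3, 3, 3, 3, 3, 3, 3, 2, 2]
r2 = [3, 3, 3, 3, 3, 3, 3, 2, 3, 3]
print(round(observed_agreement(r1, r2), 2))  # 0.7
print(round(chance_agreement(r1, r2), 2))    # 0.74
print(round(cohens_kappa(r1, r2), 2))        # -0.15
```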