Download Analysis of Learning Curves and Proportional Odds Model in Educational Psychology and more Assignments Biostatistics in PDF only on Docsity! Biostat/Stat 571 Exercise #2 Answer Key January, 2007 Question 2: TenHave and Uttal (1994) report on the results of a psychology experiment in which children were shown a map giving the location of a hidden toy, and allowed up to three chances to successfully find the object. The goal of the experiment is to compare a group of children presented with a correctly oriented map (control) to a second group that was shown a rotated map. Are the children presented with the rotated map able to compensate and find the hidden object as well as the control children? Each child repeated the test 10 times. During each test, a child will only attempt the maze a second time if they fail the first time. Similarly, a child will only attempt the maze a third time if they fail the second time. If a child is unsuccessful the third time the outcome is an overall failure. Thus the outcome, number of attempts to successfully find the object takes (Y ), takes on one of four values with Y = 4 denoting all three attempts having failed. (a) Figure (1) displays the distribution of the outcome, at each of the ten tests, by treatment group. 1 2 3 4 5 6 7 8 9 10 0 20 40 60 80 10 0 Test number P er ce nt ag e Fail Third Second First (a) Control Group (n = 46) 1 2 3 4 5 6 7 8 9 10 0 20 40 60 80 10 0 Test number P er ce nt ag e Fail Third Second First (b) Rotated Group (n = 43) Figure 1: Percentage of the number of attempts per test It is difficult to discern a “learning effect” over time for the group that were presented the correctly oriented map (control). It does not seem as though the percentage that fail decreases over time, nor that the percentage that pass 1st time increases with time. In the treatment group, that were presented with the rotated maps, it seems at though there is some evidence of a “learning effect”. In particular, the percentage that fail decreases across the 10 tests, while the percentage that pass 1st time increases. (b) Let ∆i = Yi8 − Yi1 denote the difference between the outcomes of the 8 th and 1st tests for the ith child. Consequently, a negative change indicates an improvement over time. For example, ∆ = −3 Biostat/Stat 571 Exercise #2 2 corresponds to a score of 1 on the 8th test and a score of 4 on the 1st, indicating improvement. The table below summarises these differences by treatment group. ∆ -3 -2 -1 0 1 2 3 Total Rotated 10 14 5 8 3 2 1 43 Control 5 2 13 18 5 1 2 46 From this we find that 63% (29/43) of the rotated group improved their score from the 1st to the 8th tests, with 10 children passing on the first attempt during the 8th test when they had failed all three attempts during the 1st test. In the control group, only 47% improved their score with a much larger group (42%) maintaining their score between the two tests. It is interesting to note that of the 8 children in the rotate group that maintained their score, 7 maintained a score of 4. However, in the control group of the 18 that maintained their score 16 maintained scores of 1 or 2. Neither treatment group exhibited much learning between the two tests. Due to the sparse data in the final two outcome categories, these have been collapsed into one category (although the results do not change considerably keeping the two seperate). Score test for the proportional odds assumption (provided by SAS) yields a test statistic of 14.9 on 4 degrees of freedom (p-value = 0.0048), which suggests evidence against the null hypothesis that the assumption holds. Thus we should consider fitting seperate logisitc regressions for each dichotomisation of the outcome. These are presented (along with the fit of the proportional odds model) below. I[∆i≤−3] I[∆i≤−2] I[∆i≤−1] I[∆i≤0] I[∆i≤1] Ordinal (a) Intercept(−3) -2.104 -2.240 Intercept(−2) -1.717 -1.179 Intercept(−1) -0.262 -0.252 Intercept(0) 1.588 1.285 Intercept(1) 2.662 2.238 Rotate 0.910 1.951 0.990 0.261 -0.072 1.107 (a) POM: logit(γij) = θj + RotateiβR In this model, for this formulation, we interpret the ‘Rotate’ coefficient in terms of the odds of having a “low” change (which would suggest learning) versus having a “high” change (which would suggest getting worse). For the logisitc model with outcome I[∆i≤−1], the odds of a learning effect versus no learning effect/worsening are estimated to be 2.69 times higher (exp0.99 ≈ 2.69; 95% CI (1.14, 6.39)) in the rotated group compared to the control group. This suggests that the rotated group is “more likely” to experience a learning effect (i.e. improvement in score). (c) To examine the change between Test 1 and Test 8, we could also consider modelling the score at for Test 8 as a function of the score at time 1 as well as the treatment group. Initially, an interation between the Test 1 score and the treatment group was considered. The resulting likelihood ratio test for the interaction terms, in the proportional odds model, yields a test statistic of 7.23 on 3 degrees of freedom (p-value = 0.065). Further examination of the cofficients indicated that this was completely driven by one parameter and consequently the interactions were not incorporated. Note, this parameter was associated with the group that had a Test 1 score of 3, which had the lowest cell Biostat/Stat 571 Exercise #2 5 plot( 1:10, tenhave[sample.not[i], 2:11], type = "l", ylim = c(0, 4), xlab = "Test number", ylab = "Outcome", main = paste("ID:", sample.not[i], "- not") ) } # ## #### Calculate percentages within each test; PBT = percent by test ## # pbt.rotated <- matrix( 0, ncol = 10, nrow = 4 ) # n = 43 pbt.not <- matrix( 0, ncol = 10, nrow = 4 ) # n = 46 for( i in 1:10 ) { pbt.rotated[,i] <- (table( tenhave[1:43,(i+1)] ) / 43) * 100 pbt.not[,i] <- (table( tenhave[44:89,(i+1)] ) / 46) * 100 } # ## #### Produce barplots ## # barplot( pbt.not, names = as.character(c(1:10)), xlab = "Test number", ylab = "Percentage", col = c(1,5,7,6) ) legend( 8.7, 24, c("Fail", "Third", "Second", "First" ), fill = c(6,7,5,1) ) barplot( pbt.rotated, names = as.character(c(1:10)), xlab = "Test number", ylab = "Percentage", col = c(1,5,7,6) ) legend( 0.5, 97, c("Fail", "Third", "Second", "First" ), fill = c(6,7,5,1) ) # ## ################## #### Part (b) #### ################## ## # table( tenhave$rotate, tenhave$change ) table( tenhave$test1[tenhave$rotate == 0 & tenhave$change == 0], tenhave$test8[tenhave$rotate == 0 & tenhave$change == 0] ) table( tenhave$test1[tenhave$rotate == 1 & tenhave$change == 0], tenhave$test8[tenhave$rotate == 1 & tenhave$change == 0] ) # ## #### Sparse outcome for last two categories; collapse #### - should be ok since under the PO assumption regression parameter is the same #### - only the intercepts will change ## # tenhave$change[tenhave$change == 3] <- 2 # ## #### Perform a series of logistic regressions by sequentially changing the dichotomisation #### - out1 : I_{change \le -3} #### - out2 : I_{change \le -2} #### ..... #### - out5 : I_{change \le 1} #### - out6 : I_{change \le 2} ## not used ## # tenhave$out1 <- ifelse( (tenhave$change < -2), 1, 0 ) tenhave$out2 <- ifelse( (tenhave$change < -1), 1, 0 ) tenhave$out3 <- ifelse( (tenhave$change < 0), 1, 0 ) tenhave$out4 <- ifelse( (tenhave$change < 1), 1, 0 ) tenhave$out5 <- ifelse( (tenhave$change < 2), 1, 0 ) ## tenhave$out6 <- ifelse( (tenhave$change < 3), 1, 0 ) # model.out1 <- glm( out1 ~ rotate, data = tenhave, family = binomial ) model.out2 <- glm( out2 ~ rotate, data = tenhave, family = binomial ) model.out3 <- glm( out3 ~ rotate, data = tenhave, family = binomial ) model.out4 <- glm( out4 ~ rotate, data = tenhave, family = binomial ) model.out5 <- glm( out5 ~ rotate, data = tenhave, family = binomial ) ## model.out6 <- glm( out6 ~ rotate, data = tenhave, family = binomial ) # coef.output.b <- c( summary(model.out1)$coef[1,1:2], summary(model.out1)$coef[2,1:2] ) coef.output.b <- rbind( coef.output.b, c( summary(model.out2)$coef[1,1:2], summary(model.out2)$coef[2,1:2] ) ) coef.output.b <- rbind( coef.output.b, c( summary(model.out3)$coef[1,1:2], summary(model.out3)$coef[2,1:2] ) ) coef.output.b <- rbind( coef.output.b, c( summary(model.out4)$coef[1,1:2], summary(model.out4)$coef[2,1:2] ) ) coef.output.b <- rbind( coef.output.b, c( summary(model.out5)$coef[1,1:2], summary(model.out5)$coef[2,1:2] ) ) ## coef.output.b <- rbind( coef.output.b, c( summary(model.out6)$coef[1,1:2], summary(model.out6)$coef[2,1:2] ) ) coef.output.b <- as.data.frame( round( coef.output.b, 3 ) ) names( coef.output.b ) <- c( "Intercept", "se", "rotate", "se" ) coef.output.b # ## #### Proportional Odds Model for change ## # model.POMb <- polr( factor(change) ~ rotate, data = tenhave ) Biostat/Stat 571 Exercise #2 6 summary( model.POMb ) # ## model.POMb <- lrm( factor(change) ~ rotate, data = tenhave ) ## model.POMb # ## ################## #### Part (c) #### ################## ## # table( tenhave$rotate, tenhave$test8, tenhave$test1 ) table( tenhave$test1, tenhave$test8, tenhave$rotate ) # ## #### Perform successive logistic regressions ## # tenhave$out1c <- ifelse( (tenhave$test8 < 2), 1, 0 ) tenhave$out2c <- ifelse( (tenhave$test8 < 3), 1, 0 ) tenhave$out3c <- ifelse( (tenhave$test8 < 4), 1, 0 ) # model.out1c <- glm( out1c ~ factor(test1) + rotate, data = tenhave, family = binomial ) model.out2c <- glm( out2c ~ factor(test1) + rotate, data = tenhave, family = binomial ) model.out3c <- glm( out3c ~ factor(test1) + rotate, data = tenhave, family = binomial ) # coef.output.c <- c( summary(model.out1c)$coef[1,1:2], summary(model.out1c)$coef[5,1:2] ) coef.output.c <- rbind( coef.output.c, c( summary(model.out2c)$coef[1,1:2], summary(model.out2c)$coef[5,1:2] ) ) coef.output.c <- rbind( coef.output.c, c( summary(model.out3c)$coef[1,1:2], summary(model.out3c)$coef[5,1:2] ) ) coef.output.c <- as.data.frame( round( coef.output.c, 3 ) ) names( coef.output.c ) <- c( "Intercept", "se", "rotate", "se" ) coef.output.c # ## #### Proportional Odds Model for test8; with test1 as a (factor) covariate ## # model.POMc <- polr( factor(test8) ~ rotate + factor(test1), data = tenhave ) summary( model.POMc ) # ## model.POMc <- lrm( factor(test8) ~ factor(test1) + rotate, data = tenhave ) ## model.POMc # Question 2 2a) To find B such that B(Y-π) = (Y-µ· H), solve algebraically (writing H in terms of the Yi) to find Bij = i = j : 1 i > j : πi1−γ(i−1) i < j : 0 a lower triangular matrix. For example, for 4 cutpoints, B = 1 0 0 0 π2 1−γ1 1 0 0 π3 1−γ2 π3 1−γ2 1 0 π4 1−γ3 π4 1−γ3 π4 1−γ3 1 where πi is P(Y=i) and γi is Σ i k=1πi. 2b) B is constructed so that B(Y-π) = Y-µ · H . So Cov(B(Y − π)) = Cov(Y − π · H) = E[(Y − µ · H)(Y − µ · H)T ] From page 67 of the notes, we know that the off-diagonal elements are zero. As for the diagonal elements, variances of B(Y − π) are the same (for the first element) or less (for other elements) than the diagonal Biostat/Stat 571 Exercise #2 7 elements of Σ. This is because Cov(B(Y − π)) = BΣBT : the negative off-diagonals of Σ (a multinomial covariance) are multiplied by the positive off-diagonals of B, and subtracted from Var(Y), since the diagonal elements of B are 1. 2c) Using code adapted from the WESDR example (page 67 of the class notes): > tenhave[1:3,] rotate t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 1 1 4 4 4 4 4 4 4 2 4 4 2 1 2 4 4 4 2 1 2 4 3 4 3 1 4 4 4 2 4 4 3 2 4 4 ## construct continuation model data for trials 1 and 8 ## (1) construct pairs (Y,H) nsubj_ nrow(tenhave) ncuts_ 3 y1.crm _NULL y8.crm _NULL h1.crm _NULL h8.crm _NULL id _NULL for (j in (1:nsubj)) { yj1 _rep(0,ncuts) yj8 _rep(0,ncuts) if (tenhave$t1[j] <= ncuts ) yj1[tenhave$t1[j]] _ 1 if (tenhave$t8[j] <= ncuts ) yj8[tenhave$t8[j]] _ 1 hj1 _ 1-c(0,cumsum(yj1)[1:(ncuts-1)]) hj8 _ 1-c(0,cumsum(yj8)[1:(ncuts-1)]) y1.crm _ c(y1.crm, yj1) y8.crm _ c(y8.crm, yj8) h1.crm _ c(h1.crm, hj1) h8.crm _ c(h8.crm, hj8) id _ c(id, rep(j, ncuts)) } #### (2) construct intercepts level _ factor(rep(1:ncuts,nsubj)) int.mat _ NULL for (j in 1:ncuts) { intj _ rep(0,ncuts) intj[j] _ 1 int.mat _ cbind(int.mat, rep(intj, nsubj)) } #### (3) expand the covariate rotate.crm _ rep(tenhave$rotate, rep(ncuts, nsubj)) #### (4) drop H=0 and build dataframe keep1 _ h1.crm==1 keep8 _ h8.crm==1 t1.crm.data _ data.frame( id=id[keep1], y=y1.crm[keep1], level=level[keep1], rotate=rotate.crm[keep1]) t8.crm.data _ data.frame( id=id[keep8], y=y8.crm[keep8], level=level[keep8], rotate=rotate.crm[keep8]) ### model fitting for 2c fit1 _ glm(y~level+rotate, family=binomial, data=t1.crm.data) summary(fit1) Coefficients: Value Std. Error t value (Intercept) -0.5442677 0.2831311 -1.922317 level2 0.6443691 0.4058320 1.587773 level3 0.6612261 0.4712200 1.403222 rotate -2.0075675 0.3800151 -5.282863 fit2 _ glm(y~level*rotate, family=binomial, data=t1.crm.data) summary(fit2) Coefficients: Value Std. Error t value (Intercept) -0.4418328 0.3021090 -1.4624946 level2 0.4418328 0.4838667 0.9131291 level3 0.4418328 0.6139903 0.7196086 rotate -2.5763281 0.7634178 -3.3747289 level2rotate 0.8127395 0.9596033 0.8469537 level3rotate 0.7845687 1.0496489 0.7474582 anova(fit1,fit2) Analysis of Deviance Table Response: y Terms Resid. Df Resid. Dev Test Df Deviance 1 level + rotate 203 199.6852 2 level * rotate 201 198.8258 +level:rotate 2 0.8593682 An analysis of deviance indicates that a common treatment odds ratio is appropriate (1-pchisq(.859,2)= 0.65). ### model fitting for 2d fit1d _ glm(y~level+rotate, family=binomial, data=t8.crm.data) summary(fit1d) Coefficients: Value Std. Error t value (Intercept) 0.4130039 0.2688397 1.5362462 level2 -0.1236940 0.3780036 -0.3272297 level3 -0.4010700 0.4817271 -0.8325667 rotate -0.8071621 0.3301882 -2.4445520