











Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
These notes provide additional information on logistic regression, focusing on deviance residuals and model building using r. Code for calculating deviance residuals, comparing them to the residuals from the stepwise model selection process, and discussing the use of the aic in model selection.
Typology: Study notes
1 / 19
This page cannot be seen from the preview
Don't miss anything!












mod.fit<-glm(formula = y/n ~ distance, data = place.pattern, weight=n, family = binomial(link = logit), na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = T)) Deviance = 47.12782 Iterations - 1 Deviance = 44.52325 Iterations - 2 Deviance = 44.49945 Iterations - 3 Deviance = 44.49945 Iterations - 4 summary(mod.fit) Call: glm(formula = y/n ~ distance, family = binomial(link = logit), data = place.pattern, weights = n, na.action = na.exclude, control = list(epsilon = 1e-04, maxit = 50, trace = T)) Deviance Residuals: Min 1Q Median 3Q Max -2.0373 -0.6449 -0.1424 0.5004 2. Coefficients:
Estimate Std. Error z value Pr(>|z|) (Intercept) 5.812080 0.326245 17.82 <2e-16 *** distance -0.115027 0.008338 -13.79 <2e-16 ***
Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 282.181 on 42 degrees of freedom Residual deviance: 44.499 on 41 degrees of freedom AIC: 148. Number of Fisher Scoring iterations: 4
dev.resid<-resid(mod.fit, type="deviance") h<-lm.influence(mod.fit)$h dev.stand.resid<-dev.resid/sqrt(1-h) y<-place.pattern$y pi.hat<-mod.fit$fitted.values n<-place.pattern$n pi.tilde<-y/n #observed proportion #LRT for GOF -2(sum(ylog(pi.hat) + (n-y)log(1-pi.hat)) – sum(log(pi.tilde^y) + log((1-pi.tilde)^(n-y)))) #Need to do second part with pi^y due to 0 pi values [1] 44. mod.fit$deviance [1] 44. dev.resid.mine<-sign(y - npi.hat) * sqrt(-2((ylog(pi.hat) + (n-y)*log(1-pi.hat)) – (log(pi.tilde^y) + log((1-pi.tilde)^(n-y))))) #Need to do second part with pi^y due to 0 pi values summary(dev.resid.mine)
Min. 1st Qu. Median Mean 3rd Qu. Max. -2.0370 -0.6449 -0.1424 -0.0906 0.5004 2.
summary(dev.resid) Min. 1st Qu. Median Mean 3rd Qu. Max. -2.0370 -0.6449 -0.1424 -0.0906 0.5004 2.
#Notice that I did not trace the iterative proceds. If
mod.fit<-glm(formula = good1~change+dist+type1+pat1+elap30+week+wind, data = placekick.mb, family = binomial(link = logit), na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F)) res<-step(mod.fit, direction="backward", scope=list(upper=mod.fit$formula)) Start: AIC= 784. good1 ~ change + dist + type1 + pat1 + elap30 + week + wind Df Deviance AIC
res$anova Stepwise Model Path Analysis of Deviance Table Initial Model: good1 ~ change + dist + type1 + pat1 + elap30 + week + wind Final Model: good1 ~ change + dist + pat1 + wind Step Df Deviance Resid. Df Resid. Dev AIC 1 NA NA 1430 768.5680 784. 2 - elap30 1 0.1451136 1431 768.7131 782. 3 - type1 1 0.3668809 1432 769.0800 781.
2
2
2
2
2
mod.fit<-glm(formula = good1/total ~ change+dist+pat1+wind+dist*wind, data = place.pattern.non20, weight=total, family = binomial(link = logit), na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50,
trace = T)) Deviance = 116.2259 Iterations - 1 Deviance = 113.8865 Iterations - 2 Deviance = 113.8581 Iterations - 3 Deviance = 113.8580 Iterations - 4
save.diag<-glm.diag(mod.fit) names(save.diag) [1] "res" "rd" "rp" "cook" "h" "sd" glm.diag.plots(mod.fit, iden=TRUE)
Please Input a screen number (1,2,3 or 4) 0 will terminate the function 3 Interactive Identification for screen 3 left button = Identify, center button = Exit
Please Input a screen number (1,2,3 or 4) 0 will terminate the function 4 Interactive Identification for screen 4 left button = Identify, center button = Exit
Please Input a screen number (1,2,3 or 4) 0 will terminate the function 0 -2 0 2 4
0 1 2 Linear predictor Residuals -2 -1 0 1 2
0 1 2 Ordered deviance residuals Quantiles of standard normal 0 1 2 3 4
Case Cook statistic 20/1/0/0 20/1/0/ 25/0/1/1 (^) 32/0/0/ 50/0/0/
2
2
2
2
n.table<-array(data = c(0, 4, 7, 16, 0, 4, 7, 13, 0, 2, 8, 13), dim = c(2,2,3), dimnames = list(Race = c("Black", "White"), Promotion = c("Yes", "No"), Month = c("July", "August", "September"))) n.table , , Month = July Promotion Race Yes No Black 0 7 White 4 16 , , Month = August Promotion Race Yes No Black 0 7 White 4 13 , , Month = September Promotion Race Yes No Black 0 8 White 2 13
n.table<-array(data = c(0, 4, 7, 16, 0, 4, 7, 13, 0, 2, 8, 13), dim = c(2,2,3), dimnames = list(Race = c("Black", "White"), Promotion = c("Yes", "No"), Month = c("July", "August", "September"))) n.table , , Month = July Promotion Race Yes No Black 0 7
White 4 16 , , Month = August Promotion Race Yes No Black 0 7 White 4 13 , , Month = September Promotion Race Yes No Black 0 8 White 2 13
#Raw data all.data <- matrix(NA, 0, 3) #Put data in "raw" form for(i in 1:2) {
for(j in 1:2) { for(k in 1:3) { all.data <- rbind(all.data, matrix(c(i, j, k), n.table[i, j, k], 3, byrow = T)) } } }
#Need in a data.frame format for the ftable() function call coming up all.data2<-data.frame(all.data) #One way to assign column names (other ways can be done) names(all.data2)<-c("Race", "Promotion", "Month") all.data2[1:2,] Race Promotion Month 1 1 2 1 2 1 2 1 #Check data ftable(Month+Promotion~Race, data = all.data2) Month 1 2 3 Promotion 1 2 1 2 1 2 Race 1 0 7 0 7 0 8 2 4 16 4 13 2 13 library(boot) m.sq<-function(data,i) { perm.data<-data[i] mantelhaen.test(all.data[,1], perm.data, all.data[,3], correct=F)$statistic } summarize.boot<-function(results, df, B) { par(mfrow = c(1,2)) #Histogram hist(results$t, main = "Histogram of perm. dist.", col = "blue", xlab="statistic^*")
Histogram of perm. dist. statistic^* Frequency 0 5 10 15 0 200 400 600 0 2 4 6 8 10 12 14 0 2 4 6 8 10 QQ-Plot of perm. dist. statistic^* chi.quant