Notes on Logistic Regression: Deviance Residuals and Model Building, Study notes of Statistics

These notes provide additional information on logistic regression, focusing on deviance residuals and model building using r. Code for calculating deviance residuals, comparing them to the residuals from the stepwise model selection process, and discussing the use of the aic in model selection.

Typology: Study notes

Pre 2010

Uploaded on 08/30/2009

koofers-user-6yl
koofers-user-6yl 🇺🇸

9 documents

1 / 19

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
These are additional notes for Chapter 5.
Deviance residuals
These residuals (and their corresponding “standardized”)
values can be used just like Pearson residuals and
standardized residuals. They measure the contribution
of an observation to the LRT goodness-of-fit test
statistic. “Large” values will contribute a lot to the
statistic, so they indicate an EVP is being fit poorly by
the model. Below is some R code and output that show
their calculation. The code is in
placekick_ch5_pattern.R.
> mod.fit<-glm(formula = y/n ~ distance, data =
place.pattern, weight=n, family = binomial(link =
logit), na.action = na.exclude, control = list(epsilon =
0.0001, maxit = 50, trace = T))
Deviance = 47.12782 Iterations - 1
Deviance = 44.52325 Iterations - 2
Deviance = 44.49945 Iterations - 3
Deviance = 44.49945 Iterations - 4
> summary(mod.fit)
Call:
glm(formula = y/n ~ distance, family = binomial(link =
logit), data = place.pattern, weights = n, na.action =
na.exclude, control = list(epsilon = 1e-04, maxit = 50,
trace = T))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.0373 -0.6449 -0.1424 0.5004 2.2758
Coefficients:
2009 Christopher R. Bilder
Add.5.1
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13

Partial preview of the text

Download Notes on Logistic Regression: Deviance Residuals and Model Building and more Study notes Statistics in PDF only on Docsity!

These are additional notes for Chapter 5.

Deviance residuals

These residuals (and their corresponding “standardized”)

values can be used just like Pearson residuals and

standardized residuals. They measure the contribution

of an observation to the LRT goodness-of-fit test

statistic. “Large” values will contribute a lot to the

statistic, so they indicate an EVP is being fit poorly by

the model. Below is some R code and output that show

their calculation. The code is in

placekick_ch5_pattern.R.

mod.fit<-glm(formula = y/n ~ distance, data = place.pattern, weight=n, family = binomial(link = logit), na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = T)) Deviance = 47.12782 Iterations - 1 Deviance = 44.52325 Iterations - 2 Deviance = 44.49945 Iterations - 3 Deviance = 44.49945 Iterations - 4 summary(mod.fit) Call: glm(formula = y/n ~ distance, family = binomial(link = logit), data = place.pattern, weights = n, na.action = na.exclude, control = list(epsilon = 1e-04, maxit = 50, trace = T)) Deviance Residuals: Min 1Q Median 3Q Max -2.0373 -0.6449 -0.1424 0.5004 2. Coefficients:

Estimate Std. Error z value Pr(>|z|) (Intercept) 5.812080 0.326245 17.82 <2e-16 *** distance -0.115027 0.008338 -13.79 <2e-16 ***


Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 282.181 on 42 degrees of freedom Residual deviance: 44.499 on 41 degrees of freedom AIC: 148. Number of Fisher Scoring iterations: 4

dev.resid<-resid(mod.fit, type="deviance") h<-lm.influence(mod.fit)$h dev.stand.resid<-dev.resid/sqrt(1-h) y<-place.pattern$y pi.hat<-mod.fit$fitted.values n<-place.pattern$n pi.tilde<-y/n #observed proportion #LRT for GOF -2(sum(ylog(pi.hat) + (n-y)log(1-pi.hat)) – sum(log(pi.tilde^y) + log((1-pi.tilde)^(n-y)))) #Need to do second part with pi^y due to 0 pi values [1] 44. mod.fit$deviance [1] 44. dev.resid.mine<-sign(y - npi.hat) * sqrt(-2((ylog(pi.hat) + (n-y)*log(1-pi.hat)) – (log(pi.tilde^y) + log((1-pi.tilde)^(n-y))))) #Need to do second part with pi^y due to 0 pi values summary(dev.resid.mine)

Min. 1st Qu. Median Mean 3rd Qu. Max. -2.0370 -0.6449 -0.1424 -0.0906 0.5004 2.

summary(dev.resid) Min. 1st Qu. Median Mean 3rd Qu. Max. -2.0370 -0.6449 -0.1424 -0.0906 0.5004 2.

Section 5.1.5 and model building

R has an automated variable selection function called

step() which uses the “AIC” to perform it. The smaller

the AIC, the better the model. Notice the model that I

start with here. The model includes all of the variables

identified from step 1 of the model building process.

Also, notice that I turned off the trace option. This was

done so that the step() function did not trace every single

model being fit in the backward elimination.

#Notice that I did not trace the iterative proceds. If

mod.fit<-glm(formula = good1~change+dist+type1+pat1+elap30+week+wind, data = placekick.mb, family = binomial(link = logit), na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F)) res<-step(mod.fit, direction="backward", scope=list(upper=mod.fit$formula)) Start: AIC= 784. good1 ~ change + dist + type1 + pat1 + elap30 + week + wind Df Deviance AIC

  • elap30 1 768.71 782.
  • type1 1 768.91 782.
  • week 1 770.23 784. 768.57 784.
  • wind 1 770.90 784.
  • change 1 771.76 785.
  • pat1 1 777.22 791.
  • dist 1 840.96 854. Step: AIC= 782. good1 ~ change + dist + type1 + pat1 + week + wind Df Deviance AIC
  • type1 1 769.08 781.
  • week 1 770.36 782. 768.71 782.
  • wind 1 771.03 783.
  • change 1 771.81 783.
  • pat1 1 777.35 789.
  • dist 1 842.83 854. Step: AIC= 781. good1 ~ change + dist + pat1 + week + wind Df Deviance AIC
  • week 1 770.76 780. 769.08 781.
  • wind 1 771.12 781.
  • change 1 772.11 782.
  • pat1 1 777.63 787.
  • dist 1 844.63 854. Step: AIC= 780. good1 ~ change + dist + pat1 + wind Df Deviance AIC 770.76 780.
  • wind 1 773.26 781.
  • change 1 773.59 781.
  • pat1 1 779.40 787.
  • dist 1 846.88 854.

res$anova Stepwise Model Path Analysis of Deviance Table Initial Model: good1 ~ change + dist + type1 + pat1 + elap30 + week + wind Final Model: good1 ~ change + dist + pat1 + wind Step Df Deviance Resid. Df Resid. Dev AIC 1 NA NA 1430 768.5680 784. 2 - elap30 1 0.1451136 1431 768.7131 782. 3 - type1 1 0.3668809 1432 769.0800 781.

Sections 5.1.6-5.1.

Logistic regression models are often used to classify

observations into one of two response categories. For

example, if ˆi^ >0.5, then observation i gets classified into

category 1. A related topic is the use of receiver

operating characteristic (ROC) curves for deciding where

the optimal cutoff probability (0.5 or some other value)

should be for the classification process. My STAT 873

(applied multivariate statistics) course discusses these

topics in the Chapter 8 lecture notes. You can obtain

these lecture notes from the course website at

www.chrisbilder.com/stat873/schedule.htm.

Additional measures of model fit: R

2

and the Hosmer and

Lemeshow test

R

2

There is a measure called “R

2

” that is sometimes used to

measure how well the model fits the data. The motivation

for this measure comes from the R

2

used in regular

regression. Instead of examining sums of squares, the R

2

in logistic regression uses values of the likelihood function

evaluated at the parameter estimates.

Without going into too much detail here, I recommend

against using it. See Cox and Wermuth (1992) or Agresti

(2002, p.226-8) for more information.

Hosmer and Lemeshow test

This a common test used to measure the goodness of fit

for a model. See p.147 of Agresti (2007) for a brief

description. Hosmer and Lemeshow (2000) give a more

thorough description on p. 140-1. The basic idea is to

group observations with a similar ˆj^. Find the estimated

number of success for the group and compare it to the

observed number of successes in a format similar to a

Pearson statistic.

SAS PROC LOGISTIC implements it automatically when

the LACKFIT option is specified on the MODEL statement.

Section 5.2 – other R tools

R has a function called glm.diag() that can calculate a

few of the diagnostic measures we have discussed. The

function glm.diag.plots() creates 4 plots of these

measures. These two functions are contained in the

boot package. Below is an example of their

implementation. Notice the interactive feature of

glm.diag.plots().

mod.fit<-glm(formula = good1/total ~ change+dist+pat1+wind+dist*wind, data = place.pattern.non20, weight=total, family = binomial(link = logit), na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50,

trace = T)) Deviance = 116.2259 Iterations - 1 Deviance = 113.8865 Iterations - 2 Deviance = 113.8581 Iterations - 3 Deviance = 113.8580 Iterations - 4

save.diag<-glm.diag(mod.fit) names(save.diag) [1] "res" "rd" "rp" "cook" "h" "sd" glm.diag.plots(mod.fit, iden=TRUE)


Please Input a screen number (1,2,3 or 4) 0 will terminate the function 3 Interactive Identification for screen 3 left button = Identify, center button = Exit


Please Input a screen number (1,2,3 or 4) 0 will terminate the function 4 Interactive Identification for screen 4 left button = Identify, center button = Exit

Please Input a screen number (1,2,3 or 4) 0 will terminate the function 0 -2 0 2 4

0 1 2 Linear predictor Residuals -2 -1 0 1 2

0 1 2 Ordered deviance residuals Quantiles of standard normal 0 1 2 3 4

  1. h/(1-h) Cook statistic 0 20 40 60 80 100 120

Case Cook statistic 20/1/0/0 20/1/0/ 25/0/1/1 (^) 32/0/0/ 50/0/0/

The function, influence.measures() , can also calculate

some influence measures. See the example in the

placekick_model_build.R program.

Exact inference about conditional associations

The probability distribution for the Cochran-Mantel-

Haenszel (CMH) statistic is NOT a 

2

distribution! We

use a 

2

distribution when n++k is large because the

statistic “approximately” will have this distribution. What

happens if the sample is not large???

Often, the test statistic’s probability distribution can

still be approximated rather well using a 

2

distribution. Suppose many of the n++k are small

when K is large. There will be many partial tables

allowing n to be large despite the small sample size

within a partial table.

There are still times when 

2

distribution approximation

will still not perform adequately. Some alternative

methods are:

 One could perform a test similar to Fisher’s exact test.

With Fisher’s exact test in Chapter 2, there was only

one 22 table and thus one hypergeometric

distribution. Now since there are K 22 tables, there

are K different hypergeometric distributions.

Probabilities can be found for observing a kn11k using

these K hypergeometric distributions. See p. 254 of

Agresti (2002) for more details. The mantelhaen.test ()

function implements this with the exact = TRUE option.

 Permutation tests can be used to estimate the

distribution of the test statistic. How?

Example: Table 3.4 on p. 65 of Agresti (1996) and Table 5.

on p. 159 of Agresti (2007) (tab3.4.R)

The question of interest: Given month, is race

independent of promotion?

The reason for considering these alternative approaches

is the 0’s in the 2 2 3 table. Note that the sample size

itself is not small.

Below is the R code and output.

n.table<-array(data = c(0, 4, 7, 16, 0, 4, 7, 13, 0, 2, 8, 13), dim = c(2,2,3), dimnames = list(Race = c("Black", "White"), Promotion = c("Yes", "No"), Month = c("July", "August", "September"))) n.table , , Month = July Promotion Race Yes No Black 0 7 White 4 16 , , Month = August Promotion Race Yes No Black 0 7 White 4 13 , , Month = September Promotion Race Yes No Black 0 8 White 2 13

The permutation approach can be done in a similar

manner as done in Chapter 2 when testing for

independence. The only differences are that the CMH

test will be performed for each permutation sample and

the sampling needs to be done within strata (why?). The

strata option in boot() allows one to specify the strata.

Below is the R code and output.

n.table<-array(data = c(0, 4, 7, 16, 0, 4, 7, 13, 0, 2, 8, 13), dim = c(2,2,3), dimnames = list(Race = c("Black", "White"), Promotion = c("Yes", "No"), Month = c("July", "August", "September"))) n.table , , Month = July Promotion Race Yes No Black 0 7

White 4 16 , , Month = August Promotion Race Yes No Black 0 7 White 4 13 , , Month = September Promotion Race Yes No Black 0 8 White 2 13

#Raw data all.data <- matrix(NA, 0, 3) #Put data in "raw" form for(i in 1:2) {

for(j in 1:2) { for(k in 1:3) { all.data <- rbind(all.data, matrix(c(i, j, k), n.table[i, j, k], 3, byrow = T)) } } }

#Need in a data.frame format for the ftable() function call coming up all.data2<-data.frame(all.data) #One way to assign column names (other ways can be done) names(all.data2)<-c("Race", "Promotion", "Month") all.data2[1:2,] Race Promotion Month 1 1 2 1 2 1 2 1 #Check data ftable(Month+Promotion~Race, data = all.data2) Month 1 2 3 Promotion 1 2 1 2 1 2 Race 1 0 7 0 7 0 8 2 4 16 4 13 2 13 library(boot) m.sq<-function(data,i) { perm.data<-data[i] mantelhaen.test(all.data[,1], perm.data, all.data[,3], correct=F)$statistic } summarize.boot<-function(results, df, B) { par(mfrow = c(1,2)) #Histogram hist(results$t, main = "Histogram of perm. dist.", col = "blue", xlab="statistic^*")

Histogram of perm. dist. statistic^* Frequency 0 5 10 15 0 200 400 600 0 2 4 6 8 10 12 14 0 2 4 6 8 10 QQ-Plot of perm. dist. statistic^* chi.quant

Notes:

 Using a

approximation to the distribution of CMH

results in a p-value of 0.0321. The permutation

approach with B=1,000 results in a p-value of 0.04.

Thus, there is not much difference between the two.

 Notice the discreteness of the test statistic values for

each permutation (see QQ-plot). This is happening

because of the small n+1k values within each strata.

There are a very limited number of possible values the

test statistic can take on.