Case Control Study - Lecture Notes | BIOST 570, Study notes of Biostatistics

University of Washington (UW) - Seattle Biostatistics

Prof. Lumley Material Type: Notes; Class: ADV APPL LIN MODELS; Subject: Biostatistics; University: University of Washington - Seattle; Term: Autumn 2005;

Typology: Study notes

Pre 2010

Uploaded on 03/18/2009

Case–control studies

A Thomas Lumley production

starring Ben French

BIOST 570

2005-10-24

Rare events

Logistic regression for a rare event is relatively inefficient. For a single binary predictor we have a 2 × 2 table

         X=0    X=1
  Y=0    a      b      m_0
  Y=1    c      d      m_1
         n_0    n_1

The estimated variance of β̂ is

$$\operatorname{var}[\hat\beta] = \frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}$$

If a and b are much larger than c and d, the variance of β̂ depends mostly on c and d.
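As a rough numerical illustration of this point, the sketch below uses made-up counts (they are not from the lecture) and checks the variance formula against the standard error reported by `glm`. The name `c1` is used only to avoid masking R's `c()` function.

```r
# Hypothetical 2 x 2 counts in which the event Y = 1 is rare
a  <- 9000; b <- 1000   # Y = 0 row: X = 0, X = 1
c1 <- 40;   d <- 10     # Y = 1 row: X = 0, X = 1

# Variance of the log odds ratio from the four cell counts
var_beta  <- 1/a + 1/b + 1/c1 + 1/d
# Part of that variance contributed by the two event cells
var_cases <- 1/c1 + 1/d
share_from_cases <- var_cases / var_beta

# Cross-check: logistic regression on the same table
tab <- data.frame(x = c(0, 1), events = c(c1, d), nonevents = c(a, b))
fit <- glm(cbind(events, nonevents) ~ x, family = binomial(), data = tab)
var_glm <- vcov(fit)["x", "x"]

print(c(formula = var_beta, glm = var_glm, share_from_cases = share_from_cases))
```

With these counts nearly all of the variance comes from the two small cells, so adding more non-events would barely improve precision.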

Probability weights

We could also correct for biased sampling depending on Y using sampling weights.

Suppose in a population sample we get

         X=0    X=1
  Y=0    a      b      m_0
  Y=1    c      d      m_1
         n_0    n_1

The odds ratio is ad/bc

If we sample all the cases and a fraction π of the controls the expected value of the sample table looks like

         X=0    X=1
  Y=0    aπ     bπ     m_0 π
  Y=1    c      d      m_1

Probability weights

The probability-weighted sample odds ratio estimates

$$\frac{d\,(a\pi/\pi)}{c\,(b\pi/\pi)} = \frac{ad}{bc}$$

The unweighted sample odds ratio estimates

$$\frac{d\,a\pi}{c\,b\pi} = \frac{ad}{bc}$$

The odds ratio is consistently estimated using any (correct or incorrect) value for π. If we can use an arbitrary value of π in estimation we should choose the value that gives the greatest precision, which is π = 1.
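The cancellation of π can be checked numerically. This is a minimal sketch on the expected sample table, using hypothetical population counts and a hypothetical sampling fraction; `or_weighted` is an illustrative helper, not something from the lecture.

```r
# Hypothetical population counts (c1 avoids masking c())
a  <- 9000; b <- 1000   # controls: X = 0, X = 1
c1 <- 40;   d <- 10     # cases:    X = 0, X = 1
pi_true <- 0.05         # true control sampling fraction

# Expected sample table: all cases, a fraction pi_true of controls
a_s <- a * pi_true
b_s <- b * pi_true

or_population <- (a * d) / (b * c1)
or_unweighted <- (a_s * d) / (b_s * c1)

# Weighted odds ratio: controls are weighted by 1 / pi_assumed
or_weighted <- function(pi_assumed) {
  ((a_s / pi_assumed) * d) / ((b_s / pi_assumed) * c1)
}

print(c(population = or_population,
        unweighted = or_unweighted,
        weighted_correct = or_weighted(pi_true),
        weighted_wrong   = or_weighted(0.5)))
```

All four numbers agree, because the assumed weight multiplies both control cells and cancels in the ratio.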

Case–control studies

The case–control study is fundamental to epidemiology, particularly cancer epidemiology. The main difficulty is ensuring that the controls and cases really are sampled from the same population.

For example, to estimate the risk from cellphone use when driving, the cases are car crashes; the controls should be a random sample of non-crashes from people driving at the same time as the crash.

If cases are heart attacks treated at UWMC, controls should be a random sample of people who didn’t have a heart attack but would have been treated at UWMC if they had.

Stratified analysis

Suppose we want an analysis stratified by a confounder, and so we have K 2 × 2 tables and the Mantel–Haenszel estimator.

The MH estimating equations for the common odds ratio ψ are

$$\sum_{k=1}^{K} \left( a_k d_k - b_k c_k \psi \right) = 0$$

If the data came by case–control sampling with sampling fractions π_k in stratum k we get

$$\sum_{k=1}^{K} \left( a_k \pi_k d_k - b_k \pi_k c_k \psi \right) = 0$$
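A small sketch of why stratum-specific sampling fractions do not disturb the solution. It follows the estimating equation in the form written in these notes (the usual Mantel–Haenszel estimator also divides each term by the stratum size, which is left out here), and the stratum counts and sampling fractions are hypothetical, chosen so that both strata share an odds ratio of 2.

```r
# Hypothetical expected counts for K = 2 strata with common odds ratio 2
strata <- data.frame(
  a  = c(100, 200),   # controls, X = 0
  b  = c(50, 100),    # controls, X = 1
  c1 = c(10, 30),     # cases,    X = 0 (c1 avoids masking c())
  d  = c(10, 30)      # cases,    X = 1
)

# Solve sum(a*d - b*c*psi) = 0 for psi
solve_psi <- function(a, b, c1, d) sum(a * d) / sum(b * c1)

psi_population <- with(strata, solve_psi(a, b, c1, d))

# Case-control sampling: keep all cases, fraction pi_k of controls in stratum k
pi_k <- c(0.1, 0.5)
psi_case_control <- with(strata, solve_psi(a * pi_k, b * pi_k, c1, d))

print(c(population = psi_population, case_control = psi_case_control))
```

Each stratum's term is multiplied by its own π_k, so a term that is zero at the common ψ stays zero and the solution is unchanged.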

Logistic regression

With multiple confounders a stratified analysis is not feasible and we need logistic regression. We also need logistic regression for a continuous exposure variable.

When X is high-dimensional or continuous and has unspecified distribution it is harder to work with P [X|Y ] than in 2 × 2 tables, but the probability-weighting approach is still straightforward.

The probability-weighted estimating functions for logistic regression are

$$\sum_{i=1}^{n} \frac{x_i}{\pi_i} (Y_i - \mu_i) = 0$$

where πi = 1 for cases and πi = π for controls

We hope that these estimating equations are unbiased for any choice of π, which would allow us to use the ordinary logistic regression equations with π = 1, which are the most efficient.
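The hoped-for property can be checked on expected counts. The sketch below builds a hypothetical population in which the logistic model holds exactly, forms the expected case–control table, and fits weighted logistic regressions with several assumed values of π. All parameter values and the helper `fit_with` are illustrative assumptions; `quasibinomial()` is used only because it gives the same point estimates as `binomial()` without warnings about non-integer weights.

```r
# Hypothetical population: logit P(Y = 1 | x) = alpha + beta * x
alpha <- -4; beta <- 0.7
pi_true <- 0.02                       # true control sampling fraction
x  <- 0:3
Nx <- c(40000, 30000, 20000, 10000)   # population size at each x
mu <- plogis(alpha + beta * x)

# Expected case-control sample: all cases, fraction pi_true of controls
cc <- data.frame(
  x = rep(x, 2),
  y = rep(c(1, 0), each = length(x)),
  n = c(Nx * mu, pi_true * Nx * (1 - mu))
)

# Weighted fit: weight 1 for cases, 1 / pi_assumed for controls
fit_with <- function(pi_assumed) {
  w <- ifelse(cc$y == 1, 1, 1 / pi_assumed)
  cc$wt <- cc$n * w
  coef(glm(y ~ x, family = quasibinomial(), weights = wt, data = cc))
}

est_true       <- fit_with(pi_true)  # correct weights
est_wrong      <- fit_with(0.3)      # deliberately wrong weights
est_unweighted <- fit_with(1)        # ordinary logistic regression

print(rbind(est_true, est_wrong, est_unweighted))
```

In this setup the slope is the same in all three fits and only the intercept moves with the assumed sampling fraction.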

Logistic regression

Start with the population or cohort of size N from which the sample is taken. Write Ri = 1 if person i in the population is observed.

Suppose that in the population $\operatorname{logit} E[Y_i \mid X_i = x_i] = \operatorname{logit}\mu_i = \alpha + x_i\beta$ and that we fit a logistic regression model with linear predictor $\operatorname{logit}\tilde\mu_i = \tilde\alpha + x_i\beta$. Write $\pi$ for the true control sampling fraction, so that $P[R_i = 1 \mid Y_i = 0] = \pi$, and $\tilde\pi$ for the assumed control sampling fraction.

The estimating equations for ˜α and β are

$$\sum_{i=1}^{N} U_i = \sum_{i=1}^{N} x_i \frac{R_i}{\pi_i} (Y_i - \tilde\mu_i) = 0$$
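One way to see what these equations estimate is to take the expectation of a single term. The following is a sketch of that calculation, not necessarily the derivation used in the lecture. It assumes that every case is sampled, that controls are sampled with probability $\pi$ independently of $x_i$, and that the weights use $\pi_i = 1$ for cases and $\pi_i = \tilde\pi$ for controls.

```latex
\begin{aligned}
E[U_i \mid x_i]
  &= x_i \left\{ P[Y_i = 1 \mid x_i]\,(1 - \tilde\mu_i)
       - P[Y_i = 0 \mid x_i]\,\frac{\pi}{\tilde\pi}\,\tilde\mu_i \right\} \\
  &= x_i \left\{ \mu_i (1 - \tilde\mu_i)
       - \frac{\pi}{\tilde\pi} (1 - \mu_i)\,\tilde\mu_i \right\}.
\end{aligned}
% This is zero for every x_i exactly when
\frac{\tilde\mu_i}{1 - \tilde\mu_i}
  = \frac{\tilde\pi}{\pi} \cdot \frac{\mu_i}{1 - \mu_i}
\quad\Longleftrightarrow\quad
\operatorname{logit}\tilde\mu_i
  = \alpha + \log\frac{\tilde\pi}{\pi} + x_i \beta .
```

Under these assumptions the equations are unbiased with the population $\beta$ and $\tilde\alpha = \alpha + \log(\tilde\pi/\pi)$. Only the intercept depends on the assumed sampling fraction; with $\tilde\pi = 1$ it becomes $\alpha - \log\pi$.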

Logistic regression

The assumption that the logistic regression model is true in the population was critical. If E[Y ] is misspecified the case–control and cohort logistic regressions do not estimate the same β.

Suppose there is an interaction between exposure and age, and we do not model this interaction. The cohort logistic regression model estimates a weighted average of the effect of exposure at different ages, weighted according to the population distribution of age.

The case–control logistic regression also estimates a weighted average, but weighted according to the case–control distribution of age. If age is a risk factor for disease [and it always is], the case–control logistic regression gives more weight to the effect of exposure at higher ages.

Biostatisticians usually consider this a worthwhile tradeoff. Survey statisticians may disagree.

Likelihood

We have shown that logistic regression is consistent with any set of probability weights. The most efficient set of weights comes from ignoring the case–control sampling.

Prentice & Pyke (Biometrika, 1979) show that in fact logistic regression ignoring the case–control sampling is the maximum likelihood estimator. If X is not discrete it is a nonparametric MLE, which does not necessarily imply anything optimal about its properties.

Breslow, Robins & Wellner (2000) showed that logistic regression is semiparametric efficient in case–control studies.

Example

summary(model1)

Call:
glm(formula = cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp,
    family = binomial(), data = esoph)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6891  -0.5618  -0.2168   0.2314       2.

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   -5.9108     1.0302  -5.737 9.61e-09 ***
agegp35-44     1.6095     1.0676   1.508 0.
agegp45-54     2.9752     1.0242   2.905 0.003675 **
agegp55-64     3.3584     1.0198   3.293 0.000991 ***
agegp65-74     3.7270     1.0253   3.635 0.000278 ***
agegp75+       3.6818     1.0645   3.459 0.000543 ***
tobgp10-19     0.3407     0.2054   1.659 0.       .
tobgp20-29     0.3962     0.2456   1.613 0.
tobgp30+       0.8677     0.2765   3.138 0.001701 **
alcgp40-79     1.1216     0.2384   4.704 2.55e-06 ***
alcgp80-119    1.4471     0.2628   5.506 3.68e-08 ***
alcgp120+      2.1154     0.2876   7.356 1.90e-13 ***

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 227.241 on 87 degrees of freedom
Residual deviance:  53.973 on 76 degrees of freedom
AIC: 225.
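The notes do not show how `model1` was created. A plausible sketch is below. It assumes the built-in `esoph` data, with the three grouping factors converted from ordered to unordered so that `glm` uses treatment contrasts like those in the output above; the name `esoph_unordered` is hypothetical. The numbers may not match the printed output exactly, depending on the copy of the data shipped with your R version.

```r
# esoph stores agegp, alcgp and tobgp as ordered factors; converting them
# to unordered factors gives one coefficient per non-reference level.
esoph_unordered <- transform(
  esoph,
  agegp = factor(agegp, ordered = FALSE),
  tobgp = factor(tobgp, ordered = FALSE),
  alcgp = factor(alcgp, ordered = FALSE)
)

model1 <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp,
              family = binomial(), data = esoph_unordered)

# Odds ratios with Wald 95% intervals (exponentiated coefficients)
odds_ratios <- exp(cbind(OR = coef(model1), confint.default(model1)))
print(round(odds_ratios, 2))
```

Because the data are a case–control sample, the exponentiated slopes are interpretable as odds ratios, while the intercept reflects the sampling fractions rather than a population baseline risk.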

Example

Analysis of Deviance Table

Model 1: cbind(ncases, ncontrols) ~ agegp + tobgp
Model 2: cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1        79       120.
2        76     53.973  3   66.054   2.984e-
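A table of this kind is what `anova()` prints when given two nested fits. The sketch below is an assumption about how it was produced; `model0` and `esoph_unordered` are hypothetical names, and the deviances may differ slightly from those shown depending on the R version's copy of `esoph`.

```r
# Same data preparation as assumed for model1: unordered factors
esoph_unordered <- transform(
  esoph,
  agegp = factor(agegp, ordered = FALSE),
  tobgp = factor(tobgp, ordered = FALSE),
  alcgp = factor(alcgp, ordered = FALSE)
)

# Model without alcohol, then the full model
model0 <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp,
              family = binomial(), data = esoph_unordered)
model1 <- update(model0, . ~ . + alcgp)

# Likelihood ratio (deviance) test for the alcohol terms
lrt <- anova(model0, model1, test = "Chisq")
print(lrt)
```

The drop in deviance is compared with a chi-squared distribution on 3 degrees of freedom, one for each non-reference alcohol group.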

Wald tests

library(survey)
regTermTest(model1, ~alcgp)

Wald test for alcgp
 in glm(formula = cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp,
    family = binomial(), data = esoph)
Chisq = 57.89887 on 3 df: p= 1.652e-

Example

regTermTest(model1, ~tobgp)

Wald test for tobgp
 in glm(formula = cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp,
    family = binomial(), data = esoph)
Chisq = 10.76880 on 3 df: p= 0.
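The Wald statistic for a multi-level term can also be computed by hand from the coefficients and their covariance matrix, which makes explicit what is being tested. This is a minimal sketch in base R, assuming `model1` was fitted to `esoph` with unordered factors; it is not the internal implementation of `regTermTest`, and `esoph_unordered` is a hypothetical name.

```r
esoph_unordered <- transform(
  esoph,
  agegp = factor(agegp, ordered = FALSE),
  tobgp = factor(tobgp, ordered = FALSE),
  alcgp = factor(alcgp, ordered = FALSE)
)
model1 <- glm(cbind(ncases, ncontrols) ~ agegp + tobgp + alcgp,
              family = binomial(), data = esoph_unordered)

# Coefficients belonging to the alcohol term and their covariance matrix
idx <- grep("^alcgp", names(coef(model1)))
b   <- coef(model1)[idx]
V   <- vcov(model1)[idx, idx]

# Wald chi-squared statistic: b' V^{-1} b, with one df per coefficient
wald_chisq <- drop(t(b) %*% solve(V) %*% b)
wald_df    <- length(idx)
p_value    <- pchisq(wald_chisq, df = wald_df, lower.tail = FALSE)

print(c(chisq = wald_chisq, df = wald_df, p = p_value))
```

Unlike the deviance comparison, this test needs only the full model, since it asks whether all the alcohol coefficients could jointly be zero.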
