Statistical Analysis of Matched Pairs Data: McNemar's Test & Logistic Regression, Exams of Statistics

These notes cover the analysis of matched pairs data, focusing on two main methods: McNemar's test for comparing dependent proportions and conditional logistic regression for binary matched pairs. They present the principles, formulas, and examples of these methods, as well as the relationship between them and their application to longitudinal studies with categorical responses.

Typology: Exams

Pre 2010

Uploaded on 07/30/2009

koofers-user-mlx

Chapter 10: Models for Matched Pairs
Lecture 1

Matched Pairs: Consider categorical responses for two samples where each observation in the first sample pairs with an observation in the second sample. Such matched pairs data occur in longitudinal studies with repeated measures on subjects. Because of the matching, the responses in the two samples are dependent. A two-way square contingency table with the same row and column categories summarizes the data. For example, 1825 subjects were asked:

  1. Does a person have the right to take his/her own life if he/she has an incurable disease?
  2. If a patient has an incurable disease, should a doctor be allowed to end the patient’s life in some painless way?

These are matched pairs data, with two categorical responses from each subject.

                 Let Patient Die
Suicide        Yes      No     Total
Yes           1097      90      1187
No             203     435       638
Total         1300     525      1825

Comparing Dependent Proportions: For each pair, let $\pi_{ab}$ = P(outcome a for the first observation and outcome b for the second observation), and let $n_{ab}$ be the number of such pairs in the sample. Let $p_{a+}$ be the sample proportion in category a for observation 1, and $p_{+a}$ the corresponding proportion for observation 2. Consider a binary outcome. Because the pairs are matched, these sample proportions are correlated.

Test of Marginal Homogeneity: Under marginal homogeneity, $\pi_{1+} = \pi_{+1}$, which also implies $\pi_{2+} = \pi_{+2}$. But $\pi_{1+} - \pi_{+1} = \pi_{12} - \pi_{21}$, so marginal homogeneity implies $\pi_{12} = \pi_{21}$; that is, the table of probabilities is symmetric across the main diagonal.

Inference for Dependent Proportions: McNemar’s Test

Take $\delta = \pi_{+1} - \pi_{1+}$ and $d = p_{+1} - p_{1+} = p_{2+} - p_{+2}$. Recall that

$$\mathrm{cov}(p_{+1}, p_{1+}) = \frac{\pi_{11}\pi_{22} - \pi_{12}\pi_{21}}{n}.$$

(Try it!)
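The covariance formula above can be checked from multinomial moments. The following is a sketch, assuming the four cell counts are multinomial with index $n$ and probabilities $\pi_{ab}$ (the usual sampling model for such a table, though the notes do not state it at this point), so that $\mathrm{var}(n_{ab}) = n\pi_{ab}(1-\pi_{ab})$ and $\mathrm{cov}(n_{ab}, n_{cd}) = -n\pi_{ab}\pi_{cd}$ for distinct cells:

```latex
\begin{aligned}
\mathrm{cov}(p_{+1}, p_{1+})
 &= \frac{1}{n^2}\,\mathrm{cov}(n_{11}+n_{21},\; n_{11}+n_{12}) \\
 &= \frac{1}{n^2}\bigl[\mathrm{var}(n_{11}) + \mathrm{cov}(n_{11},n_{12})
      + \mathrm{cov}(n_{21},n_{11}) + \mathrm{cov}(n_{21},n_{12})\bigr] \\
 &= \frac{1}{n}\bigl[\pi_{11}(1-\pi_{11}) - \pi_{11}\pi_{12}
      - \pi_{21}\pi_{11} - \pi_{21}\pi_{12}\bigr] \\
 &= \frac{1}{n}\bigl[\pi_{11}(1-\pi_{11}-\pi_{12}-\pi_{21}) - \pi_{12}\pi_{21}\bigr] \\
 &= \frac{\pi_{11}\pi_{22} - \pi_{12}\pi_{21}}{n}.
\end{aligned}
```

The last step uses $1 - \pi_{11} - \pi_{12} - \pi_{21} = \pi_{22}$. The covariance is positive whenever $\pi_{11}\pi_{22} > \pi_{12}\pi_{21}$, which is the typical situation for matched responses.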

Small-Sample Test for Comparing Matched Proportions: Note that $H_0: \pi_{12} = \pi_{21}$ is equivalent to $H_0: \pi_{12}/(\pi_{12} + \pi_{21}) = 0.5$. Let $n^* = n_{12} + n_{21}$. For small samples, an exact test conditions on $n^*$. Under $H_0$, conditional on $n^*$, $n_{21} \sim \mathrm{bin}(n^*, 0.5)$. Also, under $H_0$, $E(n_{21}) = 0.5\,n^*$ and $\mathrm{Var}(n_{21}) = 0.25\,n^*$. The P-value can be obtained from the binomial table. When $n^* > 10$, we can use the normal approximation with

$$z = \frac{n_{21} - n^*/2}{\sqrt{0.25\,n^*}} = \frac{n_{21} - n_{12}}{(n_{21} + n_{12})^{1/2}},$$

which is identical to McNemar’s test.

Relationship Between McNemar’s and CMH Tests

Recall that the 2 × 2 table in the example above cross-classifies the subjects by the two responses, with “Suicide” as rows and “Let Patient Die” as columns. We can also rearrange the data as a series of 1825 2 × 2 partial tables, one per subject, as follows:

                                 Response
Subject   Question             Yes     No
1         Let Patient Die       1       0
          Suicide               1       0
2         Let Patient Die       0       1
          Suicide               1       0
3         Let Patient Die       1       0
          Suicide               0       1
4         Let Patient Die       0       1
          Suicide               0       1
...       ...                  ...     ...

Of these, 1097 tables look like the first, 90 look like the second, 203 look like the third, and 435 look like the fourth. Collapsing these tables across subjects produces a 2 × 2 table that contains the marginal totals of the original table.

              Yes      No
Question 1   1300     525
Question 2   1187     638

If in the partial tables, the probability of “Yes” is the same for the two ques-

Terminology: The 2 × 2 × n representation is called the subject-specific table, and the representation in the form of the first table is referred to as the population-averaged table.
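The small-sample test, its normal approximation, and the link to the CMH test can all be checked numerically on the example data. The sketch below is illustrative only. It assumes the standard CMH statistic without continuity correction, using the hypergeometric mean and variance of the first cell in each partial table, and it takes the two-sided exact P-value as twice the smaller binomial tail (one common convention). The helper names are hypothetical.

```python
from math import comb, sqrt

# Discordant-pair counts from the 2 x 2 table
# (rows: Suicide, columns: Let Patient Die).
n12 = 90    # Suicide = Yes, Let Patient Die = No
n21 = 203   # Suicide = No,  Let Patient Die = Yes
n_star = n12 + n21

# Normal approximation: z = (n21 - n12) / sqrt(n21 + n12).
z = (n21 - n12) / sqrt(n21 + n12)

def binom_upper_tail(k, n):
    """P(X >= k) for X ~ bin(n, 0.5), using exact integer arithmetic."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

# Exact test: under H0, n21 ~ bin(n_star, 0.5) given n_star.
upper = binom_upper_tail(n21, n_star)              # P(X >= n21)
lower = 1.0 - binom_upper_tail(n21 + 1, n_star)    # P(X <= n21)
p_exact = min(1.0, 2 * min(upper, lower))

# Subject-specific representation:
# (Let Patient Die = Yes?, Suicide = Yes?) -> number of subjects.
patterns = {(1, 1): 1097, (0, 1): 90, (1, 0): 203, (0, 0): 435}

def cmh_statistic(patterns):
    """CMH statistic over partial tables with one observation in each row."""
    diff_sum = 0.0
    var_sum = 0.0
    for (lpd_yes, suicide_yes), count in patterns.items():
        s = lpd_yes + suicide_yes      # "Yes" column total in this partial table
        expected = s / 2               # (row total 1) * s / (table total 2)
        variance = s * (2 - s) / 4     # 1 * 1 * s * (2 - s) / (2**2 * (2 - 1))
        diff_sum += count * (lpd_yes - expected)
        var_sum += count * variance
    return diff_sum ** 2 / var_sum

cmh = cmh_statistic(patterns)
print(f"z = {z:.3f}, z^2 = {z**2:.3f}, CMH = {cmh:.3f}, exact P = {p_exact:.3g}")
```

Only the discordant subjects (patterns 2 and 3) contribute to the CMH sums, which is why the statistic agrees with the square of the McNemar z (about 6.6 for these data).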

Conditional Logistic Regression for Binary Matched Pairs
Marginal versus Conditional Models: A Model-Based Analysis

  1. Marginal Models: Let $(Y_1, Y_2)$ denote the pair of observations for a randomly selected subject, with “1” denoting “Success” (category 1) and “0” denoting “Failure” (category 2). Let $\delta = P(Y_2 = 1) - P(Y_1 = 1)$ be the difference of the marginal probabilities, in agreement with $\delta = \pi_{+1} - \pi_{1+}$. Then $\delta$ occurs as a parameter in the following model:

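$$P(Y_t = 1) = \alpha + \beta x_t \qquad (1)$$

with $x_1 = 0$ and $x_2 = 1$. Then $P(Y_1 = 1) = \alpha$ and $P(Y_2 = 1) = \alpha + \delta$, so $\beta = \delta$. With the logit link,

$$\mathrm{logit}[P(Y_t = 1)] = \alpha + \beta x_t \qquad (2)$$

and $\beta$ is the log odds ratio for the marginal distributions. In fact,

$$\hat{\beta} = \log\left[\frac{p_{+1}\,p_{2+}}{p_{1+}\,p_{+2}}\right].$$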
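For the example table, the two marginal-model estimates can be computed directly from the margins. This is a minimal sketch; the variable names are illustrative, and the cell labels assume observation 1 is the “Suicide” question (rows) and observation 2 is the “Let Patient Die” question (columns).

```python
from math import log

# Cell counts n_ab (a: Suicide, b: Let Patient Die; 1 = Yes, 2 = No).
n11, n12, n21, n22 = 1097, 90, 203, 435
n = n11 + n12 + n21 + n22

# Marginal sample proportions for observation 1 (rows) and observation 2 (columns).
p1_plus, p2_plus = (n11 + n12) / n, (n21 + n22) / n
p_plus1, p_plus2 = (n11 + n21) / n, (n12 + n22) / n

# Identity link, model (1): the estimate of beta is d = p_{+1} - p_{1+}.
d = p_plus1 - p1_plus

# Logit link, model (2): the estimate of beta is the marginal log odds ratio.
beta_marginal = log((p_plus1 * p2_plus) / (p1_plus * p_plus2))

print(f"d = {d:.4f}, marginal log odds ratio = {beta_marginal:.4f}")
```

For these data, d is roughly 0.06 and the marginal log odds ratio is roughly 0.29.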

  2. Conditional Models: The subject-specific model allows the probabilities to vary across subjects. Let $(Y_{i1}, Y_{i2})$, $i = 1, 2, \ldots, n$, denote the observations on the $i$th subject. Then the conditional model is

$$\mathrm{link}[P(Y_{it} = 1)] = \alpha_i + \beta x_t. \qquad (3)$$

In this model, $\beta$ is defined conditional on the subject, and $\hat{\beta}$ describes the conditional association for the tables stratified by subject. The effect is subject-specific, as opposed to the effects in equations (1) and (2), which are population-averaged. For the identity link, the two types of effects are identical. However, this is not so for other links. For example, using the logit link,

$$P(Y_{it} = 1) = \frac{\exp(\alpha_i + \beta x_t)}{1 + \exp(\alpha_i + \beta x_t)},$$

which, when averaged over the entire population, does not yield

$$P(Y_t = 1) = \frac{\exp(\alpha + \beta x_t)}{1 + \exp(\alpha + \beta x_t)}$$

obtained from the marginal logit model (2). For the conditional logit model we have

$$\mathrm{logit}[P(Y_{it} = 1)] = \alpha_i + \beta x_t.$$
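The difference between subject-specific and population-averaged effects can be seen with a small numerical illustration. The subject effects and the value of β below are hypothetical, chosen only to make the point. Averaging the subject-specific probabilities recovers the same β under the identity link, but gives a smaller marginal log odds ratio under the logit link.

```python
from math import exp, log

def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + exp(-x))

def logit(p):
    return log(p / (1.0 - p))

def mean(values):
    return sum(values) / len(values)

x = (0, 1)  # x_1 = 0, x_2 = 1

# Logit link: hypothetical subject effects alpha_i and conditional effect beta.
alphas = [-2.0, -1.0, 0.0, 1.0, 2.0]
beta = 1.0
# Population-averaged P(Y_t = 1): average the subject-specific probabilities.
p_logit = [mean([expit(a + beta * xt) for a in alphas]) for xt in x]
beta_marginal_logit = logit(p_logit[1]) - logit(p_logit[0])

# Identity link: hypothetical alpha_i kept small so alpha_i + beta stays in [0, 1].
alphas_id = [0.1, 0.2, 0.3]
beta_id = 0.25
p_id = [mean([a + beta_id * xt for a in alphas_id]) for xt in x]
beta_marginal_identity = p_id[1] - p_id[0]

print(f"logit link:    conditional beta = {beta}, marginal = {beta_marginal_logit:.3f}")
print(f"identity link: conditional beta = {beta_id}, marginal = {beta_marginal_identity:.3f}")
```

With these illustrative values the marginal log odds ratio comes out near 0.69, noticeably below the conditional β of 1, while the identity-link effect is unchanged by averaging.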

Let $(y_{i1}, y_{i2})$, $i = 1, 2, \ldots, n$, be the n pairs of observations. Their joint pf is proportional to

$$\exp\Bigl[\sum_i \alpha_i (y_{i1} + y_{i2}) + \beta \sum_i y_{i2}\Bigr].$$

We eliminate the $\alpha_i$ by conditioning on their sufficient statistics $\{S_i = Y_{i1} + Y_{i2}\}$. The conditional pf of $(Y_{i1}, Y_{i2})$ depends on $\beta$ only when $S_i = 1$, i.e. when the outcomes differ for the two responses. Thus

$$P(Y_{i1} = y_{i1}, Y_{i2} = y_{i2} \mid S_i = 1) =
\begin{cases}
\dfrac{\exp(\beta)}{1 + \exp(\beta)}, & y_{i1} = 0,\ y_{i2} = 1, \\[6pt]
\dfrac{1}{1 + \exp(\beta)}, & y_{i1} = 1,\ y_{i2} = 0.
\end{cases}$$

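Conditional on $S_i = 1$, the joint distribution of the $(Y_{i1}, Y_{i2})$ is

$$\prod_{S_i = 1} \left(\frac{1}{1 + \exp(\beta)}\right)^{y_{i1}} \left(\frac{\exp(\beta)}{1 + \exp(\beta)}\right)^{y_{i2}} = \frac{[\exp(\beta)]^{n_{21}}}{[1 + \exp(\beta)]^{n^*}},$$

where $n_{21}$ counts the pairs with $(y_{i1}, y_{i2}) = (0, 1)$ and $n^* = n_{12} + n_{21} = \sum_{i:\,S_i = 1} S_i$ is the number of discordant pairs. It can be seen that the conditional ML estimator of $\beta$ is

$$\hat{\beta} = \log(n_{21}/n_{12})$$

with standard error $SE = \sqrt{1/n_{21} + 1/n_{12}}$. Another way to handle the large number of parameters $\{\alpha_i\}$ is to treat them as random effects from a $N(\mu, \sigma^2)$ distribution. The likelihood function then depends on the three parameters $\beta$, $\mu$, $\sigma^2$. For matched pairs, we again have $\hat{\beta} = \log(n_{21}/n_{12})$. This is an example of a generalized linear mixed model.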

Thus, the three approaches (Mantel-Haenszel, conditional ML, and ML with random effects) all lead to the same estimator of $\beta$.
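As a rough numerical check on the example data, the closed-form conditional ML estimate can be compared with a direct maximization. The sketch below takes the conditional log-likelihood to be $n_{21}\beta - n^*\log(1 + e^{\beta})$, the form whose maximizer is $\log(n_{21}/n_{12})$, and runs a few Newton-Raphson steps. The function names are illustrative.

```python
from math import exp, log, sqrt

n12, n21 = 90, 203          # discordant-pair counts from the example table
n_star = n12 + n21

# Closed-form conditional ML estimate and its standard error.
beta_hat = log(n21 / n12)
se = sqrt(1 / n21 + 1 / n12)

def score(beta):
    """Derivative of n21*beta - n_star*log(1 + exp(beta)) with respect to beta."""
    return n21 - n_star * exp(beta) / (1 + exp(beta))

def information(beta):
    """Negative second derivative of the conditional log-likelihood."""
    p = exp(beta) / (1 + exp(beta))
    return n_star * p * (1 - p)

# Newton-Raphson starting from beta = 0 as an independent check.
beta_newton = 0.0
for _ in range(25):
    beta_newton += score(beta_newton) / information(beta_newton)

# Standard error from the inverse information at the maximum.
se_newton = 1 / sqrt(information(beta_newton))

print(f"closed form: beta = {beta_hat:.4f}, SE = {se:.4f}")
print(f"Newton:      beta = {beta_newton:.4f}, SE = {se_newton:.4f}")
```

For these data the conditional estimate is about 0.81 with a standard error of about 0.13, so the estimated conditional odds of a “Yes” on the second question are about 203/90 ≈ 2.3 times those on the first.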