Handling Dependent Observations in Cluster Randomized Studies: Correlated Data - Prof. Mic, Study notes of Data Analysis & Statistical Methods

These notes address correlated data in statistics, specifically in the context of cluster randomized studies. They cover the central issue of testing, estimation, and sample size calculations when responses within a cluster may not be independent. They also introduce the variance inflation factor (VIF) and its role in measuring the increase in variance due to within-subject correlation.


Correlated Data
Bios 662
Michael G. Hudgens, Ph.D.
http://www.bios.unc.edu/mhudgens
2007-11-14 12:03

Clustered Data

  • Heretofore, all methods assume data iid
  • What to do if dependencies exist between data?
  • Typically correlated data occur in clusters/groups
  • Examples:
    • Repeated measures on individuals over time
    • Natural groupings of individuals (e.g., litters, schools)
  • Can occur in observational or randomized studies; an example of the latter would be cluster randomized studies

Central Issue

  • How do we do testing, estimation, sample size calculations, etc., allowing that responses within a cluster/group (e.g., school) may not be independent?
  • Use methods allowing for dependency (correlation) within groups but assuming independence (no correlation) between groups

Continuous response model

  • Let $Y_{ijk}$ = the response of the $k$th person in the $j$th cluster in the $i$th treatment level

\[ i = 1, 2, \ldots, t; \quad j = 1, 2, \ldots, c; \quad k = 1, 2, \ldots, m \]

  • Let

\[ \bar{Y}_{ij} = \frac{\sum_{k=1}^{m} Y_{ijk}}{m} \]

Continuous response model

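  • Then

\[
\begin{aligned}
\mathrm{Var}(\bar{Y}_{ij}) &= E(\bar{Y}_{ij}^2) - \mu_i^2 \\
&= m^{-2} E\left\{ \left( \sum_{k=1}^{m} Y_{ijk} \right)^2 \right\} - \mu_i^2 \\
&= m^{-2} E\left\{ \sum_{k=1}^{m} Y_{ijk}^2 + \sum_{k \neq k'} Y_{ijk} Y_{ijk'} \right\} - \mu_i^2 \\
&= m^{-2} \left\{ m\sigma^2 + m(m-1)\rho\sigma^2 \right\} \\
&= \frac{\sigma^2}{m} \{ 1 + (m-1)\rho \}
\end{aligned}
\]

  • Here $\mu_i = E(Y_{ijk})$, $\sigma^2 = \mathrm{Var}(Y_{ijk})$, and $\rho = \mathrm{Corr}(Y_{ijk}, Y_{ijk'})$ for $k \neq k'$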
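As a quick numerical check of the closed form above, the sketch below builds the covariance matrix of one cluster and computes the variance of the cluster mean directly as the sum of all covariance entries divided by m². It assumes a common variance σ² and a common within-cluster correlation ρ for every pair of observations; the function names and the values m = 5, σ² = 4, ρ = 0.2 are illustrative and not from the notes.

```python
def exchangeable_cov(m, sigma2, rho):
    """m x m covariance matrix: sigma2 on the diagonal, rho * sigma2 elsewhere."""
    return [[sigma2 if k == kp else rho * sigma2 for kp in range(m)]
            for k in range(m)]


def var_of_mean(cov):
    """Variance of the cluster mean: (sum of all covariance entries) / m^2."""
    m = len(cov)
    return sum(sum(row) for row in cov) / m ** 2


def var_formula(m, sigma2, rho):
    """Closed form from the derivation: (sigma^2 / m) * {1 + (m - 1) * rho}."""
    return sigma2 / m * (1 + (m - 1) * rho)


# Illustrative values only (assumed, not taken from the notes).
m, sigma2, rho = 5, 4.0, 0.2
direct = var_of_mean(exchangeable_cov(m, sigma2, rho))
closed = var_formula(m, sigma2, rho)
print(direct, closed)  # the two numbers should agree up to rounding
```

With ρ = 0 the same functions reduce to the familiar σ²/m for independent observations.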

Variance Inflation Factor (VIF)

  • {1 + (m − 1)ρ} is the variance inflation factor (VIF)
  • Measures the increase in the variance of the mean due to the within-subject correlation of measurements (ρ)
  • VIF > 1 for ρ > 0 and m > 1

Continuous response model

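  • Suppose $t = 2$ and $n_1 = n_2 = cm$
  • If we ignore the correlation within cluster

\[ z_{\text{ignore}} = \frac{\bar{Y}_1 - \bar{Y}_2}{\sigma \sqrt{1/n_1 + 1/n_2}} \]

  • Should instead use

\[ z_{\text{true}} = \frac{\bar{Y}_1 - \bar{Y}_2}{\sigma \sqrt{(1/n_1 + 1/n_2)\,\mathrm{VIF}}} = \frac{z_{\text{ignore}}}{\sqrt{\mathrm{VIF}}} \]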
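The two statistics can be compared on made-up numbers. The sketch below assumes σ and ρ are known; the group means, σ = 2, c = 10 clusters of m = 20 per arm, and ρ = 0.05 are hypothetical values chosen only for illustration.

```python
from math import sqrt


def vif(m, rho):
    """Variance inflation factor 1 + (m - 1) * rho."""
    return 1 + (m - 1) * rho


def z_ignore(ybar1, ybar2, sigma, n1, n2):
    """z statistic that treats all n1 + n2 observations as independent."""
    return (ybar1 - ybar2) / (sigma * sqrt(1 / n1 + 1 / n2))


def z_true(ybar1, ybar2, sigma, n1, n2, m, rho):
    """z statistic with the variance inflated by the VIF."""
    return z_ignore(ybar1, ybar2, sigma, n1, n2) / sqrt(vif(m, rho))


# Hypothetical study: c clusters of size m per arm, so n1 = n2 = c * m.
c, m, rho = 10, 20, 0.05
n = c * m
zi = z_ignore(10.5, 10.0, 2.0, n, n)
zt = z_true(10.5, 10.0, 2.0, n, n, m, rho)
print(f"z_ignore = {zi:.3f}, z_true = {zt:.3f}")
```

In this example the naive statistic exceeds 1.96 while the corrected one does not, so the two analyses would reach different conclusions at the two-sided 5% level.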

Effect of correlation

  • $|z_{\text{true}}| < |z_{\text{ignore}}|$ for ρ > 0 and m > 1
  • Thus ignoring correlation will lead to inflated type I error
  • Intuition: naive approach acts as if we have more information than we do
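The size of the inflation can be worked out from the relation between the two statistics. Under the null hypothesis, and assuming σ and ρ are known so that z_true is standard normal, the naive test rejects when |z_ignore| exceeds the critical value, which is the same as |z_true| exceeding the critical value divided by √VIF. The sketch below evaluates that probability; the function name and the example inputs are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def actual_type1_error(m, rho, alpha=0.05):
    """Two-sided rejection rate of the naive z test at nominal level alpha.

    Assumes z_true ~ N(0, 1) under the null and z_ignore = z_true * sqrt(VIF).
    """
    vif = 1 + (m - 1) * rho
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z_crit / sqrt(vif)))


# Illustrative inputs: clusters of 10 with modest within-cluster correlation.
for rho in (0.0, 0.01, 0.05, 0.1):
    print(rho, round(actual_type1_error(10, rho), 3))
```

With ρ = 0 the function returns the nominal level, and the rejection rate grows as either m or ρ increases.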

Sample Size when t = 2

  • If ρ = 0, then VIF = 1 and

\[ n = 2 \left( \frac{z_{1-\alpha/2} + z_{1-\beta}}{\Delta} \right)^2 \]

  • If ρ = 1, then VIF = m and

\[ n = 2 \left( \frac{z_{1-\alpha/2} + z_{1-\beta}}{\Delta} \right)^2 m \]

  • Typically 0.1 ≤ ρ ≤ 0.4
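The two cases above are the endpoints of a general rule: the sample size computed under independence is multiplied by the VIF. The sketch below implements that rule under two assumptions not spelled out in the notes: Δ is read as the standardized difference between the two treatment means, and n is the number of individuals per treatment group. The function name and default values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, m, rho, alpha=0.05, power=0.80):
    """Individuals per group for a two-sided, two-group z test.

    delta: standardized difference (assumed meaning of the Delta in the notes)
    m:     cluster size
    rho:   within-cluster correlation
    """
    z = NormalDist().inv_cdf
    n_indep = 2 * ((z(1 - alpha / 2) + z(power)) / delta) ** 2
    return n_indep * (1 + (m - 1) * rho)  # inflate by the VIF


# Illustrative inputs only.
n = n_per_group(delta=0.5, m=10, rho=0.1)
print(ceil(n), "individuals per group,", ceil(n / 10), "clusters per group")
```

Because n = cm in the setup used earlier, dividing the result by m and rounding up gives the number of clusters needed per group.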

Variance Inflation Factors

  • Table 18.4 of the text:

            ρ
     m      0.001   0.01    0.02    0.05    0.
     2      1.001   1.01    1.02    1.05    1.
     5      1.004   1.04    1.08    1.20    1.
     10     1.009   1.09    1.18    1.45    1.
     100    1.099   1.99    2.98    5.95    10.
     1000   1.999   10.99   20.98   50.95   100.
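The entries of such a table follow directly from VIF = 1 + (m − 1)ρ. The sketch below recomputes them for the four values of ρ that are fully legible in the table; the last column is truncated in these notes, so it is left out here rather than guessed.

```python
def vif(m, rho):
    """Variance inflation factor 1 + (m - 1) * rho."""
    return 1 + (m - 1) * rho


rhos = [0.001, 0.01, 0.02, 0.05]  # only the fully legible columns
ms = [2, 5, 10, 100, 1000]

# One row per cluster size m, one entry per rho, rounded to 3 decimals.
table = {m: [round(vif(m, r), 3) for r in rhos] for m in ms}

print("m".rjust(6) + "".join(f"{r:>9}" for r in rhos))
for m in ms:
    print(f"{m:>6}" + "".join(f"{v:>9.3f}" for v in table[m]))
```

Even a very small ρ produces a large VIF once clusters are large, which is why the bottom rows of the table grow so quickly.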