Research Design Principles - Lecture Notes | STAT 502, Study notes of Statistics

Material Type: Notes; Professor: Hoff; Class: DESIGN ANLYS EXPMTS; Subject: Statistics; University: University of Washington - Seattle; Term: Autumn 2005;

Statistics 502 Lecture Notes
Peter D. Hoff
© December 6, 2006

Contents

  • 1 Research Design Principles
    • 1.1 Induction
    • 1.2 Model of a process or system
    • 1.3 Experiments and observational studies
    • 1.4 Steps in designing an experiment
  • 2 Comparing Two Treatments
    • 2.1 Summaries of sample populations
    • 2.2 Hypothesis testing via randomization
    • 2.3 Essential nature of a hypothesis test
    • 2.4 Basic decision theory, or “What use is a p-value?”
    • 2.5 Relating samples to (super)populations
    • 2.6 Normal distribution
    • 2.7 Introduction to the t-test
    • 2.8 Two sample tests
    • 2.9 Power and Sample Size Determination
      • 2.9.1 The non-central t-distribution
      • 2.9.2 Computing the Power of a test
    • 2.10 Checking Assumptions of a two-sample t-test
      • 2.10.1 Two-sample t-test with unequal variances
      • 2.10.2 Which two-sample t-test to use? tdiff(yA, yB) vs. t(yA, yB)
  • 3 Comparing Several Treatments
    • 3.1 Introduction to ANOVA
      • 3.1.1 A model for treatment variation
      • 3.1.2 Model Fitting
      • 3.1.3 Testing hypotheses with MSE and MST
      • 3.1.4 Partitioning sums of squares
      • 3.1.5 The ANOVA table
      • 3.1.6 More sums of squares geometry
      • 3.1.7 Unbalanced Designs
      • 3.1.8 Normal sampling theory for ANOVA
      • 3.1.9 Sampling distribution of the F-statistic
      • 3.1.10 Comparing group means
      • 3.1.11 Power calculations for the F-test
    • 3.2 Treatment Comparisons
      • 3.2.1 Contrasts
      • 3.2.2 Orthogonal Contrasts
      • 3.2.3 Multiple Comparisons
    • 3.3 Model Diagnostics
      • 3.3.1 Detecting violations with residuals
      • 3.3.2 Variance stabilizing transformations
  • 4 Multifactor Designs
    • 4.1 Factorial Designs
      • 4.1.1 Data analysis:
      • 4.1.2 Additive effects model
      • 4.1.3 Evaluating additivity:
      • 4.1.4 Inference for additive treatment effects
    • 4.2 Randomized complete block designs
    • 4.3 Unbalanced designs
      • 4.3.1 Non-orthogonal sums of squares:
    • 4.4 Analysis of covariance
  • 5 Nested Designs
    • 5.1 Nested Designs
      • 5.1.1 Mixed-effects approach
      • 5.1.2 Repeated measures analysis:

List of Figures

  • 2.1 Wheat yield distributions
  • 2.2 Randomization distribution for the wheat example
  • 2.3 The superpopulation model
  • 2.4 The $\chi^2$ distribution
  • 2.5 The t-distribution
  • 2.6 A $t_8$ null distribution and $\alpha = 0.05$ rejection region
  • 2.7 The t-based null distribution for the wheat example
  • 2.8 Randomization distribution for the t-statistic
  • 2.9 The non-central t-distribution
  • 2.10 Critical regions and the non-central t-distribution
  • 2.11 Null and alternative distributions for another wheat example, and power versus sample size.
  • 2.12 Normal scores plots.
  • 3.1 Bacteria data
  • 3.2 Randomization distribution of the F-statistic
  • 3.3 Coagulation data
  • 3.4 F-distributions
  • 3.5 Normal-theory and randomization distributions of the F-statistic
  • 3.6 Power
  • 3.7 Power
  • 3.8 Yield-density data
  • 3.9 Normal scores plots of normal samples, with $n \in \{20, 50, 100\}$
  • 3.10 Crab data
  • 3.11 Crab residuals
  • 3.12 Fitted values versus residuals
  • 3.13 Data and log data
  • 3.14 Diagnostics after the log transformation
  • 3.15 Mean variance relationship of the transformed data
  • 4.1 Marginal Plots.
  • 4.2 Conditional Plots.
  • 4.3 Cell plots.
  • 4.4 Mean-variance relationship.
  • 4.5 Mean-variance relationship for transformed data.
  • 4.6 Plots of transformed poison data
  • 4.7 Comparison between types I and II, without respect to delivery.
  • 4.8 Comparison between types I and II, with delivery in color.
  • 4.9 Marginal plots of the data.
  • 4.10 Three datasets exhibiting non-additive effects.
  • 4.11 Experimental material in need of blocking.
  • 4.12 Results of the experiment
  • 4.13 Marginal plots and residuals
  • 4.14 Marginal plots for pain data
  • 4.15 Interaction plots for pain data
  • 4.16 Oxygen uptake data
  • 4.17 ANOVA and ANCOVA fits to the oxygen uptake data
  • 5.1 Potato data.
  • 5.2 Diagnostic plots for potato ANOVA.
  • 5.3 Potato data
  • 5.4 Potato data

CHAPTER 1. RESEARCH DESIGN PRINCIPLES 2

Input variables consist of

  • controllable factors: measured and determined by the scientist
  • uncontrollable factors: measured but not determined by the scientist
  • noise factors: unmeasured, uncontrolled factors (experimental variability or “error”)

For any interesting process, there are inputs such that:

variability in input → variability in output

If variability in an input factor x leads to variability in output y, we say x is a source of variation. In this class we will discuss methods of designing and analyzing experiments to determine important sources of variation.

1.3 Experiments and observational studies

Information on how inputs affect output can be gained from:

  • Observational studies: Input and output variables are observed from a pre-existing population. It may be hard to say what is input and what is output.
  • Controlled experiments: (Some) input variables are controlled and manipulated by the experimenter to determine their effect on the output.

Example (Women’s Health Initiative, WHI):

  • Population: Healthy, post-menopausal women in the U.S.
  • Input variables:
    1. estrogen treatment, yes/no
    2. demographic variables (age, race, diet, family history,... )
    3. unmeasured variables (?)
  • Output variables
    1. coronary heart disease (e.g., MI)
    2. invasive breast cancer
    3. ...
  • Scientific question: How does estrogen treatment affect health outcomes?

Observational Study:

  1. Observational population: 93,676 women enrolled starting in 1991 and tracked for eight years on average. The data consist of x = input variables and y = health outcomes, gathered concurrently on existing populations.
  2. Results: good health/low rates of CHD were generally associated with estrogen treatment.
  3. Conclusion: Estrogen treatment is positively associated with good health outcomes, such as a low prevalence of CHD.

Experimental Study (WHI randomized controlled trial):

  1. Experimental population:

373,092 women determined to be eligible → 18,845 provided consent to be in the experiment → 16,608 included in the experiment

16,608 women were randomized to either

x = 1 (estrogen treatment) or x = 0 (control, i.e., no estrogen treatment)

using a randomized block design: women were treated at different clinics and were of different ages.

| clinic | age group 1 (50-59) | age group 2 (60-69) | age group 3 (70-79) |
|---|---|---|---|
| 1 | $n_{11}$ | $n_{12}$ | $n_{13}$ |
| 2 | $n_{21}$ | $n_{22}$ | $n_{23}$ |
| ⋮ | ⋮ | ⋮ | ⋮ |

$n_{ij}$ = number of women in the study in clinic i and age group j = number of women in block (i, j)
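A randomized block design randomizes treatment separately within each block. The R sketch below illustrates the idea; it is not the WHI procedure. The 2 clinics by 3 age groups, the block sizes, and the 1:1 allocation within each block are all hypothetical assumptions chosen for illustration.

```r
# Hypothetical blocks: 2 clinics x 3 age groups, with made-up block sizes n_ij
set.seed(1)
blocks <- expand.grid(clinic = 1:2, age_group = 1:3)
blocks$n <- c(10, 8, 12, 6, 10, 8)   # assumed sizes, all even

# Within one block of size n: half get x = 1, half get x = 0, in random order
# (equal allocation within blocks is an assumption of this sketch)
assign_block <- function(n) sample(rep(c(1, 0), length.out = n))

# Randomize each block independently and stack the results
design <- do.call(rbind, lapply(seq_len(nrow(blocks)), function(b) {
  data.frame(clinic    = blocks$clinic[b],
             age_group = blocks$age_group[b],
             x         = assign_block(blocks$n[b]))
}))

# Number treated (x = 1) in each clinic-by-age-group block
tapply(design$x, list(design$clinic, design$age_group), sum)
```

Because randomization happens inside each block, clinic and age group cannot end up unbalanced across the treatment groups.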

[Figure: causal diagrams for an observational study (X1 and Y are correlated; X2 is a cause of both) and a randomized experiment (randomization determines X1).]

Observational studies can suggest good experiments to run, but can’t definitively show causation.

Randomization can eliminate correlation between $x_1$ and $y$ that is due to a different cause $x_2$, also known as a confounder.

“No causation without randomization”
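A small simulation makes the confounding argument concrete. In this sketch all numbers are hypothetical: x2 makes treatment x1 more likely and also raises y, while x1 itself has no effect on y. In the "observational" version x1 depends on x2; in the "randomized" version x1 is assigned by a fair coin.

```r
set.seed(502)
n  <- 10000
x2 <- rnorm(n)                          # confounder (hypothetical)

# Observational study: units with larger x2 are more likely to be treated
x1_obs  <- rbinom(n, 1, plogis(2 * x2))
# Randomized experiment: treatment assigned by a fair coin, ignoring x2
x1_rand <- rbinom(n, 1, 0.5)

# The outcome depends on x2 only; x1 has no causal effect in this sketch
y <- 3 * x2 + rnorm(n)

cor(x1_obs, y)    # clearly positive, induced entirely by x2
cor(x1_rand, y)   # close to zero
```

Under randomization the only remaining correlation between x1 and y is sampling noise.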


1.4 Steps in designing an experiment

  1. Identify research hypotheses to be tested.
  2. Choose a set of experimental units, which are the units to which treatments will be randomized.
  3. Choose a response/output variable.
  4. Determine potential sources of variation in the response:
     (a) factors of interest
     (b) nuisance factors
  5. Decide which variables to measure and control:
     (a) treatment variables
     (b) potential large sources of variation/blocking variables
  6. Decide on the experimental procedure and how treatments are to be randomly assigned.

These choices are often constrained by budget, ethics, time, ...

Three principles in Experimental Design

  1. Replication: Repetition of an experiment. Replicates are runs of an experiment, or sets of experimental units, that have the same values of the control variables. More replication → more precise inference. Let $y_{A,i}$ = response of the ith unit assigned to treatment A and $y_{B,i}$ = response of the ith unit assigned to treatment B, for i = 1, ..., n. Then $\bar y_A \neq \bar y_B$ provides evidence that treatment affects response, i.e., that treatment is a source of variation (larger n → more evidence).
  2. Randomization: Random assignment of treatments to experimental units. This removes the potential for systematic bias from any pre-experimental source and makes confounding unlikely.
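The claim "more replication → more precise inference" can be checked by simulation. This sketch assumes normal responses with a common standard deviation (sigma = 5, a hypothetical value) and no treatment effect. Under those assumptions the standard deviation of $\bar y_B - \bar y_A$ is $\sigma\sqrt{2/n}$, so it shrinks as n grows.

```r
set.seed(1)
sigma <- 5                        # assumed common response SD

# SD of ybar_B - ybar_A over nsim simulated experiments with n units per group
sim_diff_sd <- function(n, nsim = 5000) {
  d <- replicate(nsim, mean(rnorm(n, 0, sigma)) - mean(rnorm(n, 0, sigma)))
  sd(d)
}

ns        <- c(3, 6, 12, 24, 48)
sim_sd    <- sapply(ns, sim_diff_sd)
theory_sd <- sigma * sqrt(2 / ns)
round(cbind(n = ns, simulated = sim_sd, theory = theory_sd), 2)
```

Quadrupling n roughly halves the noise in the mean difference, which is why a fixed true difference becomes easier to detect.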

Chapter 2

Comparing Two Treatments

Example: Wheat yield

Factor of interest: Fertilizer type, A or B. One factor of interest, having two levels.

Question: Is one fertilizer better than the other, in terms of yield?

Experimental material: One plot of land to be divided into 2 rows of 6 subplots.

  1. Design question: How should treatments/factor levels be assigned to the plots? We want to avoid confounding a treatment effect with another potential source of variation.
  2. Potential sources of variation: Fertilizer, soil, sun, water.
  3. Implementation: If we assign treatments randomly, we can avoid any pre-experimental bias in results. Twelve playing cards, 6 red and 6 black, were shuffled and dealt: 1st card red → 1st plot gets A; 2nd card red → 2nd plot gets A; 3rd card black → 3rd plot gets B; ... This is our first design, a completely randomized design.
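Shuffling and dealing the cards is equivalent to randomly permuting six A labels and six B labels over the 12 subplots. A minimal R version, assuming the subplots are numbered 1 to 12 in dealing order (the seed is arbitrary):

```r
# Six "red" (A) and six "black" (B) cards, one per subplot
cards <- rep(c("A", "B"), each = 6)

set.seed(2006)                 # arbitrary seed, for reproducibility only
assignment <- sample(cards)    # a random permutation = shuffling the deck

# Lay the 12 subplots out as 2 rows of 6, in dealing order
matrix(assignment, nrow = 2, byrow = TRUE)
table(assignment)              # always 6 A's and 6 B's
```

Every one of the possible arrangements of six A's and six B's is equally likely under this scheme.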

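  4. Results:

| | subplot 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| row 1 fertilizer | A | A | B | B | A | B |
| row 1 yield | 26.9 | 11.4 | 26.6 | 23.7 | 25.3 | 28.5 |
| row 2 fertilizer | B | B | A | A | B | A |
| row 2 yield | 14.2 | 17.9 | 16.5 | 21.1 | 24.3 | 19.6 |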

How much evidence is there that fertilizer type is a source of yield variation? Evidence about differences between two populations is generally measured by comparing summary statistics across the two sample populations. (Recall that a statistic is any computable function of known, observed data.)

2.1 Summaries of sample populations

Distribution:

  • Empirical distribution: $\hat{\Pr}(a, b] = \#(a < y_i \le b)/n$
  • Empirical CDF (cumulative distribution function):

$$\hat F(y) = \#(y_i \le y)/n = \hat{\Pr}(-\infty, y]$$

  • Histograms
  • Kernel density estimates

Note that these summaries more or less retain all the information in the data except the unit labels.
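In R, the empirical CDF is available through the base function ecdf(), which returns a step function. This sketch uses the fertilizer A yields from the results above and checks ecdf() against the counting formulas.

```r
yA <- c(26.9, 11.4, 25.3, 16.5, 21.1, 19.6)   # fertilizer A yields

FA <- ecdf(yA)              # empirical CDF: F-hat(y) = #(y_i <= y)/n
FA(20)                      # fraction of A yields <= 20
mean(yA <= 20)              # same value, counted directly

# Empirical probability of an interval (a, b] equals F-hat(b) - F-hat(a)
a <- 15; b <- 25
FA(b) - FA(a)
mean(yA > a & yA <= b)
```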

Location:

  • sample mean or average: $\bar y = \frac{1}{n}\sum_{i=1}^{n} y_i$
  • sample median: $\hat q(1/2)$ is a/the value $y_{(1/2)}$ such that

$$\#(y_i \le y_{(1/2)})/n \ge 1/2 \quad \text{and} \quad \#(y_i \ge y_{(1/2)})/n \ge 1/2$$

To find the median, sort the data in increasing order, and call these values $y_{(1)}, \ldots, y_{(n)}$. If there are no ties, then if n is odd, $y_{((n+1)/2)}$ is the median; if n is even, then all numbers between $y_{(n/2)}$ and $y_{(n/2+1)}$ are medians.
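The sorting recipe translates directly into code. In this sketch, sample_median() is a hypothetical helper. For even n it returns the midpoint of $y_{(n/2)}$ and $y_{(n/2+1)}$, which is one conventional choice among the valid medians and matches R's median().

```r
# Median by sorting: y_((n+1)/2) for odd n; for even n, the midpoint of
# y_(n/2) and y_(n/2+1) (any value between them is a median)
sample_median <- function(y) {
  ys <- sort(y)
  n <- length(ys)
  if (n %% 2 == 1) ys[(n + 1) / 2] else (ys[n / 2] + ys[n / 2 + 1]) / 2
}

yA <- c(26.9, 11.4, 25.3, 16.5, 21.1, 19.6)
sample_median(yA)          # midpoint of 19.6 and 21.1
median(yA)                 # base R agrees

# Check the two defining inequalities for the returned value m
m <- sample_median(yA)
mean(yA <= m) >= 1/2 && mean(yA >= m) >= 1/2
```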

mean(yA)
[1] 20.13333
mean(yB)
[1] 22.53333

median(yA)
[1] 20.35
median(yB)
[1] 24

sd(yA)
[1] 5.712676
sd(yB)
[1] 5.432004

quantile(yA, prob = c(.25, .75))
   25%    75%
17.275 24.250
quantile(yB, prob = c(.25, .75))
   25%    75%
19.350 26.025

So there is a difference in yield for these wheat fields. Would you recommend B over A for future plantings? Do you think these results generalize to a larger population?

2.2 Hypothesis testing via randomization

Questions:

  • Could the observed differences be due to fertilizer type?
  • Could the observed differences be due to plot-to-plot variation?

Hypothesis tests:

  • $H_0$ (null hypothesis): Fertilizer type does not affect yield.
  • $H_1$ (alternative hypothesis): Fertilizer type does affect yield.

A statistical hypothesis test evaluates the plausibility of $H_0$ in light of the data.

Suppose we are interested in mean wheat yields. We can evaluate $H_0$ by answering the following questions:

  • Is a mean difference of 2.4 plausible/probable if $H_0$ is true?
  • Is a mean difference of 2.4 large compared to experimental noise?

To answer these questions, we need to compare

$|\bar y_B - \bar y_A| = 2.4$, the observed difference in the experiment, to values of $|\bar y_B - \bar y_A|$ that could have been observed if $H_0$ were true.

Hypothetical values of $|\bar y_B - \bar y_A|$ that could have been observed under $H_0$ are referred to as samples from the null distribution.

Finding a null distribution: Let

$$g(Y_A, Y_B) = g(\{Y_{1,A}, \ldots, Y_{6,A}\}, \{Y_{1,B}, \ldots, Y_{6,B}\}) = |\bar Y_B - \bar Y_A|.$$

This is a function of the outcome of the experiment. It is a statistic. Since we will use it to perform a hypothesis test, we will call it a test statistic.

Observed test statistic: $g(26.9, 11.4, \ldots, 24.3) = 2.4 = g_{obs}$
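The test statistic can be written as a small R function of the yields y and the treatment labels x. The function name g() mirrors the notation above; storing the data as two parallel vectors is an assumption that matches the R code for this example.

```r
y <- c(26.9, 11.4, 26.6, 23.7, 25.3, 28.5, 14.2, 17.9, 16.5, 21.1, 24.3, 19.6)
x <- c("A", "A", "B", "B", "A", "B", "B", "B", "A", "A", "B", "A")

# g = |mean of the B yields - mean of the A yields|
g <- function(y, x) abs(mean(y[x == "B"]) - mean(y[x == "A"]))

g_obs <- g(y, x)
g_obs   # 2.4, up to floating-point rounding
```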

Hypothesis testing procedure: Compare $g_{obs}$ to $g(Y_A, Y_B)$ for values of $Y_A$ and $Y_B$ that could have been observed if $H_0$ were true.

Recall the outcome of the experiment:

  1. Cards were shuffled and dealt R, R, B, B,... and fertilizer types planted in subplots:

A A B B A B

B B A A B A


IDEA: To see what types of outcomes we would observe in universes where $H_0$ is true, compute $g(Y_A, Y_B)$ under every possible treatment assignment, assuming $H_0$ is true.

Under our randomization scheme, there were

$$\binom{12}{6} = \frac{12!}{6!\,6!} = 924$$

equally likely ways the treatments could have been assigned. For each one of these, we can calculate the value of the test statistic that would have been observed under $H_0$: $\{g_1, g_2, \ldots, g_{924}\}$.

This enumerates all potential pre-randomization outcomes of our test statistic, assuming no treatment effect. Together with the fact that each treatment assignment is equally likely, these values give a null distribution: a probability distribution of possible experimental results if $H_0$ is true.

$$\Pr(g(Y_A, Y_B) \le x \mid H_0) = \frac{\#\{g_k \le x\}}{924}$$

This distribution is sometimes called the randomization distribution, because it is obtained from the randomization scheme of the experiment. Is there any contradiction between $H_0$ and our data?

$$\Pr(g(Y_A, Y_B) \ge 2.4 \mid H_0) = 0.47$$
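The full enumeration is small enough to do exactly. This is a sketch, not necessarily how the reported 0.47 was computed: combn(12, 6) lists every set of 6 plots that could receive fertilizer A, and under $H_0$ the 12 yields stay fixed while only the labels move. The resulting p-value can be compared with the value reported above.

```r
y <- c(26.9, 11.4, 26.6, 23.7, 25.3, 28.5, 14.2, 17.9, 16.5, 21.1, 24.3, 19.6)
x <- c("A", "A", "B", "B", "A", "B", "B", "B", "A", "A", "B", "A")
g <- function(y, x) abs(mean(y[x == "B"]) - mean(y[x == "A"]))
g_obs <- g(y, x)

# Every way to choose which 6 of the 12 plots get fertilizer A
a_sets <- combn(12, 6)            # 6 x 924 matrix of plot indices
ncol(a_sets)                      # choose(12, 6) = 924

# Test statistic under each assignment, with yields unaffected by treatment
g_null <- apply(a_sets, 2, function(a_idx) {
  x_sim <- rep("B", 12)
  x_sim[a_idx] <- "A"
  g(y, x_sim)
})

# Exact p-value: share of assignments with g at least as large as observed
# (the small tolerance guards against floating-point ties)
p_value <- mean(g_null >= g_obs - 1e-8)
p_value

# Null CDF: Pr(g <= v | H0) = #{g_k <= v} / 924
null_cdf <- function(v) mean(g_null <= v)
```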

According to this calculation, observing a mean difference of 2.4 or more is not unlikely under the null hypothesis: it happens with probability 0.47. This probability calculation is called a p-value. Generically, a p-value is

“The probability, under the null hypothesis, of obtaining a result as or more extreme than the observed result.”

The basic idea:

small p-value → evidence against $H_0$
large p-value → no evidence against $H_0$

Figure 2.2: Randomization distribution for the wheat example (two density panels: $\bar Y_B - \bar Y_A$ and $|\bar Y_B - \bar Y_A|$)

Approximating a randomization distribution: We don’t want to have to enumerate all $\binom{n}{n/2}$ possible treatment assignments. Instead, repeat the following $N_{sim}$ times:

(a) randomly simulate a treatment assignment from the population of possible treatment assignments, under the randomization scheme.

(b) compute the value of the test statistic, given the simulated treatment assignment and under $H_0$.

The empirical distribution of $\{g_1, \ldots, g_{N_{sim}}\}$ approximates the null distribution:

$$\frac{\#\{g_k \ge 2.4\}}{N_{sim}} \approx \Pr(g(Y_A, Y_B) \ge 2.4 \mid H_0)$$

The approximation improves as $N_{sim}$ increases. Here is some R code:

y <- c(26.9, 11.4, 26.6, 23.7, 25.3, 28.5, 14.2, 17.9, 16.5, 21.1, 24.3, 19.6)
x <- c("A", "A", "B", "B", "A", "B", "B", "B", "A", "A", "B", "A")
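The original code continues beyond this preview. As a hedged sketch (not the author's code), steps (a) and (b) can be implemented by permuting the labels x, which draws uniformly from the 924 equally likely assignments of the card-shuffling scheme. The choice n_sim = 10000 and the seed are arbitrary.

```r
y <- c(26.9, 11.4, 26.6, 23.7, 25.3, 28.5, 14.2, 17.9, 16.5, 21.1, 24.3, 19.6)
x <- c("A", "A", "B", "B", "A", "B", "B", "B", "A", "A", "B", "A")
g <- function(y, x) abs(mean(y[x == "B"]) - mean(y[x == "A"]))
g_obs <- g(y, x)

set.seed(1)
n_sim <- 10000
# (a) simulate an assignment by permuting the labels;
# (b) recompute g with the yields held fixed, as they would be under H0
g_sim <- replicate(n_sim, g(y, sample(x)))

# Monte Carlo approximation to Pr(g(Y_A, Y_B) >= 2.4 | H0)
p_mc <- mean(g_sim >= g_obs - 1e-8)
p_mc
```

Increasing n_sim reduces the Monte Carlo error of p_mc, whose standard error is roughly sqrt(p(1 - p)/n_sim).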