Completely Randomized Design - Lecture Notes | STAT 507 | Study notes Statistics

1 Completely Randomized Design (CRD) Part I

The CRD is the simplest of designs, but even planning for it requires scientific and statistical decisions (see the acid rain example).

1.1 Exploratory Data Analysis (EDA)

Essential for any statistical analysis. For the CRD we often use boxplots for initial examination of the data. From the boxplot we can look for evidence of a treatment difference, and for possible outliers or problems with homogeneity of variance.
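As a numerical companion to the boxplot, the following minimal Python sketch computes the quantities a boxplot displays for each group: the quartiles, the interquartile range (IQR), and any points beyond the fences. The data, the group names, and the helper `box_summary` are hypothetical illustrations, not taken from these notes, and the 1.5 x IQR fence rule is the common boxplot convention, assumed here.

```python
from statistics import quantiles


def box_summary(values):
    """Return the quartiles, the IQR, and points outside the 1.5*IQR fences."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical responses for three treatment groups (illustration only).
groups = {
    "A": [4.1, 4.5, 4.8, 5.0, 5.2],
    "B": [5.9, 6.1, 6.4, 6.6, 9.9],
    "C": [4.9, 5.1, 5.3, 5.6, 5.8],
}

summaries = {name: box_summary(vals) for name, vals in groups.items()}
for name, summary in summaries.items():
    print(name, summary)
```

Comparing the medians across groups gives a first look at a possible treatment difference, comparing the IQRs gives a rough check on homogeneity of variance, and the `outliers` entry flags points worth a second look (here the value 9.9 in group B).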

1.2 ANOVA as a choice of the best fitting model for the mean

Let $y_{ij}$ ($i = 1, \ldots, g$; $j = 1, \ldots, n_i$) be the $j$th observation in group $i$. In ANOVA from a CRD we consider two models for $y_{ij}$. The first model, $y_{ij} = \mu_i + \varepsilon_{ij}$, specifies that each group has its own mean value $\mu_i$. This is also called the full model for $y_{ij}$. The second model, $y_{ij} = \mu + \varepsilon_{ij}$, specifies that all groups have the same mean value $\mu$. This is also called the reduced model for $y_{ij}$. Note that the reduced model is a special case (or subset) of the full model. Both models assume that the $\varepsilon_{ij}$ are independent, with mean zero and common variance $\sigma^2$. To conduct statistical inference (tests, confidence intervals, etc.) we further assume that the $\varepsilon_{ij}$ have a normal distribution.

An alternative way to express the models above is to let $\mu_i = \mu^* + \alpha_i$, where $\mu^*$ is the overall mean and $\alpha_i$ is the treatment effect of group $i$. Then the full model is $y_{ij} = \mu^* + \alpha_i + \varepsilon_{ij}$. This new formulation of the full model generalizes well to more complicated models, but introduces a complication: there are now more parameters ($\mu^*$ and the $g$ effects $\alpha_i$) than groups, so a constraint on the $\alpha_i$ is needed to pin them down. For both the full and reduced models, we can develop estimators for the parameters $\mu$ and $\sigma^2$ (reduced) or $\mu$, $\mu_i$, $\alpha_i$, and $\sigma^2$ (full), shown in Display 3.1 in the text. We can also calculate confidence intervals for our parameters.
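The estimators can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual CRD estimators: the overall sample mean for $\mu$, the sample group means for $\mu_i$, the differences between group means and the overall mean for $\alpha_i$, and variance estimates that divide the residual sum of squares by $N - 1$ (reduced model) or $N - g$ (full model), where $N$ is the total number of observations. Display 3.1 itself is not reproduced here, and the data and variable names are hypothetical.

```python
# Hypothetical CRD data: g = 3 groups with unequal sizes n_i (illustration only).
groups = [[1, 3], [4, 5, 6], [5, 6, 7, 8]]

N = sum(len(grp) for grp in groups)  # total number of observations
g = len(groups)                      # number of groups

# Reduced model: one common mean, estimated by the overall sample mean.
grand_mean = sum(sum(grp) for grp in groups) / N
ss_reduced = sum((y - grand_mean) ** 2 for grp in groups for y in grp)
sigma2_reduced = ss_reduced / (N - 1)

# Full model: one mean per group, estimated by the sample group means.
group_means = [sum(grp) / len(grp) for grp in groups]
effects = [m - grand_mean for m in group_means]  # estimated treatment effects
ss_full = sum((y - m) ** 2 for grp, m in zip(groups, group_means) for y in grp)
sigma2_full = ss_full / (N - g)

print("overall mean:", grand_mean)
print("group means:", group_means)
print("treatment effects:", effects)
print("variance estimate (reduced):", sigma2_reduced)
print("variance estimate (full):", sigma2_full)
```

With unequal group sizes the overall mean computed this way is a weighted average of the group means, so the estimated effects satisfy $\sum_i n_i \hat{\alpha}_i = 0$. Other constraints on the $\alpha_i$ are possible and would give different effect estimates but the same fitted group means.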

To choose between the full and reduced models, we compare their sums of squared residuals (SSR). A residual $r$ is the error in predicting an observation, $r = y - \hat{y}$, where $\hat{y}$ is the predicted value of $y$. For the full model, $\hat{y}_{ij} = \bar{y}_{i\cdot}$ (the sample group mean), and for the reduced model, $\hat{y}_{ij} = \bar{y}_{\cdot\cdot}$ (the sample overall mean). SSR for the full model can never exceed SSR for the reduced model, so we wish to decide if SSR for the full model has been reduced enough to account for the extra parameters (the $\mu_i$) in the full model.
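The comparison can be made concrete with a short Python sketch. It computes the SSR under each model for two small data sets: the three-group data used in the example of Section 1.3, and a hypothetical rearrangement (not from these notes) in which every group has the same sample mean. The function names `ssr_full` and `ssr_reduced` are illustrative choices.

```python
def ssr_full(groups):
    """SSR when each observation is predicted by its own group mean."""
    total = 0.0
    for grp in groups:
        mean_i = sum(grp) / len(grp)
        total += sum((y - mean_i) ** 2 for y in grp)
    return total


def ssr_reduced(groups):
    """SSR when every observation is predicted by the overall mean."""
    all_y = [y for grp in groups for y in grp]
    mean_all = sum(all_y) / len(all_y)
    return sum((y - mean_all) ** 2 for y in all_y)


separated = [[1, 2, 3], [5, 3, 4], [6, 7, 5]]  # data from the Section 1.3 example
similar = [[3, 4, 5], [5, 3, 4], [4, 5, 3]]    # hypothetical: equal group means

for name, data in [("separated", separated), ("similar", similar)]:
    full, reduced = ssr_full(data), ssr_reduced(data)
    print(name, "full:", full, "reduced:", reduced, "reduction:", reduced - full)
```

For the first data set the SSR drops from 30 under the reduced model to 6 under the full model. For the second it is 6 under both models, so the extra group means buy nothing. Deciding how large a reduction is "enough" is the job of the formal test, which this sketch does not carry out.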

1.3 Analysis of Variance mechanics

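Analysis of variance involves a partition of the total sum of squares for the observations $y_{ij}$. Using our notation from above, $y_{ij} - \bar{y}_{\cdot\cdot} = (y_{ij} - \bar{y}_{i\cdot}) + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})$, which equals the residual (from the full model) plus the $i$th treatment effect, or $r_{ij} + \hat{\alpha}_i$. By squaring and summing these terms and cancelling the cross product we obtain:

$$\sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 + \sum_{i=1}^{g} \sum_{j=1}^{n_i} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2,$$

or $SS_T = SS_E + SS_{Trt}$.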
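The partition can be checked numerically. The minimal Python sketch below uses the three-group data from the example that follows in these notes and computes the three sums of squares along with the cross-product term that cancels. The variable names are illustrative choices.

```python
groups = [[1, 2, 3], [5, 3, 4], [6, 7, 5]]  # example data from these notes

all_y = [y for grp in groups for y in grp]
grand_mean = sum(all_y) / len(all_y)
group_means = [sum(grp) / len(grp) for grp in groups]

# Total: squared deviations of each observation from the overall mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_y)

# Error: squared residuals from the full model (deviations from group means).
ss_error = sum((y - m) ** 2 for grp, m in zip(groups, group_means) for y in grp)

# Treatment: squared estimated effects, counted once per observation in the group.
ss_trt = sum(len(grp) * (m - grand_mean) ** 2 for grp, m in zip(groups, group_means))

# Cross product: residual times estimated effect, summed over all observations.
cross = sum(
    (y - m) * (m - grand_mean) for grp, m in zip(groups, group_means) for y in grp
)

print("SS_T:", ss_total, "SS_E:", ss_error, "SS_Trt:", ss_trt, "cross:", cross)
```

Here the sums of squares are 30, 6, and 24, so $SS_T = SS_E + SS_{Trt}$ holds, and the cross-product sum is zero because the residuals within each group sum to zero.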

As an example, consider three groups with the following data: Group 1 has $y_{1j}$ values of 1, 2, and 3, Group 2 has $y_{2j}$ values of 5, 3, and 4, and Group 3 has $y_{3j}$ values of 6, 7, and 5. The overall sample mean is $\bar{y}_{\cdot\cdot} = \left(\sum_{i=1}^{g} \sum_{j=1}^{3} y_{ij}\right)/(3g) = 36/9 = 4$. Then $SS_T$ is


SSE =