F Distribution and Analysis of Variance (ANOVA), Slides of Design

These slides introduce the F distribution and its use in testing the equality of two variances and, through the analysis of variance (ANOVA), the equality of two or more population means. They cover the concept of degrees of freedom, the formula for calculating the F statistic, and the interpretation of the results. They also discuss the importance of homogeneity of variances and the use of Levene's test to check for it.

Typology: Slides

2021/2022

Uploaded on 09/12/2022

Topic 3. Single factor ANOVA: Introduction [ST&D Chapter 7]

"The analysis of variance is more than a technique for statistical analysis. Once it is understood, ANOVA is a tool that can provide an insight into the nature of variation of natural events" Sokal & Rohlf (1995), BIOMETRY.

3.1. The F distribution [ST&D p. 99]

Assume that you are sampling at random from a normally distributed population (or from two different populations with equal variance) by first sampling $n_1$ items and calculating their variance $s_1^2$ (df: $n_1 - 1$), followed by sampling $n_2$ items and calculating their variance $s_2^2$ (df: $n_2 - 1$). Now consider the ratio of these two sample variances:

$$\frac{s_1^2}{s_2^2}$$

This ratio will be close to 1, because these variances are estimates of the same quantity. The expected distribution of this statistic is called the F-distribution. The F-distribution is determined by two values for degrees of freedom, one for each sample variance. Statistical tables for F (e.g. A6 in your book) show the cumulative probability distribution of F for several selected probability values. The values in the table represent $F_{\alpha[\nu_1, \nu_2]}$, where $\alpha$ is the proportion of the F-distribution to the right of the given F-value (in one tail) and $\nu_1$, $\nu_2$ are the degrees of freedom pertaining to the numerator and denominator of the variance ratio, respectively.

Figure 1. Three representative F-distributions, F(1,40), F(28,6), and F(6,28), plotted against F (note the similarity of F(1,40) to $\chi^2_1$).

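For example, a value $F_{\alpha/2 = 0.025[\nu_1 = 9, \nu_2 = 9]} = 4.03$ indicates that the ratio $s_1^2 / s_2^2$, from samples of ten individuals from normally distributed populations with equal variance, is expected to be larger than 4.03 by chance in only 2.5% of the experiments. Because the alternative hypothesis is $\sigma_1^2 \neq \sigma_2^2$, the test is two-tailed: ratios smaller than 1/4.03 are equally extreme and occur in another 2.5% of the experiments, giving a 5% significance level overall.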

3.2. Testing the hypothesis of equality of variances [ST&D 116-118]

Suppose $X_1, \ldots, X_m$ are observations drawn from a normal distribution with mean $\mu_X$ and variance $\sigma_X^2$, and $Y_1, \ldots, Y_n$ are drawn from a normal distribution with mean $\mu_Y$ and variance $\sigma_Y^2$.

The F statistic can be used as a test for the hypothesis $H_0: \sigma_X^2 = \sigma_Y^2$ vs. the hypothesis $H_1: \sigma_X^2 \neq \sigma_Y^2$. $H_0$ is rejected at the $\alpha$ level of significance if the ratio $s_X^2 / s_Y^2$ is either $\geq F_{\alpha/2,\, m-1,\, n-1}$ or $\leq F_{1-\alpha/2,\, m-1,\, n-1}$. In practice, this test is rarely used because it is very sensitive to departures from normality. This can be calculated using SAS PROC TTEST.
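The decision rule above can be sketched in a few lines. This is only an illustration in Python (not SAS): the two samples are hypothetical, the helper name `variance_ratio` is invented for the example, and the critical value 4.03 is the tabulated $F_{0.025[9,9]}$ quoted in section 3.1. The lower critical value uses the identity $F_{1-\alpha;\,\nu_1,\nu_2} = 1 / F_{\alpha;\,\nu_2,\nu_1}$.

```python
from statistics import variance

def variance_ratio(x, y):
    """Return (F, df_num, df_den) for the ratio of sample variances s_x^2 / s_y^2."""
    return variance(x) / variance(y), len(x) - 1, len(y) - 1

# Hypothetical samples of ten observations each (not from the source).
x = [12.1, 9.8, 11.4, 10.9, 13.2, 8.7, 10.1, 12.6, 9.3, 11.9]
y = [10.4, 10.9, 11.1, 10.2, 10.8, 11.3, 10.6, 10.1, 11.0, 10.7]

F, df1, df2 = variance_ratio(x, y)

# Tabulated upper critical value for alpha/2 = 0.025 with (9, 9) df.
F_UPPER = 4.03
# Lower critical value; with equal df it is simply the reciprocal.
F_LOWER = 1 / F_UPPER

reject = F >= F_UPPER or F <= F_LOWER
print(f"F = {F:.2f} on ({df1}, {df2}) df; reject H0 at alpha = 0.05: {reject}")
```

With unequal sample sizes the two critical values would have to be looked up separately, since the reciprocal shortcut swaps the degrees of freedom.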

3.3. Testing the hypothesis of equality of two means [ST&D 98-112]

The ratio between two estimates of $\sigma^2$ can be used to test differences between means, that is, a test of $H_0: \mu_1 - \mu_2 = 0$ versus $H_1: \mu_1 - \mu_2 \neq 0$.

In particular:

The denominator is an estimate of $\sigma^2$ from the individuals within each sample. That is, it is a weighted average of the sample variances.

The numerator is an estimate of $\sigma^2$ provided by the variation among sample means. To obtain this estimate of $\sigma^2$ (variance among individuals) from the variance of the mean ($\sigma_{\bar{Y}}^2 = \sigma^2 / n$), $\sigma_{\bar{Y}}^2$ needs to be multiplied by n.

$$F = \frac{s^2_{\text{among}}}{s^2_{\text{within}}} = \frac{n\, s^2_{\bar{Y}}}{s^2}$$

When the two populations have different means (but same variance), the estimate of $\sigma^2$ based on sample means will include a contribution attributable to the difference between population means as well as any random difference (i.e. within-population variance). Thus, if there is a significant difference among means, the sample means are expected to be more variable than when chance alone operates and there are no significant differences among means.

$$F = \frac{\text{estimate of } \sigma^2 \text{ from sample means}}{\text{estimate of } \sigma^2 \text{ from individuals}}$$

The mean of r observations drawn from a normal distribution with mean $\mu$ and variance $\sigma^2$ is itself a normally distributed random variable with mean $\mu$ and variance $\sigma^2 / r$.

The formula for $s_{\bar{Y}}^2$ is

$$s_{\bar{Y}}^2 = \frac{\sum_{i=1}^{t} (\bar{Y}_{i.} - \bar{Y}_{..})^2}{t - 1} = \frac{(17 - 18.5)^2 + (20 - 18.5)^2}{2 - 1} = 4.5$$

and, from the central limit theorem, n times $s_{\bar{Y}}^2$ provides an estimate for $\sigma^2$ (n is the number of variates on which each sample mean is based).

Therefore, the between samples estimate is:

$$s_b^2 = r\, s_{\bar{Y}}^2 = 5 \times 4.5 = 22.5$$

These two variances are used in the F test as follows. If the null hypothesis is not true, then the between samples variance should be much larger than the within samples variance ("much larger" means larger than one would expect by chance alone). Therefore, we look at the ratio of these variances and ask whether this ratio is significantly greater than 1. It turns out that under our assumptions (normality, equal variance, etc.), this ratio is distributed according to an F(t-1, t(r-1)) distribution. That is, we define:

$$F = \frac{s_b^2}{s_w^2}$$

and test whether this statistic is significantly greater than 1. The F statistic measures how many times larger the variability between samples is than the variability within samples.

In this example, F = 22.5/5.25 = 4.29. The numerator $s_b^2$ is based on 1 df, since there are two sample means. The denominator, $s_w^2$, is based on pooling the df within each sample, so the df of $s_w^2$ = t(r-1) = 2(4) = 8. For these df, we would expect an F value of 4.29 or larger just by chance about 7% of the time. From Table A.6 (p.614 of ST&D), $F_{0.05,\,1,\,8} = 5.32$. Since 4.29 < 5.32, we fail to reject $H_0$ at the 0.05 significance level.
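The arithmetic of this example can be retraced with a short Python sketch. It uses only numbers stated in the text (sample means 17 and 20, 5 observations per sample, within-sample variance 5.25, tabulated critical value 5.32); the variable names are illustrative.

```python
from statistics import variance

sample_means = [17.0, 20.0]  # the two sample means from the example
r = 5                        # observations per sample
s2_within = 5.25             # within-samples variance given in the text

t = len(sample_means)
s2_ybar = variance(sample_means)  # variance among sample means (divisor t - 1)
s2_between = r * s2_ybar          # rescaled to a per-individual variance
F = s2_between / s2_within
df_num, df_den = t - 1, t * (r - 1)

F_CRIT = 5.32  # F(0.05; 1, 8) as quoted from Table A.6
print(f"s2_ybar = {s2_ybar}, s2_between = {s2_between}")
print(f"F = {F:.2f} on ({df_num}, {df_den}) df; reject H0: {F > F_CRIT}")
```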

3.3.1 Relationship between F and t

In the case of only two treatments, the square root of the F statistic is distributed according to a t distribution:

$$F_{1-\alpha,\; df = 1,\, t(r-1)} = t^2_{1-\alpha/2,\; df = t(r-1)}$$

meaning

$$t = \sqrt{\frac{s_b^2}{s_w^2}}$$

In the example above, with 5 reps per treatment:

$$F_{(1,8),\, 1-\alpha} = \left(t_{8,\, 1-\alpha/2}\right)^2$$

(be careful: F uses $\alpha$ and t uses $\alpha/2$)

The total degrees of freedom for the t statistic is t(r - 1) = tr - t = n - t, since there are n observations and they must satisfy t constraint equations, one for each treatment mean. Therefore, we reject the null hypothesis at the $\alpha$ significance level if $t > t_{\alpha/2,\, t(r-1)}$.

Here are the computations for our data set:

$$t = \sqrt{\frac{s_b^2}{s_w^2}} = \sqrt{\frac{22.5}{5.25}} = 2.07$$

Since 2.07 < $t_{0.025,\, 8}$ = 2.306, we fail to reject $H_0$ at the 0.05 significance level. The value 2.306 is obtained from Table A3 (p.611) with df = 2(5-1) = 8. Note that $2.306^2 = 5.32 = F_{0.05,\,1,\,8}$.
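The equivalence between the two tests can be checked numerically. The sketch below (Python, using only the values quoted in this section) shows that squaring the t statistic and its critical value recovers the F statistic and its critical value, so both tests give the same decision.

```python
import math

F = 22.5 / 5.25        # F statistic from the worked example
t_stat = math.sqrt(F)  # equivalent t statistic when there are two treatments

T_CRIT = 2.306  # t(0.025; 8) as quoted from Table A3
F_CRIT = 5.32   # F(0.05; 1, 8) as quoted from Table A.6

print(f"t = {t_stat:.2f}, t^2 = {t_stat ** 2:.2f}, F = {F:.2f}")
print(f"t_crit^2 = {T_CRIT ** 2:.2f}, F_crit = {F_CRIT}")

# The two tests agree on whether H0 is rejected.
same_decision = (t_stat > T_CRIT) == (F > F_CRIT)
print(f"same decision: {same_decision}")
```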

3.4. The linear additive model [ST&D p. 32, 103, 152]

3.4.1. One population: In statistics, a common model describing the makeup of an observation states that it consists of a mean plus an error. This is a linear additive model. A minimum assumption is that the errors are random, making the model probabilistic rather than deterministic.

The simplest linear additive model:

$$Y_i = \mu + \varepsilon_i$$

This model is applicable to the problem of estimating or making inferences about population means and variances. This model attempts to explain an observation $Y_i$ as a mean $\mu$ plus a random element of variation $\varepsilon_i$. The $\varepsilon_i$'s are assumed to be from a population of uncorrelated $\varepsilon$'s with mean zero. Independence among $\varepsilon$'s is assured by random sampling.

3.4.2. Two populations:

This second model is more general than the previous model (3.4.1) because it permits us to describe two populations simultaneously:

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

For samples from two populations with possibly different means but a common variance $\sigma^2$, any given observation is composed of:

and the alternative as $H_1$: at least one $\tau_i \neq 0$.

What a Model I ANOVA tests is the differential effects of treatments that are fixed and determined by the experimenter. The word "fixed" refers to the fact that each treatment is assumed to always have the same effect $\tau_i$. The $\tau$'s are assumed to constitute a finite population and are the parameters of interest, along with $\sigma^2$. When the null hypothesis is false (and some $\tau_i \neq 0$), there will be an additional component of variation due to treatment effects equal to:

$$\frac{r \sum_{i=1}^{t} \tau_i^2}{t - 1}$$

Since the $\tau_i$ are measured as deviations from a mean, this quantity is analogous to a variance but cannot be called such since it is not based on a random variable but rather on deliberately chosen treatments.

The Model II ANOVA or random model: In this model, the added effects for each group ($\tau$'s) are not fixed treatments but are random effects. In this case, we have not deliberately planned or fixed the treatment for any group, and the effects on each group are random and only partly under our control. The $\tau$'s are a random sample from a population of $\tau$'s for which the mean is zero and the variance is $\sigma_\tau^2$. When the null hypothesis is false, there will be an additional component of variance equal to $r\sigma_\tau^2$. Since the effects are random, it is futile to estimate the magnitude of these random effects for any one group, or the differences from group to group. However, we can estimate their variance, the added variance component among groups: $\sigma_\tau^2$. We test for its presence and estimate its magnitude, as well as its percentage contribution to the variation (calculated in SAS with PROC VARCOMP). The null hypothesis in the random model is stated as

$H_0: \sigma_\tau^2 = 0$ versus $H_1: \sigma_\tau^2 \neq 0$.

An important point is that the basic setup of data, as well as the computation and significance test, in most cases is the same for both models. The purpose differs between the two models, as do some of the supplementary tests and computations following the initial significance test. In the fixed model, we draw inferences about particular treatments; in the random model, we draw an inference about the population of treatments.

Until Topic 10, we will deal only with the fixed model.

Assumptions of the model [ST&D p.174]

  1. Treatment and environmental effects are additive
  2. Experimental errors are random, independently and normally distributed about zero mean and with a common variance.

Effects are additive

This means that all effects in the model (treatment effects, random error) cause deviations from the overall mean in an additive manner (rather than, for example, multiplicative).

Error terms are independently and normally distributed

This means there is no correlation between experimental groupings of observations (e.g. by treatment level) and the sizes of the error terms. This could be violated if, for example, treatments are not assigned randomly.

Variances are homogeneous

This assumption means that the variances of the different treatment groups are the same. It also implies that the means and variances of treatments are uncorrelated, that is, that treatments with larger means do not have larger variances.

We need this assumption since we are calculating an overall sample variance by averaging the variances of the different treatments.

There are alternative statistical analyses when the variances are not homogeneous (e.g. Welch's variance-weighted one-way ANOVA).

3.5. ANOVA: Single factor designs

3.5.1. The Completely Random Design CRD

In single factor experiments, a single factor is varied to form the different treatments. The experiment shown below is taken from page 141 of ST&D. The experiment involves inoculating five different cultures of one legume, clover, with strains of the nitrogen-fixing bacteria from another legume, alfalfa. As a sort of control, a sixth trial was run in which a composite of the five clover cultures was inoculated. There are 6 treatments (t = 6) and each treatment is given 5 replications (r = 5).

Table 1. Inoculation of clover with Rhizobium strains [ST&D Table 7.1]

                                    3DOK1     3DOK5     3DOK4     3DOK7     3DOK13    composite   Total
                                    19.4      17.7      17.0      20.7      14.3      17.
                                    32.6      24.8      19.4      21.0      14.4      19.
                                    27.0      27.9      9.1       20.5      11.8      19.
                                    32.1      25.2      11.9      18.8      11.6      16.
                                    33.0      24.3      15.8      18.6      14.2      20.
$\sum Y_{ij} = Y_{i.}$              144.1     119.9     73.2      99.6      66.3      93.5        596.6 = $Y_{..}$
$\sum Y_{ij}^2$                     4287.53   2932.27   1139.42   1989.14   887.29    1758.71     12994.
$Y_{i.}^2 / r$                      4152.96   2875.2    1071.65   1984.03   879.14    1748.45     12711.
$\sum (Y_{ij} - \bar{Y}_{i.})^2$    134.57    57.07     67.77     5.11      8.15      10.26       282.
$\bar{Y}_{i.}$ = mean               28.8      24.0      14.6      19.9      13.3      18.7        19.
$s^2$ = variance                    33.64     14.27     16.94     1.28      2.04      2.

The mean square for error: MSE = SSE/(n - t) gives the average dispersion of the items around their respective group means. The df is t(r - 1): the df within each treatment, (r - 1), pooled over the t treatments.

MSE is an estimate of a common $\sigma^2$, the experimental error (= within variation or variation among observations treated alike). MSE is a valid estimate of the common $\sigma^2$ if the assumption of equal variances among treatments is true (because we are averaging the variance estimates from different treatments).

The mean square for treatment: MST = SST/(t - 1) (MS Model in SAS). This is an independent estimate of $\sigma^2$ when the null hypothesis is true ($H_0: \mu_1 = \mu_2 = \mu_3 = \ldots = \mu_t$).

If there are differences among treatment means, there will be an added component of variation due to treatment effects equal to $r \sum \tau_i^2 / (t - 1)$ (Model I) or $r \sigma_\tau^2$ (Model II) (see topic 3.4.3 and ST&D 155). The multiplication by r is required to express the variance per individual, not per mean (remember that $\sigma_{\bar{Y}}^2 = \sigma^2 / r$, so $\sigma^2 = r\, \sigma_{\bar{Y}}^2$).

F = MST/MSE

The F value is obtained by dividing the treatment mean square by the error mean square. We expect to find F approximately equal to 1. In fact, however, the expected ratio is:

$$\frac{E(MST)}{E(MSE)} = \frac{\sigma^2 + r \sum \tau_i^2 / (t - 1)}{\sigma^2}$$

It is clear from this formula, that the F -test is sensitive to the presence of the added component of variation due to treatment effects. In other words, the ANOVA permits us to test whether there are any added treatment effects. That is, to test whether a group of means can be considered random samples from the same population or whether we have sufficient evidence to conclude that the treatments that have affected each group separately have resulted in shifting these means sufficiently so that they can no longer be considered samples from the same population.

Recall that the number of degrees of freedom is the number of independent quantities in the statistic.

• Thus SST has the t quantities $(\bar{Y}_{i.} - \bar{Y}_{..})$, which have one constraint (that they must sum to 0); so $df_{trt}$ = t - 1.
• The SSE has n quantities $Y_{ij}$, which have t constraints for the t sample means; so $df_e$ = t(r - 1) = n - t.

We can also use the following equation:

$$\sum_{i=1}^{t} \sum_{j=1}^{r} (Y_{ij} - \bar{Y}_{..})^2 = r \sum_{i=1}^{t} (\bar{Y}_{i.} - \bar{Y}_{..})^2 + \sum_{i=1}^{t} \sum_{j=1}^{r} (Y_{ij} - \bar{Y}_{i.})^2$$

or

TSS = SST + SSE, where TSS is the total sum of squares of the experiment.

In other words, sums of squares are perfectly additive.

If you expand the quantity on the left-hand side of the above equation in our dot notation, cross product terms of the form $2(\bar{Y}_{i.} - \bar{Y}_{..})(Y_{ij} - \bar{Y}_{i.})$ appear. It turns out that all of these cross product terms cancel each other out. Quantities that satisfy this are said to be orthogonal. Another way of saying this is that we can decompose the total SS into a portion due to variation among groups and another independent portion due to variation within groups. The degrees of freedom are also additive (i.e. $df_{Tot} = df_{Trt} + df_e$).
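The additivity of the sums of squares, and the cancellation of the cross product terms, can be verified numerically from the definitional formulas. The following Python sketch uses a small hypothetical data set (3 treatments, 4 replications; not from ST&D) purely for illustration.

```python
from statistics import mean

# Hypothetical data: t = 3 treatments, r = 4 replications each (not from the source).
groups = [
    [4.0, 6.0, 5.0, 7.0],
    [8.0, 9.0, 7.0, 10.0],
    [3.0, 2.0, 4.0, 5.0],
]
r = len(groups[0])
grand = mean(y for g in groups for y in g)  # overall mean
means = [mean(g) for g in groups]           # treatment means

# Definitional forms of the three sums of squares.
TSS = sum((y - grand) ** 2 for g in groups for y in g)
SST = r * sum((m - grand) ** 2 for m in means)
SSE = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

# Sum of the cross product terms, which should vanish.
cross = 2 * sum((m - grand) * (y - m) for g, m in zip(groups, means) for y in g)

print(f"TSS = {TSS:.4f}, SST + SSE = {SST + SSE:.4f}, cross products = {cross:.4f}")
```

The cross products vanish because, within each treatment, the deviations of the observations from their own treatment mean sum to zero.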

The dot notation above provides the "definition" formulas for these quantities (TSS, SST, and SSE). But each also has a friendlier "calculation" form for computing it by hand.

The actual calculations, when done by hand, use the formulas

$$C = \frac{\left(\sum_{i=1}^{t} \sum_{j=1}^{r} Y_{ij}\right)^2}{n} = \frac{Y_{..}^2}{n}$$

The correction term (C) is the squared sum of all observations divided by their number.

$$TSS = \sum_{i=1}^{t} \sum_{j=1}^{r} Y_{ij}^2 - C$$

The total sum of squares, which includes all sources of variation. This is the total SS.

$$SST = \frac{\sum_{i=1}^{t} Y_{i.}^2}{r} - C$$

The sum of squares attributable to the variable of classification. This is the between SS, or among groups SS, or treatment SS.

$$SSE = TSS - SST$$

The sum of squares among individuals treated alike. This is the within groups SS, or residual SS, or error SS. It is easier to calculate as a difference.
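As an illustration, these calculation formulas can be applied to the per-treatment totals and raw sums of squares listed in Table 1. The sketch below is in Python rather than SAS and uses only the summary rows of the table, so it does not need the individual observations; the variable names are illustrative.

```python
# Per-treatment totals (Y_i.) and raw sums of squares (sum of Y_ij^2) from Table 1.
totals = [144.1, 119.9, 73.2, 99.6, 66.3, 93.5]
sums_of_squares = [4287.53, 2932.27, 1139.42, 1989.14, 887.29, 1758.71]
t, r = 6, 5
n = t * r

C = sum(totals) ** 2 / n                   # correction term
TSS = sum(sums_of_squares) - C             # total SS
SST = sum(y ** 2 for y in totals) / r - C  # treatment SS
SSE = TSS - SST                            # error SS, obtained by difference

MST = SST / (t - 1)  # treatment mean square, t - 1 df
MSE = SSE / (n - t)  # error mean square, n - t df
F = MST / MSE

print(f"C = {C:.2f}, TSS = {TSS:.2f}, SST = {SST:.2f}, SSE = {SSE:.2f}")
print(f"MST = {MST:.2f}, MSE = {MSE:.2f}, F = {F:.2f}")
```

The SSE obtained by difference agrees, up to rounding, with the sum of the within-treatment sums of squares in the table (134.57 + 57.07 + 67.77 + 5.11 + 8.15 + 10.26).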

An ANOVA table provides a systematic presentation of everything we've covered until now. The first column of the ANOVA table specifies the components of the linear model. The next column indicates the df associated with each of these components. Next is a column with the SS associated with each, followed by a column with the corresponding mean squares.

Mean squares are essentially variances, and they are found by dividing each SS by its respective df. Finally, the last column in the ANOVA table presents the F statistic, which is a ratio of mean squares (i.e. a ratio of variances). Usually, a further column is added indicating the probability of finding that F value by chance. An ANOVA table (including an additional column of the SS definitional forms):

3.5.1.2.1. Normal distribution

Recall from the first lecture that the Shapiro-Wilk test statistic W (ST&D 567; produced by SAS via Proc UNIVARIATE NORMAL) provides a powerful test for normality for small to medium samples (n < 2000). Normality is rejected if W is sufficiently smaller than 1. W is similar to a correlation between the data and their normal scores (ST&D 566). In a perfectly normal population there is a perfect correlation W=1.

For large samples (n > 2000), SAS recommends the use of the Kolmogorov-Smirnov statistic (ST&D 571; also produced by SAS via Proc UNIVARIATE NORMAL or via Analyst).

Both tests are applied to the residuals of the model, which are easy to calculate in SAS or R.

3.5.1.2.2. Homogeneity of variances

Tests for homogeneity of variance (i.e. homoscedasticity) attempt to determine if the variance is the same within each of the groups defined by the independent variable. Bartlett's test [ST&D] can be very inaccurate if the underlying distribution is even slightly nonnormal, and it is not recommended for routine use. Levene's test is more robust to deviations from normality, and will be used in this class.

Levene’s test is an ANOVA of the squares of the residuals of each observation from its treatment mean. An alternative form of the test, implemented in R, uses the absolute values of the deviation from the treatment median.

To perform Levene's test in SAS, you need to use the option HOVTEST (for Homogeneity of variance test) within the means statement in the PROC GLM procedure:

proc GLM;
  Class Treatment;
  Model Response = Treatment;
  Means Treatment / Hovtest = Levene;
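To make the definition concrete, the squared-residual form of Levene's test can be sketched outside SAS. The Python example below is a minimal illustration under stated assumptions: the data are hypothetical, the helper names `one_way_f` and `levene_squared` are invented for the example, and the resulting F would still need to be compared with a tabulated critical value for the reported df.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_treatment, df_error) for a one-way ANOVA on a list of groups."""
    n = sum(len(g) for g in groups)
    t = len(groups)
    grand = mean(y for g in groups for y in g)
    sst = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (sst / (t - 1)) / (sse / (n - t)), t - 1, n - t

def levene_squared(groups):
    """Levene-type test: one-way ANOVA on squared residuals from each treatment mean."""
    squared = [[(y - mean(g)) ** 2 for y in g] for g in groups]
    return one_way_f(squared)

# Hypothetical responses for three treatments (not from the source).
groups = [
    [10.0, 12.0, 11.0, 13.0, 9.0],
    [20.0, 28.0, 15.0, 31.0, 21.0],
    [14.0, 15.0, 13.0, 16.0, 12.0],
]
F, df1, df2 = levene_squared(groups)
print(f"Levene F = {F:.2f} on ({df1}, {df2}) df")
```

Replacing the squared residuals with absolute deviations from each treatment median would give the alternative form of the test mentioned above.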

If Levene's test rejects the hypothesis of homogeneity of variances there are three alternatives:

  1. Transform the data (e.g. logarithm) so that the transformed values have uniform variances.
  2. Use a nonparametric statistical test.
  3. Use the WELCH option, which produces Welch's variance-weighted ANOVA (Biometrika 1951 v38, 330) instead of the usual ANOVA. This alternative to the usual analysis of variance is more robust if variances are not equal.

proc GLM;
  Class Treatment;
  Model Response = Treatment;
  Means Treatment / Welch;

3.5.1.3. Experimental Procedure: Randomization

Here is how the clover plots might look if this experiment were conducted in the field:

1 2 3 4 5 6

7 8 9 10 11 12

13 14 15 16 17 18

19 20 21 22 23 24

25 26 27 28 29 30

The experimental procedure would be: First, randomly (e.g. from a random number table such as ST&D 606, or using PROC PLAN in SAS, etc.) select the plot numbers to be assigned to the six treatments (A, B, C, D, E, F). Example: On p. 607, starting from Row 02, columns 88-89 (a random starting point), move downward. Take for treatment A the first 5 random numbers of 30 or less, and so forth (without repeating any number): Treatment A: 5, 19, 13, 20, 6; Treatment B: 14, 26, 1, 8, 4; etc. Or simply write out the 30 numbers, mix them, and randomly assign 5 numbers to each treatment.

B 2 3 B A A

7 B 9 10 11 12

A B 15 16 17 18

A A 21 22 23 24

25 B 27 28 29 30
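The same randomization can be done with a pseudo-random shuffle instead of a random number table. The sketch below is in Python, not SAS PROC PLAN; the function name `randomize_crd` and the seed value are arbitrary choices for the illustration.

```python
import random

def randomize_crd(treatments, reps, seed=None):
    """Randomly assign plots 1..t*r to treatments, with r plots per treatment."""
    rng = random.Random(seed)
    plots = list(range(1, len(treatments) * reps + 1))
    rng.shuffle(plots)
    # Consecutive slices of the shuffled plot list go to consecutive treatments.
    return {
        trt: sorted(plots[i * reps:(i + 1) * reps])
        for i, trt in enumerate(treatments)
    }

# Six treatments, five replications, 30 plots; the seed only makes the layout reproducible.
layout = randomize_crd("ABCDEF", reps=5, seed=205)
for trt, assigned in layout.items():
    print(trt, assigned)
```

Because the plot list is shuffled as a whole, every plot is used exactly once and each treatment receives exactly five plots.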

3.5.1.4. Power and sample size

Pearson and Hartley (1951, Biometrika 38:112-130) provided power function charts that are easy to use to calculate the power of an ANOVA and the appropriate number of replications. The tables are available on the class website at http://www.plantsciences.ucdavis.edu/agr205/Lectures/2010%20Iago/Topic%203/PowerCharts.pdf

There are different charts for each different numerator degrees of freedom $\nu_1$.

3.5.1.4.1. Power

The power of a test is the probability of detecting a nonzero treatment effect. To calculate the power of the F test in an ANOVA using Pearson and Hartley's power function charts, it is necessary to calculate first a critical value $\phi$. This critical value depends on the number of treatments (t), the number of replications (n), the magnitude of the treatment effects that the

case, the power is slightly greater than 0.55. Experiments should be designed to have a power of at least 80% (i.e. β ≤ 0.20).

To calculate the power using Analyst: Statistics -> ANOVA -> One-Way ANOVA -> Tests -> Power analysis. Or Analyst -> Sample Size -> One-Way ANOVA -> complete the number of treatments, the corrected sum of squares CSS (= SST = between SS = among groups SS = treatment SS), and the standard deviation, which is the square root of the mean squared error (MSE). You must also specify the significance level of the test; the default is 0.05.

3.5.1.4.2. Sample size

To calculate the number of replications for a given $\alpha$ and desired power:

• Specify the constants.
• Start with an arbitrary r to compute $\phi$.
• Use the appropriate Pearson and Hartley chart to find the power.
• Iterate the process until a minimum r value which satisfies the required power for a given $\alpha$ level is found.

Example: Suppose that 6 treatments will be involved in a study and the anticipated difference between the extreme means is 15 units. What is the required sample size so that this difference will be detected at $\alpha$ = 1% and power = 90%, knowing that $\sigma^2$ = 12? (note, t = 6, $\alpha$ = 1%, $\beta$ = 10%, d = 15, and MSE = $\sigma^2$ = 12).

r    df              $\phi$    (1-$\beta$) for $\alpha$=1%
2    6(2-1) = 6      1.77      0.
3    6(3-1) = 12     2.17      0.
4    6(4-1) = 18     2.50      0.

Thus 4 replications are required for each treatment to satisfy the required conditions.
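The $\phi$ column of this example can be reproduced with a short Python sketch. The formula used here, $\phi = \sqrt{r d^2 / (2 t \sigma^2)}$, is an assumption: it is not shown in this section, and it is adopted because it reproduces the tabulated values 1.77, 2.17 and 2.50. The power for each $\phi$ still has to be read from the Pearson and Hartley charts; the code does not compute it.

```python
import math

def phi(r, d, t, sigma2):
    """Assumed form of phi when only the extreme difference d between means is specified."""
    return math.sqrt(r * d ** 2 / (2 * t * sigma2))

t, d, sigma2 = 6, 15, 12  # constants from the example
rows = []
for r in (2, 3, 4):
    df_error = t * (r - 1)  # error df used to pick the curve on the chart
    rows.append((r, df_error, phi(r, d, t, sigma2)))
    print(f"r = {r}, error df = {df_error}, phi = {rows[-1][2]:.2f}")
```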

3.5.2. Subsampling: the nested design [ST&D p. 157 - 167]

It may happen that the experimenter wishes to make several observations within each experimental unit, the unit to which the treatment is applied. Such observations are made on subsamples or sampling units.

The classical example of this is given in Steel and Torrie: sampling individual plants within pots where the pots are the experimental units randomly assigned to treatments. Other examples would be individual trees within an orchard plot (where the treatment is assigned to the plot), individual sheep within a herd (where the treatment is assigned to the herd), etc. We call the analysis of this kind of data organized in a hierarchical way nested analysis of variance. Nested ANOVAs are not limited to two hierarchical levels (e.g. pots, and then plants within pots). We can divide the subgroups into sub-subgroups, and even further, as long as the sampling units within each level (e.g. pots, then plants within pots, then flowers within plants, etc.) are chosen randomly.

The essential objective of a nested ANOVA is to dissect the MSE of a system into its components, thereby ascertaining the sources and magnitudes of error in an experiment or process.

Examples of applications of nested ANOVA are:

• To ascertain the magnitude of error at various stages of an experiment or process.
• To estimate the magnitude of the variance attributable to various levels of variation in a study of quantitative genetics.
• To discover sources of variation in natural populations in systematic studies, etc.

If you are confused by a nested design, you can always average the subsamples and perform a simpler ANOVA. This is also a good strategy to test if your nested design analysis is correct. The final P value will be the same with a correct nested design analysis and a non-nested design using the averages of the subsamples. The advantage of doing the more complex analysis including the subsamples is that it lets you calculate the different components of variance.

3.5.2.1. Linear model for subsampling

Before we compute a nested ANOVA, we should examine the linear model upon which it is based:

$$Y_{ijk} = \mu + \tau_i + \varepsilon_{j(i)} + \delta_{k(ij)}$$

The interpretations of $\mu$, $\tau$, and $\varepsilon$ are as before. But now two random elements are obtained with each observation:

The $\varepsilon_{j(i)}$ are assumed normal with mean 0 and variance $\sigma_\varepsilon^2$. The subscript j(i) indicates that the j-th level of replication is nested within the i-th level of treatment. Note: this is a different notation from ST&D. The $\varepsilon_{j(i)}$ measures, as before, the variation among real replications within treatment groups, and the subindex indicates that there are subsamples within the replications. In the experiment where pots are randomized among treatments, and each pot includes 4 plants, $\varepsilon_{j(i)}$ measures the variation among pot means (averages of 4 plants) within a treatment.

The $\delta_{k(ij)}$ represents the errors associated with the variation among subsamples within an experimental unit. In the pot experiment $\delta_{k(ij)}$ measures the variation among the 4 plants within each pot. The $\delta_{k(ij)}$ are also assumed normal with mean 0 and variance $\sigma_\delta^2$. This is represented in the sample data as:

$$Y_{ijk} = \bar{Y}_{...} + (\bar{Y}_{i..} - \bar{Y}_{...}) + (\bar{Y}_{ij.} - \bar{Y}_{i..}) + (Y_{ijk} - \bar{Y}_{ij.})$$

Remember that in this notation the dot replaces a subscript and indicates that all values covered by that subscript have been added.

Applying this formula to the pot experiment:

The two error terms represent the sum of squares due to experimental error and the sum of squares due to sampling error. In the pot experiment, SSEE represents the variation among pots within treatments and SSSE represents the variation among plants within pots.

Nested ANOVA table :

Source of variation                   df                SS     MS          F            Expected MS
Treatments ($\tau_i$)                 t - 1 = 5         SST    SST / 5     MST / MSEE   $\sigma_\delta^2 + 4\sigma_\varepsilon^2 + 12 \sum \tau_i^2 / 5$
Exp. Error ($\varepsilon_{j(i)}$)     t(r - 1) = 12     SSEE   SSEE / 12   MSEE / MSSE  $\sigma_\delta^2 + 4\sigma_\varepsilon^2$
Samp. Error ($\delta_{k(ij)}$)        tr(s - 1) = 54    SSSE   SSSE / 54                $\sigma_\delta^2$
Total                                 trs - 1 = 71      TSS

In each case, the number of degrees of freedom is the product of the number of levels associated with each subscript between brackets and the number of levels minus one associated with the subscript outside the brackets.

The expected mean squares are the theoretical models of the variance components included in each MS. The MSSE estimates $\sigma_\delta^2$ (variation among plants), and the MSEE estimates both the variation between plants ($\sigma_\delta^2$) and the variation between pots ($\sigma_\varepsilon^2$). The last one is multiplied by 4 because pots are means of 4 plants ($\sigma_{\bar{Y}}^2 = \sigma^2 / 4$) and to put everything in the same scale ($\sigma^2$) it needs to be multiplied. The treatment effects are based on treatment means calculated from 12 plants (4*3), and that is why it is multiplied by 12.

The most important part of this table:

In testing a hypothesis about treatment means, the appropriate divisor for F is the mean square experimental error ( MSEE ) since it includes the variation from all sources (pot and plant) that contribute to the variability of treatment means except the treatment effects themselves.

If you do not inform the statistical program that the plants are subsamples, the program will automatically divide by the MSSE, and the P value will answer the question:

Is there a significant difference between treatments or pots? MST/MSSE -> EMS = $4\sigma_\varepsilon^2 + 12 \sum \tau_i^2 / 5$

instead of the one you think you are answering, which is:

Is there a difference between treatments? MST/MSEE -> EMS = $12 \sum \tau_i^2 / 5$

In a nested design the most critical part is the selection of the correct error term.

Estimation of the different variance components in the pot experiment

The main objective in a nested design is to estimate the variance components. To do this, we deconstruct the calculated mean squares according to their underlying theoretical models or expected mean squares (EMS, last column in the table) for each component of the linear model, as shown below:

Variance source   df   Sum of Squares   Mean Squares   Variance component   Percent of total
Total             71   255.91           3.60           4.05                 100.0 %
trtmt             5    179.64           35.92          2.81                 69.4 %
pot               12   25.83            2.15           0.30                 7.5 %
plant             54   50.44            0.93           0.93                 23.0 %

MSSE = $\sigma_\delta^2$, so $\sigma_\delta^2$ = 0.93
MSEE = $\sigma_\delta^2 + 4\sigma_\varepsilon^2$, so $\sigma_\varepsilon^2$ = (MSEE - $\sigma_\delta^2$)/4 = (2.15 - 0.93)/4 = 0.30
MST = $\sigma_\delta^2 + 4\sigma_\varepsilon^2 + 12 \sum \tau_i^2 / 5$, so $\sum \tau_i^2 / 5$ = (MST - MSEE)/12 = (35.92 - 2.15)/12 = 2.81

In this example, the variation among plants within a pot is three times larger than the variation among pots within a treatment.
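The decomposition of the mean squares into variance components can be retraced with a few lines of Python. The sketch uses the rounded mean squares from the table (35.92, 2.15, 0.93), so the results agree with the tabulated components and percentages only up to rounding. It also contrasts the F ratio obtained with the correct error term (MSEE) with the one obtained when MSSE is used by mistake.

```python
MST, MSEE, MSSE = 35.92, 2.15, 0.93  # rounded mean squares from the table
s, r = 4, 3                          # plants per pot, pots per treatment

var_plant = MSSE                  # sigma^2_delta: among plants within pots
var_pot = (MSEE - MSSE) / s       # sigma^2_epsilon: among pots within treatments
var_trt = (MST - MSEE) / (r * s)  # sum(tau_i^2)/(t-1): treatment component
total = var_plant + var_pot + var_trt

for name, v in [("trtmt", var_trt), ("pot", var_pot), ("plant", var_plant)]:
    print(f"{name:6s} {v:5.2f} {100 * v / total:5.1f} %")

# Test of treatments: the correct divisor is MSEE, not MSSE.
print(f"MST/MSEE = {MST / MSEE:.1f}, MST/MSSE = {MST / MSSE:.1f}")
```

Dividing by MSSE inflates the F ratio because the pot-to-pot variation is left out of the denominator.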

In SAS, PROC VARCOMP computes these variance components for different models. For our example experiment here:

Proc GLM;
  Class Trtmt Pot;
  Model Growth = Trtmt Pot(Trtmt);
  Random Pot(Trtmt);
  Test h = Trtmt e = Pot(Trtmt);
Proc Varcomp;
  Class Trtmt Pot;
  Model Growth = Trtmt Pot(Trtmt);

Pot(Trtmt): indicates that pot is a nested factor in treatment. Pot 1 in treatment 1 is not more similar to pot 1 in treatment 2 than to pots 2 and 3.

Random Pot(Trtmt): This statement tells SAS that the pots are a random factor (i.e. pots 1, 2 and 3 are just a random sample, not a classification based on a common property).

Test h = Trtmt e = Pot(Trtmt): This statement tells SAS which error term to use to test a particular hypothesis. For the hypothesis about treatments ( h = Trtmt), the appropriate error term is the MSEE (i.e. Pot(trtmt)), so e = Pot(Trtmt). This specifies the test MST/MSEE.

Note that you do not include a class variable for the last level of sub-sampling (in this case, plant). By default, SAS will use this last level of variation (among plants within a pot) as the error term for the experiment. This is why the test statement is so important in a nested design: