Understanding F-statistic & Distribution in One-Way ANOVA for Discrete Quant. Variables, Lecture notes of Statistics

These lecture notes cover one-way analysis of variance (ANOVA) for discrete quantitative variables with a limited range of possible values. They discuss the assumptions of normal distribution, equal variances, and random treatment assignment, the calculation of mean square within-groups (MSwithin) and mean square between-groups (MSbetween), and the F-statistic. They also explain how to interpret the F-distribution and its role in testing whether population group means are equal.

Typology: Lecture notes

2021/2022

Uploaded on 07/05/2022

carol_78

Chapter 7

One-way ANOVA

One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels.

The t-test of Chapter 6 looks at quantitative outcomes with a categorical explanatory variable that has only two levels. The one-way Analysis of Variance (ANOVA) can be used for the case of a quantitative outcome with a categorical explanatory variable that has two or more levels of treatment. The term one-way, also called one-factor, indicates that there is a single explanatory variable (“treatment”) with two or more levels, and only one level of treatment is applied at any time for a given subject. In this chapter we assume that each subject is exposed to only one treatment, in which case the treatment variable is being applied “between-subjects”. For the alternative in which each subject is exposed to several or all levels of treatment (at different times) we use the term “within-subjects”, but that is covered in Chapter 14. We use the term two-way or two-factor ANOVA when the levels of two different explanatory variables are being assigned, and each subject is assigned to one level of each factor.

It is worth noting that the situation for which we can choose between one-way ANOVA and an independent samples t-test is when the explanatory variable has exactly two levels. In that case we always come to the same conclusions regardless of which method we use.

The term “analysis of variance” is a bit of a misnomer. In ANOVA we use variance-like quantities to study the equality or non-equality of population means. So we are analyzing means, not variances. There are some unrelated methods,
such as “variance component analysis” which have variances as the primary focus for inference.
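
The two-level equivalence noted above can be checked numerically: with exactly two groups, the one-way ANOVA F-statistic equals the square of the pooled-variance independent-samples t-statistic. The following is a minimal sketch using made-up data; the helper names `pooled_t` and `one_way_f` are hypothetical and chosen only for this illustration.

```python
def avg(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sum_sq(values):
    """Sum of squared deviations of the values from their own mean."""
    m = avg(values)
    return sum((v - m) ** 2 for v in values)

def pooled_t(a, b):
    """Independent-samples t-statistic using the pooled variance estimate."""
    df = len(a) + len(b) - 2
    pooled_var = (sum_sq(a) + sum_sq(b)) / df
    se = (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5
    return (avg(a) - avg(b)) / se

def one_way_f(groups):
    """F = MSbetween / MSwithin for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand = avg(all_values)
    k, n_total = len(groups), len(all_values)
    ss_between = sum(len(g) * (avg(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum_sq(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Made-up outcomes for two treatment levels (hypothetical values).
group_a = [2.0, 4.0, 3.0, 5.0, 6.0]
group_b = [5.0, 7.0, 6.0, 9.0, 8.0]

t_stat = pooled_t(group_a, group_b)
f_stat = one_way_f([group_a, group_b])
print(t_stat ** 2, f_stat)  # the two values agree up to rounding error
```

Because F is the square of t, the two tests give the same p-value, which is why the conclusions always match in the two-level case.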

7.1 Moral Sentiment Example

As an example of application of one-way ANOVA consider the research reported in “Moral sentiments and cooperation: Differential influences of shame and guilt” by de Hooge, Zeelenberg, and M. Breugelmans (Cognition & Emotion, 21(5): 1025-1042, 2007).

As background you need to know that there is a well-established theory of Social Value Orientations or SVO (see Wikipedia for a brief introduction and references). SVOs represent characteristics of people with regard to their basic motivations. In this study a questionnaire called the Triple Dominance Measure was used to categorize subjects into “proself” and “prosocial” orientations. In this chapter we will examine simulated data based on the results for the proself individuals.

The goal of the study was to investigate the effects of emotion on cooperation. The study was carried out using undergraduate economics and psychology students in the Netherlands.

The sole explanatory variable is “induced emotion”. This is a nominal categorical variable with three levels: control, guilt and shame. Each subject was randomly assigned to one of the three levels of treatment. Guilt and shame were induced in the subjects by asking them to write about a personal experience where they experienced guilt or shame respectively. The control condition consisted of having the subject write about what they did on a recent weekday. (The validity of the emotion induction was tested by asking the subjects to rate how strongly they were feeling a variety of emotions towards the end of the experiment.)

After inducing one of the three emotions, the experimenters had the subjects participate in a one-round computer game that is designed to test cooperation. Each subject initially had ten coins, with each coin worth 0.50 Euros for the subject but 1 Euro for their “partner” who is presumably connected separately to the computer. The subjects were told that the partners also had ten coins, each worth 0.50 Euros for themselves but 1 Euro for the subject. The subjects decided how many coins to give to the interaction partner, without knowing how many coins the interaction partner would give. In this game, both participants would earn 10 Euros when both offered all coins to the interaction partner (the

Figure 7.1: Boxplots of cooperation by induced emotion.

side boxplots shown in figure 7.1.

Our initial impression is that cooperation is higher for guilt than either shame or the control condition. The mean cooperation for shame is slightly lower than for the control. In terms of pre-checking model assumptions, the boxplots show fairly symmetric distributions with fairly equal spread (as demonstrated by the comparative IQRs). We see four high outliers for the shame group, but careful thought suggests that this may be unimportant because they are just one unit of measurement (coin) into the outlier region and that region may be “pulled in” a bit by the slightly narrower IQR of the shame group.

Table 7.1: Cooperation score by induced emotion.

Induced emotion   Measure                         Statistic   Std. Error
Control           Mean                            3.49        0.
                  95% CI for mean, lower bound    2.
                  95% CI for mean, upper bound    4.
                  Median                          3.
                  Std. Deviation                  3.
                  Minimum
                  Maximum
                  Skewness                        0.57        0.
                  Kurtosis                        -0.81       0.
Guilt             Mean                            5.38        0.
                  95% CI for mean, lower bound    4.
                  95% CI for mean, upper bound    6.
                  Median                          6.
                  Std. Deviation                  3.
                  Minimum
                  Maximum
                  Skewness                        -0.19       0.
                  Kurtosis                        -1.17       0.
Shame             Mean                            3.78        0.
                  95% CI for mean, lower bound    2.
                  95% CI for mean, upper bound    4.
                  Median                          4.
                  Std. Deviation                  2.
                  Minimum
                  Maximum
                  Skewness                        0.71        0.
                  Kurtosis                        -0.20       0.

7.2. HOW ONE-WAY ANOVA WORKS 177

Technically, the sample group means are unbiased estimators of the population group means when treatment is randomly assigned. The meaning of unbiased here is that the true mean of the sampling distribution of any group sample mean equals the corresponding population mean. Further, under the Normality, independence and equal variance assumptions it is true that the sampling distribution of $\bar{Y}_i$ is $N(\mu_i, \sigma^2/n_i)$, exactly.

The statistical model for which one-way ANOVA is appropriate is that the (quantitative) outcomes for each group are normally distributed with a common variance (σ^2). The errors (deviations of individual outcomes from the population group means) are assumed to be independent. The model places no restrictions on the population group means.

The term assumption in statistics refers to any specific part of a statistical model. For one-way ANOVA, the assumptions are normality, equal variance, and independence of errors. Correct assignment of individuals to groups is sometimes considered to be an implicit assumption.

The null hypothesis is a point hypothesis stating that “nothing interesting is happening.” For one-way ANOVA, we use $H_0: \mu_1 = \cdots = \mu_k$, which states that all of the population means are equal, without restricting what the common value is. The alternative must include everything else, which can be expressed as “at least one of the k population means differs from at least one of the others”. It is definitely wrong to use $H_A: \mu_1 \neq \cdots \neq \mu_k$ because some cases, such as $\mu_1 = 5$, $\mu_2 = 5$, $\mu_3 = 10$, are covered by neither $H_0$ nor this incorrect $H_A$. You can write the alternative hypothesis as “$H_A$: Not $\mu_1 = \cdots = \mu_k$” or “the population means are not all equal”.

One way to correctly write $H_A$ mathematically is $H_A: \exists\, i, j : \mu_i \neq \mu_j$.

This null hypothesis is called the “overall” null hypothesis and is the hypothesis tested by ANOVA, per se. If we have only two levels of our categorical explanatory variable, then retaining or rejecting the overall null hypothesis is all that needs to be done in terms of hypothesis testing. But if we have 3 or more levels (k ≥ 3), then we usually need to follow up on rejection of the overall null hypothesis with more specific hypotheses to determine for which population group means we have evidence of a difference. This is called contrast testing and discussion of it will be delayed until chapter 13.

The overall null hypothesis for one-way ANOVA with k groups is $H_0: \mu_1 = \cdots = \mu_k$. The alternative hypothesis is that “the population means are not all equal”.

7.2.2 The F statistic (ratio)

The next step in standard inference is to select a statistic for which we can compute the null sampling distribution and that tends to fall in a different region for the alternative than the null hypothesis. For ANOVA, we use the “F-statistic”. The single formula for the F-statistic that is shown in most textbooks is quite complex and hard to understand. But we can build it up in small understandable steps.

Remember that a sample variance is calculated as SS/df where SS is “sum of squared deviations from the mean” and df is “degrees of freedom” (see page 69). In ANOVA we work with variances and also “variance-like quantities” which are not really the variance of anything, but are still calculated as SS/df. We will call all of these quantities mean squares or MS, i.e., MS = SS/df, which is a key formula that you should memorize. Note that these are not really means, because the denominator is the df, not n.
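
As a small illustration of MS = SS/df, the sketch below computes a mean square by hand for one made-up sample and compares it with the sample variance from Python's standard `statistics` module. The function name `mean_square` and the data values are hypothetical.

```python
from statistics import variance

def mean_square(values):
    """Mean square: sum of squared deviations from the mean, divided by df = n - 1."""
    m = sum(values) / len(values)
    ss = sum((v - m) ** 2 for v in values)  # SS, the sum of squared deviations
    df = len(values) - 1                    # degrees of freedom
    return ss / df

sample = [1.0, 3.0, 5.0, 8.0]  # hypothetical outcomes
ms = mean_square(sample)
print(ms, variance(sample))  # the two values agree: a sample variance is an MS
```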

For one-way ANOVA we will work with two different MS values called “mean square within-groups”, MSwithin, and “mean square between-groups”, MSbetween. We know the general formula for any MS, so we really just need to find the formulas for SSwithin and SSbetween, and their corresponding df.

The F statistic denominator: MSwithin

MSwithin is a “pure” estimate of σ^2 that is unaffected by whether the null or alternative hypothesis is true. Consider figure 7.2 which represents the within-group

where $SS_{within} = \sum_{i=1}^{k} SS_i$, and $df_{within} = \sum_{i=1}^{k} df_i = \sum_{i=1}^{k} (n_i - 1) = N - k$.

MSwithin is a good estimate of σ^2 (from our model) regardless of the truth of $H_0$. This is due to the way SSwithin is defined. SSwithin (and therefore MSwithin) has N − k degrees of freedom with $n_i - 1$ coming from each of the k groups.
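
The within-group calculation can be sketched for a small two-group, eight-subject data set. The individual values below are hypothetical; they were chosen only so that the group means come out to 4.25 and 14.00, the means quoted for the little example experiment.

```python
def ss_about_own_mean(group):
    """SS_i: squared deviations of one group's outcomes from that group's mean."""
    m = sum(group) / len(group)
    return sum((v - m) ** 2 for v in group)

# Hypothetical outcomes (not from the text); group means are 4.25 and 14.00.
groups = [[1.0, 3.0, 5.0, 8.0], [11.0, 13.0, 15.0, 17.0]]

ss_within = sum(ss_about_own_mean(g) for g in groups)  # sum of the SS_i
df_within = sum(len(g) - 1 for g in groups)            # sum of (n_i - 1) = N - k
ms_within = ss_within / df_within

print(ss_within, df_within, ms_within)
```

Shifting every value in one group by a constant changes that group's mean but leaves SSwithin unchanged, which is why MSwithin does not depend on whether the population means are equal.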

The F statistic numerator: MSbetween

[Figure 7.3 labels: horizontal axis from 0 to 20; Group 1 with $\bar{Y}_1 = 4.25$; Group 2 with $\bar{Y}_2 = 14.00$; grand mean $\bar{\bar{Y}} = 9.125$.]

Figure 7.3: Deviations for between-group sum of squares

Now consider figure 7.3 which represents the between-group deviations used in the calculation of MSbetween for the same little 2-group 8-subject experiment as shown in figure 7.2. The single vertical black line is the average of all of the outcome values in all of the treatment groups, usually called either the overall mean or the grand mean. The colored vertical lines are still the group means. The horizontal black lines are the deviations used for the between-group calculations. For each subject we get a deviation equal to the distance (difference) from that subject’s group mean to the overall (grand) mean. These deviations are squared and summed to get SSbetween, which is then divided by the between-group df, which is k − 1, to get MSbetween.

MSbetween is a good estimate of σ^2 only when the null hypothesis is true. In this case we expect the group means to be fairly close together and close to the grand mean. When the alternative hypothesis is true, as in our current example, the group means are farther apart and the value of MSbetween tends to be larger than σ^2. (We sometimes write this as “MSbetween is an inflated estimate of σ^2”.)

SSbetween is the sum of the N squared between-group deviations, where the deviation is the same for all subjects in the same group. The formula is

$$SS_{between} = \sum_{i=1}^{k} n_i (\bar{Y}_i - \bar{\bar{Y}})^2$$

where $\bar{\bar{Y}}$ is the grand mean. Because the k unique deviations, each weighted by its group size $n_i$, add up to zero, we are free to choose only k − 1 of them, and then the last one is fully determined by the others, which is why dfbetween = k − 1 for one-way ANOVA.

Because of the way SSbetween is defined, MSbetween is a good estimate of σ^2 only if $H_0$ is true. Otherwise it tends to be larger. SSbetween (and therefore MSbetween) has k − 1 degrees of freedom.
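
The between-group calculation can be sketched in the same way. The data are hypothetical values chosen to reproduce the group means 4.25 and 14.00 and the grand mean 9.125 quoted for the little two-group example. The sketch computes SSbetween both as a sum over all N subjects and with the group-size-weighted formula, and confirms that the two agree.

```python
# Hypothetical outcomes (not from the text); group means are 4.25 and 14.00.
groups = [[1.0, 3.0, 5.0, 8.0], [11.0, 13.0, 15.0, 17.0]]

all_values = [v for g in groups for v in g]
grand_mean = sum(all_values) / len(all_values)
group_means = [sum(g) / len(g) for g in groups]

# One squared deviation per subject: that subject's group mean minus the grand mean.
ss_between_per_subject = sum(
    (m - grand_mean) ** 2 for g, m in zip(groups, group_means) for _ in g
)

# Equivalent weighted form: n_i times the squared deviation of each group mean.
ss_between = sum(
    len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
)

df_between = len(groups) - 1
ms_between = ss_between / df_between

# The n_i-weighted deviations of the group means sum to zero.
weighted_sum = sum(len(g) * (m - grand_mean) for g, m in zip(groups, group_means))

print(grand_mean, ss_between, df_between, ms_between)
```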

The F statistic ratio

It might seem that we only need MSbetween to distinguish the null from the alternative hypothesis, but that ignores the fact that we don’t usually know the value of σ^2. So instead we look at the ratio

$$F = \frac{MS_{between}}{MS_{within}}$$

to evaluate the null hypothesis. Because the denominator is always (under null and alternative hypotheses) an estimate of σ^2 (i.e., tends to have a value near σ^2 ), and the numerator is either another estimate of σ^2 (under the null hypothesis) or is inflated (under the alternative hypothesis), it is clear that the (random) values of the F-statistic (from experiment to experiment) tend to fall around 1.0 when

[Figure 7.4 plot: Density (vertical axis) against F from 0 to 5 (horizontal axis), one curve per choice of degrees of freedom.]

Figure 7.4: A variety of F-distribution pdfs.

The F-distribution is a non-negative distribution in the sense that F values, which are ratios of squared quantities, can never be negative numbers. The distribution is skewed to the right and continues to have some tiny probability no matter how large F gets. The mean of the distribution is s/(s − 2) for s > 2, where s is the denominator degrees of freedom. So if s is reasonably large then the mean is near 1.00, but if s is small, then the mean is larger (e.g., k=2, n=4 per group gives s=3+3=6, and a mean of 6/4=1.5).
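
The stated mean can be checked by simulation. The sketch below repeatedly generates data for which the null hypothesis is true (k=2 groups, n=4 per group, all outcomes from the same Normal distribution), computes the F-statistic each time, and averages the results. The seed, the number of simulated experiments, and the helper name `f_statistic` are arbitrary choices for this illustration.

```python
import random

def f_statistic(groups):
    """F = MSbetween / MSwithin for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

random.seed(1)  # arbitrary seed so the run is repeatable
k, n = 2, 4     # two groups of four subjects, as in the example in the text
sims = 20000    # number of simulated experiments (arbitrary)

f_values = []
for _ in range(sims):
    # Null hypothesis true: every group has the same population mean and variance.
    groups = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    f_values.append(f_statistic(groups))

avg_f = sum(f_values) / sims
s = k * (n - 1)  # denominator degrees of freedom
print(avg_f, s / (s - 2))  # simulated mean of F next to the theoretical s/(s - 2)
```

Because the distribution has a long right tail, the simulated average is noisy, so it should be expected to land near 1.5 rather than exactly on it.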

Examples of F-distributions with different numerator and denominator degrees of freedom are shown in figure 7.4. These curves are probability density functions, so the regions on the x-axis where the curve is high are the values most likely to occur. And the area under the curve between any two F values is equal to the probability that a random variable following the given distribution will fall between those values. Although very low F values are more likely for, say, the F(1,10) distribution than the F(3,10) distribution, very high values are also more common for the F(1,10) distribution than for the F(3,10) distribution, though this may be hard to see in the figure. The bigger the numerator and/or denominator df, the more concentrated the F values will be around 1.0.

[Figure 7.5 plot: Density (vertical axis) against F from 0 to 5 (horizontal axis), with the observed F-statistic marked and the area beyond it shaded.]

Figure 7.5: The F(3,10) pdf and the p-value for F=2.0.

7.2.4 Inference: hypothesis testing

There are two ways to use the null sampling distribution of F in one-way ANOVA: to calculate a p-value or to find the “critical value” (see below).

A close up of the F-distribution with 3 and 10 degrees of freedom is shown in figure 7.5. This is the appropriate null sampling distribution of an F-statistic for an experiment with a quantitative outcome and one categorical explanatory variable (factor) with k=4 levels (each subject gets one of four different possible treatments) and with 14 subjects divided among the 4 groups. A vertical line marks an F-statistic of 2.0 (the observed value from some experiment). The p-value for this result is the chance of getting an F-statistic greater than or equal to

7.2.5 Inference: confidence intervals

It is often worthwhile to express what we have learned from an experiment in terms of confidence intervals. In one-way ANOVA it is possible to make confidence intervals for population group means or for differences in pairs of population group means (or other more complex comparisons). We defer discussion of the latter to chapter 13.

Construction of a confidence interval for a population group mean is usually done as an appropriate “plus or minus” amount around a sample group mean. We use MSwithin as an estimate of σ^2, and then for group i, the standard error of the mean is $\sqrt{MS_{within}/n_i}$. As discussed in section 6.2.7, the multiplier for the standard error of the mean is the so-called “quantile of the t-distribution” which defines a central area equal to the desired confidence level. This comes from a computer or table of t-quantiles. For a 95% CI this is often symbolized as $t_{0.025,df}$ where df is the degrees of freedom of MSwithin, (N − k). Construct the CI as the sample mean plus or minus (SEM times the multiplier).
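
The interval construction can be sketched as a short function. The guilt-group mean (5.38) and the MSwithin value (9.60) are the ones quoted in this chapter, but two inputs are assumptions made only for illustration: a group size of 42 (an equal split of the 126 subjects, which the text does not state) and a multiplier of roughly 1.98 for the t-distribution with 123 degrees of freedom (an approximate table value). The function name `group_mean_ci` is hypothetical.

```python
from math import sqrt

def group_mean_ci(sample_mean, ms_within, n_i, t_multiplier):
    """CI for one population group mean: mean +/- multiplier * sqrt(MSwithin / n_i)."""
    sem = sqrt(ms_within / n_i)   # standard error of the mean for group i
    margin = t_multiplier * sem   # the "plus or minus" amount
    return sample_mean - margin, sample_mean + margin

# ASSUMPTIONS (not given in the text): n_i = 42 and a multiplier of about 1.98
# for a 95% CI with df = N - k = 123.
lower, upper = group_mean_ci(sample_mean=5.38, ms_within=9.60, n_i=42, t_multiplier=1.98)
print(round(lower, 2), round(upper, 2))
```

Using MSwithin rather than the group's own variance pools information from all groups, which is appropriate only if the equal variance assumption is reasonable.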

In a nutshell: In one-way ANOVA we calculate the F-statistic as the ratio MSbetween/MSwithin. Then the p-value is calculated as the area under the appropriate null sampling distribution of F that is bigger than the observed F-statistic. We reject the null hypothesis if p ≤ α.
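
The p-value step can be sketched without a statistics package by numerically integrating the F density. The example uses the F(3,10) case with an observed F-statistic of 2.0 discussed earlier; the midpoint-rule integration, the number of slices, the choice α = 0.05, and the function names `f_pdf` and `f_p_value` are choices made for this illustration. In practice a statistics library or table would normally be used instead.

```python
from math import exp, lgamma, log

def f_pdf(x, d1, d2):
    """Density of the F-distribution with d1 and d2 degrees of freedom, for x > 0."""
    log_const = (lgamma((d1 + d2) / 2) - lgamma(d1 / 2) - lgamma(d2 / 2)
                 + (d1 / 2) * log(d1 / d2))
    return exp(log_const + (d1 / 2 - 1) * log(x)
               - ((d1 + d2) / 2) * log(1 + d1 * x / d2))

def f_p_value(f_obs, d1, d2, slices=100_000):
    """Area under the F density beyond f_obs, as 1 minus a midpoint-rule
    integral of the density over the interval (0, f_obs)."""
    width = f_obs / slices
    area_below = sum(f_pdf((i + 0.5) * width, d1, d2) for i in range(slices)) * width
    return 1.0 - area_below

alpha = 0.05                 # conventional significance level (assumed here)
p = f_p_value(2.0, 3, 10)    # observed F = 2.0 with 3 and 10 degrees of freedom
reject = p <= alpha
print(round(p, 3), reject)
```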

7.3 Do it in SPSS

To run a one-way ANOVA in SPSS, use the Analyze menu, select Compare Means, then One-Way ANOVA. Add the quantitative outcome variable to the “Dependent List”, and the categorical explanatory variable to the “Factor” box. Click OK to get the output. The dialog box for One-Way ANOVA is shown in figure 7.7.

You can also use the Options button to perform descriptive statistics by group, perform a variance homogeneity test, or make a means plot.

Figure 7.7: One-Way ANOVA dialog box.

You can use the Contrasts button to specify particular planned contrasts among the levels or you can use the Post-Hoc button to make unplanned contrasts (corrected for multiple comparisons), usually using the Tukey procedure for all pairs or the Dunnett procedure when comparing each level to a control level. See chapter 13 for more information.

7.4 Reading the ANOVA table

The ANOVA table is the main output of an ANOVA analysis. It always has the “source of variation” labels in the first column, plus additional columns for “sum of squares”, “degrees of freedom”, “mean square”, F, and the p-value (labeled “Sig.” in SPSS).

For one-way ANOVA, there are always rows for “Between Groups” variation and “Within Groups” variation, and often a row for “Total” variation. In one-way ANOVA there is only a single F statistic (MSbetween/MSwithin), and this is shown on the “Between Groups” row. There is also only one p-value, because there is only one (overall) null hypothesis, namely $H_0: \mu_1 = \cdots = \mu_k$, and because the p-value comes from comparing the (single) F value to its null sampling distribution. The calculation of MS for the total row is optional.
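
The row structure described above can be sketched as a function that builds the three rows from raw data. The data are made up (they are not the moral sentiment results), and the function name `anova_table` and the dictionary layout are hypothetical. The sketch also shows that the Between and Within rows add up to the Total row for both SS and df.

```python
def anova_table(groups):
    """Return the Between, Within, and Total rows of a one-way ANOVA table."""
    all_values = [v for g in groups for v in g]
    n_total, k = len(all_values), len(groups)
    grand = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ss_total = sum((v - grand) ** 2 for v in all_values)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "Between Groups": {"SS": ss_between, "df": df_between,
                           "MS": ms_between, "F": ms_between / ms_within},
        "Within Groups": {"SS": ss_within, "df": df_within, "MS": ms_within},
        "Total": {"SS": ss_total, "df": n_total - 1},
    }

# Made-up outcomes for three treatment levels (hypothetical values).
table = anova_table([[2, 3, 4, 3], [5, 6, 7, 6], [3, 4, 5, 4]])
for source, row in table.items():
    print(source, row)
```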

Table 7.2 shows the results for the moral sentiment experiment. There are several important aspects to this table that you should understand. First, as discussed above, the “Between Groups” line refers to the variation of the group means around the grand mean, and the “Within Groups” line refers to the variation

correctly). Just from this one-way ANOVA table, we can see that there were 3 treatment groups (because dfBetween is one less than the number of groups). Also, we can calculate that there were 125+1=126 subjects in the experiment.

Finally, it is worth knowing that MSwithin is an estimate of σ^2, the variance of outcomes around their group mean. So we can take the square root of MSwithin to get an estimate of σ, the standard deviation. Then we know that the majority (about 2/3) of the measurements for each group are within σ of the group mean and most (about 95%) are within 2σ, assuming a Normal distribution. In this example the estimate of the s.d. is $\sqrt{9.60} = 3.10$, so individual subject cooperation values more than 2(3.10)=6.2 coins from their group means would be uncommon.

You should understand the structure of the one-way ANOVA table including that MS=SS/df for each line, SS and df are additive, F is the ratio of between to within group MS, the p-value comes from the F-statistic and its presumed (under model assumptions) null sampling distribution, and the number of treatments and number of subjects can be calculated from degrees of freedom.
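
The degrees-of-freedom bookkeeping can be sketched as follows, using the values quoted above for the moral sentiment experiment: a between-groups df of 2, a total df of 125, and an MSwithin of 9.60. The function name `design_from_df` is hypothetical.

```python
from math import sqrt

def design_from_df(df_between, df_total):
    """Recover the number of groups, the number of subjects, and the within-groups df."""
    k = df_between + 1                 # df_between = k - 1
    n_subjects = df_total + 1          # df_total = N - 1
    df_within = df_total - df_between  # df are additive: N - k
    return k, n_subjects, df_within

k, n_subjects, df_within = design_from_df(df_between=2, df_total=125)
sigma_hat = sqrt(9.60)  # square root of MSwithin estimates sigma

print(k, n_subjects, df_within, round(sigma_hat, 2))
```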

7.5 Assumption checking

Except for the skewness of the shame group, the skewness and kurtosis statistics for all three groups are within 2SE of zero (see Table 7.1), and that one skewness is only slightly beyond 2SE from zero. This suggests that there is no evidence against the Normality assumption. The close similarity of the three group standard deviations suggests that the equal variance assumption is OK. And hopefully the subjects are totally unrelated, so the independent errors assumption is OK. Therefore we can accept that the F-distribution used to calculate the p-value from the F-statistic is the correct one, and we “believe” the p-value.

7.6 Conclusion about moral sentiments

With p = 0.013 < 0.05, we reject the null hypothesis that all three of the group population means of cooperation are equal. We therefore conclude that differences in mean cooperation are caused by the induced emotions, and that among control, guilt, and shame, at least two of the population means differ. Again, we defer looking at which groups differ to chapter 13.

(A complete analysis would also include examination of residuals for additional evaluation of possible non-normality or unequal spread.)

The F-statistic of one-way ANOVA is easily calculated by a computer. The p-value is calculated from the F null sampling distribution with matching degrees of freedom. But only if we believe that the assumptions of the model are (approximately) correct should we believe that the p-value was calculated from the correct sampling distribution, and it is then valid.