Within-Subjects Designs: An Overview of ANOVA for Correlated Errors, Exams of Design

This document covers the use of ANOVA in within-subjects designs, where multiple measurements are made for each subject. Within-subjects factors include treatments, measurements, or repetitions over time or space. The advantages of using within-subjects designs include increased power and the ability to remove subject-to-subject variation. However, they also come with challenges, such as correlated errors and the need for specific analysis techniques. It includes examples and explanations of one-way repeated measures analysis and mixed within/between subjects two-way ANOVA.

Typology: Exams

2021/2022

Uploaded on 09/12/2022

paulina
Chapter 14

Within-Subjects Designs

ANOVA must be modified to take correlated errors into account when multiple measurements are made for each subject.

14.1 Overview of within-subjects designs

Any categorical explanatory variable for which each subject experiences all of the levels is called a within-subjects factor. (Or sometimes a subject may experience several, but not all levels.) These levels could be different “treatments”, or they may be different measurements for the same treatment (e.g., height and weight as outcomes for each subject), or they may be repetitions of the same outcome over time (or space) for each subject. In the broad sense, the term repeated measure is a synonym for a within-subject factor, although often the term repeated measures analysis is used in a narrower sense to indicate the specific set of analyses discussed in Section 14.5.

In contrast to a within-subjects factor, any factor for which each subject experiences only one of the levels is a between-subjects factor. Any experiment that has at least one within-subjects factor is said to use a within-subjects design, while an experiment that uses only between-subjects factor(s) is called a between-subjects design. Often the term mixed design or mixed within- and between-subjects design is used when there is at least one within-subjects factor and at least one between-subjects factor in the same experiment. (Be careful to distinguish this from the so-called mixed models of chapter 15.) All of the experiments discussed in the preceding chapters are between-subjects designs.

Please do not confuse the terms between-groups and within-groups with the terms between-subjects and within-subjects. The first two terms, which we first encountered in the ANOVA chapter, are names of specific SS and MS components and are named because of how we define the deviations that are summed and squared to compute SS. In contrast, the terms between-subjects and within-subjects refer to experimental designs that either do not or do make multiple measurements on each subject.

When a within-subjects factor is used in an experiment, new methods are needed that do not make the assumption of no correlation (or, somewhat more strongly, independence) of errors for the multiple measurements made on the same subject. (See section 6.2.8 to review the independent errors assumption.)

Why would we want to make multiple measurements on the same subjects? There are two basic reasons. First, our primary interest may be to study the change of an outcome over time, e.g., a learning effect. Second, studying multiple outcomes for each subject allows each subject to be his or her own “control”, i.e., we can effectively remove subject-to-subject variation from our investigation of the relative effects of different treatments. This reduced variability directly increases power, often dramatically. We may use this increased power directly, or we may use it indirectly to allow a reduction in the number of subjects studied.

These are very important advantages to using within-subjects designs, and such designs are widely used. The major reasons for not using a within-subjects design are that it may be impossible to give multiple treatments to a single subject, or that there is concern about confounding. An example of a case where a within-subjects design is impossible is a study of surgery vs. drug treatment for a disease; subjects generally would receive one or the other treatment, not both.

The confounding problem of within-subjects designs is an important concern. Consider the case of three kinds of hints for solving a logic problem. Let’s take the time till solution as the outcome measure. If each subject first sees problem 1 with hint 1, then problem 2 with hint 2, then problem 3 with hint 3, then we will probably have two major difficulties. First, the effects of the hints carry over from each trial to the next. The truth is that problem 2 is solved when the subject has been exposed to two hints, and problem 3 when the subject has been exposed to all three hints. The effect of hint type (the main focus of inference) is confounded with the cumulative effects of prior hints.

14.2 Multivariate distributions

value on the y-axis to get a histogram that shows which values are most likely and from which we can visualize how likely a range of values is.

To represent the outcomes of two treatments for each subject, we need a so-called bivariate distribution. To produce a graphical representation of a bivariate distribution, we use the two axes (say, y1 and y2) on a sheet of paper for the two different outcome values, and therefore each pair of outcomes corresponds to a point on the paper with y1 equal to the first outcome and y2 equal to the second outcome. Then the third dimension (coming up out of the paper) represents how likely each combination of outcomes is. For a bivariate Normal distribution, this is like a real bell sitting on the paper (rather than the silhouette of a bell that we have been using so far).

Using an analogy between a bivariate distribution and a mountain peak, we can represent a bivariate distribution in two dimensions using a figure corresponding to a topographic map. Figure 14.1 shows the center and the contours of one particular bivariate Normal distribution. This distribution has a negative correlation between the two values for each subject, so the distribution is more like a bell squished along a diagonal line from the upper left to the lower right. If we have no correlation between the two values for each subject, we get a nice round bell. You can see that an outcome like Y1 = 2, Y2 = 6 is fairly likely, while one like Y1 = 6, Y2 = 2 is quite unlikely. (By the way, bivariate distributions can have shapes other than Normal.)

The idea of the bivariate distribution can easily be extended to more than two dimensions, but is of course much harder to visualize. A multivariate distribution with k dimensions has a k-length vector (ordered set of numbers) representing its mean. It also has a k × k matrix (rectangular array of numbers) representing the variances of the individual variables, and all of the paired covariances (see section 3.6.1).

Figure 14.1: Contours enclosing 1/3, 2/3 and 95% of a bivariate Normal distribution with a negative covariance.

For example, a 3-dimensional multivariate distribution representing the outcomes of three treatments in a within-subjects experiment would be characterized by a mean vector, e.g.,

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix},$$

and a variance-covariance matrix, e.g.,

$$\begin{bmatrix} \sigma_1^2 & \gamma_{1,2} & \gamma_{1,3} \\ \gamma_{1,2} & \sigma_2^2 & \gamma_{2,3} \\ \gamma_{1,3} & \gamma_{2,3} & \sigma_3^2 \end{bmatrix}.$$

Here we are using $\gamma_{i,j}$ to represent the covariance of variable $Y_i$ with $Y_j$. Sometimes, as an alternative to a variance-covariance matrix, people use a variance vector, e.g.,

$$\sigma^2 = \begin{bmatrix} \sigma_1^2 \\ \sigma_2^2 \\ \sigma_3^2 \end{bmatrix},$$

and a correlation matrix, e.g.,

$$\mathrm{Corr} = \begin{bmatrix} 1 & \rho_{1,2} & \rho_{1,3} \\ \rho_{1,2} & 1 & \rho_{2,3} \\ \rho_{1,3} & \rho_{2,3} & 1 \end{bmatrix}.$$
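The two representations carry the same information, because each correlation is the covariance divided by the product of the two standard deviations. The following is a minimal sketch of that conversion in Python; the function name `cov_to_corr` and the numbers in the matrix are hypothetical and chosen only for illustration.

```python
import math

def cov_to_corr(cov):
    """Split a variance-covariance matrix into (variance vector, correlation matrix)."""
    k = len(cov)
    variances = [cov[i][i] for i in range(k)]       # diagonal entries
    sds = [math.sqrt(v) for v in variances]         # standard deviations
    corr = [[cov[i][j] / (sds[i] * sds[j]) for j in range(k)] for i in range(k)]
    return variances, corr

# Hypothetical variance-covariance matrix for three treatments (illustrative values only)
cov = [
    [4.0, 2.0, 1.0],
    [2.0, 9.0, 3.0],
    [1.0, 3.0, 16.0],
]

variances, corr = cov_to_corr(cov)
print(variances)
for row in corr:
    print([round(r, 3) for r in row])
```

The diagonal of the resulting correlation matrix is always 1, which is why the variances have to be reported separately in the variance vector.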

approach corresponds to results labeled “multivariate” under “repeated measures ANOVA” for most statistical packages.

  3. Treat each response as a separate (univariate) observation, and treat “subject” as a (random) blocking factor. This corresponds to within-subjects ANOVA with subject included as a random factor and with no interaction in the model. It also corresponds to the “univariate” output under “repeated measures”. In this form, there are assumptions about the nature of the within-subject correlation that fairly frequently are not met. To use the univariate approach when its assumptions are not met, it is common to use some approximate correction (to the degrees of freedom) to compensate for a shifted null sampling distribution.
  4. Treat each measurement as univariate, but explicitly model the correlations. This is a more modern univariate approach called “mixed models” that subsumes a variety of models in a single unified approach, is very flexible in modeling correlations, and often has improved interpretability. As opposed to “classical repeated measures analysis” (approaches 2 and 3), mixed models can accommodate missing data (as opposed to dropping all data from every subject who is missing one or more measurements), and they accommodate unequal and/or irregular spacing of repeated measurements. Mixed models can also be extended to non-normal outcomes. (See chapter 15.)

14.4 Paired t-test

The paired t-test uses response simplification to handle the correlated errors. It only works with two treatments, so we will ignore the diathermy treatment in our osteoarthritis example for this section. The simplification here is to compute the difference between the two outcomes for each subject. Then there is only one “outcome” for each subject, and there is no longer any concern about correlated errors. (The subtraction is part of the paired t-test, so you don’t need to do it yourself.)
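One way to see why the differencing helps, written in the notation used for the variance-covariance matrix earlier in the chapter (this is a standard identity, added here as an aside rather than taken from the text): for the difference $D = Y_1 - Y_2$ of the two outcomes on one subject,

$$\mathrm{Var}(D) = \sigma_1^2 + \sigma_2^2 - 2\gamma_{1,2}.$$

When the two measurements on a subject are positively correlated ($\gamma_{1,2} > 0$), the variance of the difference is smaller than the $\sigma_1^2 + \sigma_2^2$ that would apply to two independent measurements. Because different subjects are assumed independent, the differences themselves are uncorrelated from subject to subject.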

In SPSS, the paired t-test requires the “wide” form of data in the spreadsheet rather than the “tall” form we have used up until now. The tall form has one outcome per row, so it has many rows. The wide form has one subject per row with two or more outcomes per row (necessitating two or more outcome columns).
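Outside of SPSS, the same restructuring can be sketched in a few lines of Python. This is only an illustration: the column names `subject`, `rx` and `rom`, the helper `tall_to_wide`, and all of the data values are hypothetical.

```python
from collections import defaultdict

# Hypothetical "tall" data: one outcome per row
tall = [
    {"subject": 1, "rx": "control", "rom": 60},
    {"subject": 1, "rx": "TENS", "rom": 45},
    {"subject": 2, "rx": "control", "rom": 72},
    {"subject": 2, "rx": "TENS", "rom": 70},
]

def tall_to_wide(rows, id_col, level_col, value_col):
    """Collect each subject's outcomes into one row with one column per level."""
    by_subject = defaultdict(dict)
    for row in rows:
        by_subject[row[id_col]][row[level_col]] = row[value_col]
    return [{id_col: sid, **levels} for sid, levels in sorted(by_subject.items())]

# "Wide" data: one subject per row, one outcome column per treatment level
wide = tall_to_wide(tall, "subject", "rx", "rom")
for row in wide:
    print(row)
```

Four tall rows become two wide rows, each with a `control` and a `TENS` column.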

The paired t-test uses a one-sample t-test on the single column of computed differences. Although we have not yet discussed the one-sample t-test, it is a straightforward extension of other t-tests like the independent-sample t-test of Chapter 6 or the one for regression coefficients in Chapter 9. We have an estimate of the difference in outcome between the two treatments in the form of the mean of the difference column. We can compute the standard error for that difference (which is the square root of the ratio of the variance of the difference column to the number of subjects). Then we can construct the t-statistic as the estimate divided by the SE of the estimate, and under the null hypothesis that the population mean difference is zero, this will follow a t-distribution with n − 1 df, where n is the number of subjects.

The results from SPSS for comparing control to TENS ROM are shown in Table 14.1. The table tells us that the best point estimate of the difference in population means for ROM between control and TENS is 17.70 with control being higher (because the direction of the subtraction is listed as control minus TENS). The uncertainty in this estimate due to random sampling variation is 7.256 on the standard deviation scale. (This was calculated based on the sample size of 10 and the observed standard deviation of 22.945 for the observed sample.) We are 95% confident that the true reduction in ROM caused by TENS relative to the control is between 1.3 and 34.1, so it may be very small or rather large. The t-statistic of 2.439 will follow the t-distribution with 9 df if the null hypothesis is true and the assumptions are met. This leads to a p-value of 0.037, so we reject the null hypothesis and conclude that TENS reduces range of motion.
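As a check on the arithmetic, the reported standard error, t-statistic, confidence interval and p-value can be reproduced from the summary numbers quoted above. The sketch below uses only the Python standard library; the helper functions are hypothetical, the p-value is obtained by numerically integrating the t density rather than by any SPSS routine, and the critical value 2.262 is an assumption taken from a standard t-table for 9 df.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value using Simpson's rule on [0, |t|] (steps must be even)."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

# Summary values quoted in the text (control minus TENS)
mean_diff = 17.70
sd_diff = 22.945
n = 10

se = sd_diff / math.sqrt(n)          # standard error of the mean difference
t_stat = mean_diff / se              # one-sample t on the differences
p = two_sided_p(t_stat, n - 1)

t_crit = 2.262  # assumption: tabled 97.5th percentile of t with 9 df
ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)

print(round(se, 3), round(t_stat, 3), round(p, 3))
print(round(ci[0], 1), round(ci[1], 1))
```

Under these assumptions the computed values agree with the SPSS results described above to the precision shown there.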

For comparison, the incorrect, between-subjects one-way ANOVA analysis of these data gives a p-value of 0.123, leading to the (probably) incorrect conclusion that the two treatments both have the same population mean of ROM. For future discussion we note that the within-groups SS for this incorrect analysis is 10748.5 with 18 df.

For educational purposes, it is worth noting that it is possible to get the same correct results in this case (or other one-factor within-subjects experiments) by performing a two-way ANOVA in which “subject” is the other factor (besides treatment). Before looking at the results we need to note several important facts.

| Source | | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| Intercept | Hypothesis | 173166.05 | 1 | 173166.05 | 185.99 | <0. |
| | Error | 8379.45 | 9 | 931.05 | | |
| rx | Hypothesis | 1566.45 | 1 | 1566.45 | 5.951 | 0. |
| | Error | 2369.05 | 9 | 263.23 | | |
| subject | Hypothesis | 8379.45 | 9 | 931.05 | 3.537 | 0. |
| | Error | 2369.05 | 9 | 263.23 | | |

Table 14.2: Two-way ANOVA results for the osteoarthritis experiment.

correspond to the different null hypotheses). The null hypothesis of main interest here is that the three treatment population means are equal, and that is tested and rejected on the line called “rx”. The null hypothesis for the random subject effect is that the population variance of the subject-to-subject means (of all three treatments) is zero.

The key observation from this table is that the treatment (rx) SS and MS correspond to the between-groups SS and MS in the incorrect one-way ANOVA, while the sum of the subject SS and error SS is 10748.5, which is the within-groups SS for the incorrect one-way ANOVA. This is a decomposition of the four sources of error (see Section 8.5) that contribute to σ^2, which is estimated by SSwithin in the one-way ANOVA. In this two-way ANOVA the subject-to-subject variability is estimated to be 931.05, and the remaining three sources contribute 263.23 (on the variance scale). This smaller three-source error MS is the denominator for the numerator (rx) MS for the F-statistic of the treatment effect. Therefore we get a larger F-statistic and more power when we use a within-subjects design.
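The decomposition can be verified directly from the sums of squares and degrees of freedom in Table 14.2. The following sketch simply redoes that arithmetic in Python; the variable names are hypothetical and nothing here goes beyond the numbers already reported.

```python
import math

# Sums of squares and df from Table 14.2
ss_rx, df_rx = 1566.45, 1
ss_subject, df_subject = 8379.45, 9
ss_error, df_error = 2369.05, 9

# Incorrect one-way ANOVA: subject-to-subject variation stays in the error term
ss_within = ss_subject + ss_error
df_within = df_subject + df_error
f_one_way = (ss_rx / df_rx) / (ss_within / df_within)

# Two-way ANOVA with subject as a factor: subject variation is removed
ms_subject = ss_subject / df_subject
ms_error = ss_error / df_error
f_two_way = (ss_rx / df_rx) / ms_error

print(round(ss_within, 2), df_within)
print(round(ms_subject, 2), round(ms_error, 2))
print(round(f_one_way, 3), round(f_two_way, 3))
print(math.sqrt(f_two_way))  # compare with the paired t-statistic
```

The F-statistic with the smaller error MS is more than twice the one from the incorrect analysis, and its square root matches the paired t-statistic of 2.439 up to rounding, which is the sense in which the two analyses are equivalent.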

How do we know which error terms to use for which F-tests? That requires more mathematical statistics than we cover in this course, but SPSS will produce an EMS table, and it is easy to use that table to figure out which ratios are 1 when the null hypotheses are true.

It is worth mentioning that in SPSS a one-way within-subjects ANOVA can be analyzed either as a two-way ANOVA with subjects as a random factor (or even as a fixed factor if a no-interaction model is selected) or as a repeated measures analysis (see next section). The p-value for the overall null hypothesis, that the population outcome means are equal for all levels of the factor, is the same for each analysis, although which auxiliary statistics are produced differs.

A two-level one-way within-subjects experiment can equivalently be analyzed by a paired t-test or a two-way ANOVA with a random subject factor. The latter also applies to more than two levels. The extra power comes from mathematically removing the subject-to-subject component of the underlying variance (σ^2).

14.5 One-way Repeated Measures Analysis

Although repeated measures analysis is a very general term for any study in which multiple measurements are made on the same subject, there is a narrow sense of repeated measures analysis which is discussed in this section and the next section. This is a set of specific analysis methods commonly used in social sciences, but less commonly in other fields where alternatives such as mixed models tend to be used.

This narrow-sense repeated measures analysis is what you get if you choose “General Linear Model / Repeated Measures” in SPSS. It includes the second and third approaches of our list of approaches given in the introduction to this chapter. The various sections of the output are labeled univariate or multivariate to distinguish which type of analysis is shown.

This section discusses the k-level (k ≥ 2) one-way within-subjects ANOVA using repeated measures in the narrow sense. The next section discusses the mixed within/between subjects two-way ANOVA.

First we need to look at the assumptions of repeated measures analysis. One-way repeated measures analyses assume a Normal distribution of the outcome for each level of the within-subjects factor. The errors are assumed to be uncorrelated between subjects. Within a subject the multiple measurements are assumed to be correlated. For the univariate analyses, the assumption is that a technical condition called sphericity is met. Although the technical condition is difficult to understand, there is a simpler condition that is nearly equivalent: compound symmetry. Compound symmetry indicates that all of the variances are equal and all of the covariances (and correlations) are equal. This variance-covariance

| Source | | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| rx | Sphericity Assumed | 2161.8 | 2 | 1080.9 | 3.967 | . |
| | Greenhouse-Geisser | 2161.8 | 1.848 | 1169.7 | 3.967 | . |
| | Huynh-Feldt | 2161.8 | 2.000 | 1080.9 | 3.967 | . |
| | Lower-bound | 2161.8 | 1.000 | 2161.8 | 3.967 | . |
| Error(rx) | Sphericity Assumed | 4904.2 | 18 | 272. | | |
| | Greenhouse-Geisser | 4904.2 | 16.633 | 294. | | |
| | Huynh-Feldt | 4904.2 | 18.000 | 272. | | |
| | Lower-bound | 4904.2 | 9.000 | 544. | | |

Table 14.3: Tests of Within-Subjects Effects for the osteoarthritis experiment.
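The rows of Table 14.3 differ only in their degrees of freedom: each correction multiplies both the treatment and the error df by the same factor (usually written as epsilon), which leaves the F-statistic unchanged and alters only the reference distribution used for the p-value. The sketch below illustrates this with the table's values. Note the assumptions: epsilon is inferred here as the ratio of corrected to uncorrected df shown in the table, not computed from raw data, and the function name is hypothetical.

```python
# Values from the "Sphericity Assumed" rows of Table 14.3
ss_rx, df_rx = 2161.8, 2
ss_err, df_err = 4904.2, 18

def corrected_test(epsilon):
    """Scale both df by epsilon; the SS do not change, so neither does F."""
    df1, df2 = epsilon * df_rx, epsilon * df_err
    f_stat = (ss_rx / df1) / (ss_err / df2)
    return df1, df2, f_stat

# Assumption: epsilon inferred from the table as corrected df / uncorrected df
epsilons = {
    "Sphericity Assumed": 1.0,
    "Greenhouse-Geisser": 1.848 / 2,
    "Lower-bound": 1 / df_rx,  # 1/(k-1) with k = 3 levels
}

results = {name: corrected_test(eps) for name, eps in epsilons.items()}
for name, (df1, df2, f_stat) in results.items():
    print(f"{name:20s} df = ({df1:.3f}, {df2:.3f})  F = {f_stat:.3f}")
```

The Greenhouse-Geisser error df comes out a shade below the tabled 16.633 because the epsilon used here is itself rounded. Computing the corrected p-values would additionally require the F-distribution, which this sketch does not attempt.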

the univariate analysis.

The approach for the multivariate analysis is to first construct a set of k − 1 orthogonal contrasts. (The main effect and interaction p-values are the same for every set of orthogonal contrasts.) Then SS are computed for each contrast in the usual way, and “sums of cross-products” are also formed for pairs of contrasts. These numbers are put into a k − 1 by k − 1 matrix called the SSCP (sums of squares and cross products) matrix. In addition to the (within-subjects) treatment SSCP matrix, an error SSCP matrix is constructed analogous to computation of error SS. The ratio of these matrices is a matrix with F-values on the diagonal and ratios of treatment to error cross-products off the diagonal. We need to make a single F statistic from this matrix to get a p-value to test the overall null hypothesis. Four methods are provided for reducing the ratio matrix to a single F value. These are called Pillai’s Trace, Wilks’ Lambda, Hotelling’s Trace, and Roy’s Largest Root. There is a fairly extensive, difficult-to-understand literature comparing these methods, but in most cases they give similar p-values.

| Effect | | Value | F | Hypothesis df | Error df | Sig. |
|---|---|---|---|---|---|---|
| modality | Pillai’s Trace | 0.549 | 4.878 | 2 | 8 | 0. |
| | Wilks’ Lambda | 0.451 | 4.878 | 2 | 8 | 0. |
| | Hotelling’s Trace | 1.220 | 4.878 | 2 | 8 | 0. |
| | Roy’s Largest Root | 1.220 | 4.878 | 2 | 8 | 0. |

Table 14.4: Multivariate Tests for the osteoarthritis experiment.

The decision to reject or retain the overall null hypothesis of equal population outcome means for all levels of the within-subjects factor is made by looking at the p-value for one of the four F-values computed by SPSS. I recommend that you use “Pillai’s trace”. The thing you should not do is pick the line that gives the answer you want! In a one-way within-subjects ANOVA, the four F-values will always agree, while in more complex designs they will disagree to some extent.
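The reason the four statistics agree in the one-way case can be sketched using their standard definitions in terms of the eigenvalues of the treatment SSCP matrix multiplied by the inverse of the error SSCP matrix. In a one-way within-subjects design there is only one nonzero eigenvalue, so all four are functions of the same number. The code below is an illustration under stated assumptions: the eigenvalue 1.220 is taken from the Hotelling's Trace line of Table 14.4 rather than computed from data, and the function name is hypothetical.

```python
import math

def multivariate_stats(eigenvalues):
    """Four classical test statistics from the eigenvalues of
    (treatment SSCP) times inverse(error SSCP)."""
    return {
        "pillai": sum(ev / (1 + ev) for ev in eigenvalues),
        "wilks": math.prod(1 / (1 + ev) for ev in eigenvalues),
        "hotelling": sum(eigenvalues),
        "roy": max(eigenvalues),  # reported as the largest eigenvalue, as in Table 14.4
    }

# Assumption: the single nonzero eigenvalue, read off Table 14.4
stats = multivariate_stats([1.220])

# With one nonzero eigenvalue, the F value is the eigenvalue scaled by the df ratio
hyp_df, err_df = 2, 8
f_value = stats["hotelling"] * err_df / hyp_df

for name, value in stats.items():
    print(name, round(value, 3))
print("F", round(f_value, 3))
```

The results match the Value and F columns of Table 14.4 up to rounding of the eigenvalue. With more than one nonzero eigenvalue, as in more complex designs, the four statistics weight the eigenvalues differently and so can disagree.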

Which approach should we use, univariate or multivariate? Luckily, they agree most of the time. When they disagree, it could be because the univariate approach is somewhat more powerful, particularly for small studies, and is thus preferred. Or it could be that the correction is insufficient in the case of a large deviation from sphericity, in which case the multivariate test is preferred as more robust. In general, you should at least look for outliers or mistakes if there is a disagreement.

An additional section of the repeated measures analysis shows the planned contrasts and is labeled “Tests of Within-Subjects Contrasts”. This section is the same for both the univariate and multivariate approaches. It gives a p-value for each planned contrast. The default contrast set is “polynomial” which is generally only appropriate for a moderately large number of levels of a factor representing repeated measures of the same measurement over time. In most circumstances, you will want to change the contrast type to simple (baseline against each other level) or repeated (comparing adjacent levels).

It is worth noting that post-hoc comparisons are available for the within-subjects factor under Options by selecting the factor in the Estimated Marginal Means box and then by checking the “compare main effects” box and choosing Bonferroni as the method.

14.6 Mixed between/within-subjects designs

Repeated measures analysis is appropriate when one (or more) factor is a within-subjects factor. Usually univariate and multivariate tests agree for the overall null hypothesis for the within-subjects factor or any interaction involving a within-subjects factor. Planned (main effects) contrasts are appropriate for both factors if there is no significant interaction. Post-hoc comparisons can also be performed.

14.6.1 Repeated Measures in SPSS

To perform a repeated measures analysis in SPSS, use the menu item “Analyze / General Linear Model / Repeated Measures.” The example uses the data in circleWide.sav. This is in the “wide” format with a separate column for each level of the repeated factor.

Figure 14.2: SPSS Repeated Measures Define Factor(s) dialog box.

Unlike other analyses in SPSS, there is a dialog box that you must fill out before seeing the main analysis dialog box. This is called the “Repeated Measures Define Factor(s)” dialog box as shown in Figure 14.2. Under “Within-Subject Factor Name” you should enter a (new) name that describes what is different among the levels of your within-subjects factor. Then enter the “Number of Levels”, and click Add. In a more complex design you need to do this for each within-subject factor. Then, although not required, it is a very good idea to enter a “Measure Name”, which should describe what is measured at each level of the within-subject factor. Either a term like “time” or units like “milliseconds” is appropriate for this box. Click the “Define” button to continue.

Figure 14.3: SPSS Repeated Measures dialog box.

Next you will see the Repeated Measures dialog box. On the left is a list of all variables; at top right is the “Within-Subjects Variables” box with lines for each of the levels of the within-subjects variables you defined previously. You should move the k outcome variables corresponding to the k levels of the within-subjects factor into the “Within-Subjects Variables” box, either one at a time or all together. The result looks something like Figure 14.3. Now enter the between-subjects factor, if any. Then use the model button to remove the interaction if