























































(Version 4, 14 March 2013)

10.1 GENERAL PRINCIPLES OF EXPERIMENTAL DESIGN
    10.1.1 Randomization
    10.1.2 Replication and Pseudoreplication
    10.1.3 Design Control
10.2 TYPES OF EXPERIMENTAL DESIGNS
    10.2.1 Linear Additive Models
    10.2.2 Factorial Designs
    10.2.3 Randomized Block Design
    10.2.4 Split-Unit Designs
    10.2.5 Latin Square Designs
    10.2.6 Repeated Measure Designs
10.3 WHERE SHOULD I GO NEXT?
10.4 SUMMARY
SELECTED REFERENCES
QUESTIONS AND PROBLEMS
Measurements in ecology must not only be done accurately and precisely but also be carried out within the general framework of a good experimental design. As ecologists have attempted to do more field experiments, the difficulties and pitfalls of experimental design have begun to emerge. There are many excellent textbooks on experimental design (for example, Cox 1958, Mead 1988, Hinkelmann and Kempthorne 2008) and I shall not attempt to summarize the detailed discussion you may obtain by reading one of these statistical texts. Rather I shall concentrate on the simple principles of experimental design as they are applied to ecological studies. The aim of this chapter is to teach you about the general principles of ecological experimental design so that you can talk intelligently to a statistician or take up one of the statistical texts on the subject without a grimace.
Experimental design is a term describing the logical structure of an experiment. Let us begin by defining some of the terms that are commonly used in discussions of experimental design. The important point is that if you are talking to a statistician, you should carefully adopt their language or confusion may well result. There are three aspects of the structure of an experimental design.
1. Treatment structure: This defines the set of treatments selected for comparison. Statisticians use the term treatment as a general term for any set of comparisons. For example, if you wished to compare the body size of a lizard species on two islands, the two islands would be called ‘treatments’. The term is somewhat stilted for many ecological studies but it fits quite well for studies in which a manipulation is applied to a set of plots or chambers, e.g. increased CO₂, normal CO₂, and reduced CO₂ affecting grass growth. Treatment factors can be qualitative or quantitative. The investigator selects the treatment structure to reflect the ecological hypotheses under investigation.

2. Design structure: This specifies the rules by which the treatments are to be allocated to the experimental units. This is the central issue of experimental design and covers much of the material this chapter deals with, and is discussed much more fully in Mead (1988). The important scientific point is that the design structure depends on the scientific questions being asked, and most importantly it dictates what type of statistical analysis can be carried out on the resulting data.

3. Response structure: This specifies the measurements to be made on each experimental unit. In particular it gives the list of response variables that are measured and when and where the measurements are to be made. These are the key decisions an ecologist must make to gather the data needed to answer the question under investigation. Even though the response structure is a critical element in experimental design, it is rarely discussed. Most fields of ecology have developed paradigms of response structure, but these must be explicitly recognized. For example, if you wish to measure body size of lizards only once each year, you could not investigate important seasonal variations in growth. Again the key is to have the design coordinated with the ecological questions.
experimental units (the two 10-ha blocks), and the 50 small plots are subsamples of each experimental unit. Ecologists at times talk of the 50 small plots as the ‘unit of study’ but the statistician more precisely uses the term experimental unit for the 10 ha blocks.
2. In a plant growth study, four fertilizer treatments (none, N, N+P, N+P+K) may be applied at random to 50 one-square-meter plots on each of two areas. In this case there are 50 experimental units on each area because any single one-square-meter plot might be treated with any one of the four fertilizers. Note that the experimental units must be separated by enough space to be independent of one another, so that for example the spraying of fertilizer on one plot does not blow or leak over to adjacent plots.

3. In a study of tree growth up an altitudinal gradient, a plant ecologist wishes to determine if growth rates decrease with altitude. The unit of study or experimental unit is a single tree, and a random sample of 100 trees is selected and the growth rate and altitude of each tree is recorded. The ecological condition or variable of interest is the altitude of each tree.
So the first step in specifying your experimental design is to determine the experimental units. When a statistician asks you about the number of replicates, he or she usually wants to know how many experimental units you have in each treatment. Most of the difficulty that Hurlbert (1984) has described as pseudoreplication¹ arises from a failure to define exactly what the experimental unit is.
There is a critical distinction between the experimental units and the measurements or samples that may be taken from or made on the experimental units. Statisticians use the term sampling unit for these measurements or samples. In the first example above, the 50 subsamples taken on each experimental unit are the sampling units. Hurlbert (1990a) prefers the term evaluation unit for the older term sampling unit, so note that evaluation unit = sampling unit for statistical discussions. The precise definition of evaluation unit is that element of an experimental unit on which an individual measurement is made. In the second example above, each of the 50 individual plots treated with fertilizer is an evaluation unit or sampling unit as well as an experimental unit. In the first example above each of the 50 individual plots is an evaluation unit but it is not an experimental unit.
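The distinction between experimental units and evaluation units can be made concrete with a small sketch. The data layout, unit names, and numbers below are hypothetical and are not taken from the chapter: each treatment maps to its experimental units, and each experimental unit holds the measurements made on its evaluation units. The number of replicates is the count of experimental units, and subsamples are averaged within each unit before treatments are compared.

```python
from statistics import mean

# Hypothetical data: treatment -> experimental unit -> evaluation-unit measurements.
data = {
    "fertilized": {"block_A": [12.0, 14.0, 13.0], "block_B": [15.0, 17.0, 16.0]},
    "control": {"block_C": [9.0, 10.0, 11.0], "block_D": [8.0, 7.0, 9.0]},
}


def replicate_summary(data):
    """Return, per treatment, the number of replicates and the mean of each unit."""
    summary = {}
    for treatment, units in data.items():
        unit_means = {name: mean(values) for name, values in units.items()}
        summary[treatment] = {
            "replicates": len(units),  # experimental units, not subsamples
            "unit_means": unit_means,
        }
    return summary


summary = replicate_summary(data)
for treatment, info in summary.items():
    print(treatment, info["replicates"], info["unit_means"])
```

In this layout each treatment has two replicates, even though six measurements were taken per treatment.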
¹ Pseudoreplication occurs when experimental measurements are not independent. If you weigh the same fish twice you do not have two replicates.
A general rule of manipulative experimentation is that one should have a "control". A control is usually defined as an experimental unit which has been given no treatment. Thus a control is usually the baseline against which the other treatments are to be compared (Fig. 10.1). In some cases the control unit is subjected to a sham treatment (e.g. spraying with water vs. spraying with a water + fertilizer solution). The term ‘control’ is awkward for ecologists because it has the implication that the ecologist controls something like the temperature or the salinity in a study. Perhaps the term ‘control’ must be maintained in order to talk to statisticians but ecologists should use the more general term ‘baseline’ for the baseline treatment in manipulative experiments or the baseline comparison in observational studies.
Mensurative experiments need a statistical ‘control’ in the sense of a set of experimental units that serve as a baseline for comparison. The exact nature of the controls will depend on the hypothesis being tested. For example, if you wish to measure the impact of competition from species A on plant growth, you can measure plants growing in natural stands in a mixture with species A and compare these with plants growing in the absence of species A (the baseline comparison plants). The baseline for comparison should be determined by the ecological questions being asked.
There is one fundamental requirement of all scientific experimentation: Every manipulative experiment must have a control.
If a control is not present, it is impossible to conclude anything definite about the experiment². In ecological field experiments there is so much year-to-year variation in communities and ecosystems that an even stronger rule should be adopted: Every manipulative ecological field experiment must have a contemporaneous control.
Because of the need for replication this rule dictates that field experiments should utilize at least 2 controls and 2 experimental areas or units. Clearly statistical power
Figure 10.1 Example of the requirements for a control in ecological studies. A stream is to be subjected to nutrient additions from a mining operation. By sampling both the control and the impact sites before and after the nutrient additions, both temporal and spatial controls are utilized. Green (1979) calls this the BACI design (Before-After, Control-Impact) and suggests that it is an optimal impact design. In other situations one cannot sample before the treatment or impact is applied, and only the spatial control of the lower diagram is present. (Modified from Green 1979)
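One common way to summarize data from a before-after, control-impact layout is the difference of differences: the change at the impact site minus the change at the control site. This is offered as an illustration and not as Green's own procedure; the function name and the nutrient values are hypothetical.

```python
from statistics import mean


def baci_effect(control_before, control_after, impact_before, impact_after):
    """Change at the impact site minus change at the control site."""
    control_change = mean(control_after) - mean(control_before)
    impact_change = mean(impact_after) - mean(impact_before)
    return impact_change - control_change


# Hypothetical nutrient concentrations (mg/L) from repeated sampling dates.
effect = baci_effect(
    control_before=[1.0, 1.2, 0.8],
    control_after=[1.5, 1.7, 1.3],
    impact_before=[1.1, 0.9, 1.0],
    impact_after=[3.0, 3.2, 2.8],
)
print(round(effect, 2))
```

With a single control site and a single impact site, the sampling dates are not independent replicates, so a value like this describes these two sites only.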
There are at least six sources of variability that can cloud the interpretation of experiments (Table 10.1). These sources of confusion can be reduced by three statistical procedures: randomization, replication, and design control.
Source of confusion Features of an experimental design that reduce or eliminate confusion
Replication of treatments Interspersion of treatments Concomitant observations
10.1.1 Randomization
Most statistical tests make the assumption that the observations are independent. As in most statistical assumptions, independence of observations is an ideal that can never be achieved. One way to help achieve the goal of independent observations is to randomize by taking a random sample from the population or by assigning treatments at random to the experimental units. If observations are not independent, we cannot utilize any of the statistical analyses that assume independence.
Randomization is also a device for reducing bias that can invade an experiment inadvertently. Randomization thus increases the accuracy of our estimates of treatment effects.
In many ecological situations complete randomization is not possible. Study sites cannot be selected at random if only because not all land areas are available for ecological research. Within areas that are available, vehicle access will often dictate the location of study sites. The rule of thumb to use is simple: Randomize whenever possible. Systematic sampling is normally the alternative to random sampling (see Chapter 8, page 000). While most statisticians do not approve of systematic sampling, most ecologists working with field plots use some form of systematic layout of plots. Systematic sampling achieves coverage of the entire study area, which ecologists often desire. There is so far no good evidence that systematic sampling in complex natural ecosystems leads to biased estimates or unreliable comparisons. But there is always a residue of doubt when systematic sampling is used, and hence the admonition to sample randomly when possible. A good compromise is to semi-systematically sample. Randomization is a kind of statistical insurance.
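Random assignment of treatments to experimental units can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name, plot labels, and the requirement of equal replication per treatment are choices made here, not part of the chapter.

```python
import random


def assign_treatments(units, treatments, seed=None):
    """Randomly assign treatments to experimental units in equal numbers."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)  # a seed makes the layout reproducible
    labels = list(treatments) * (len(units) // len(treatments))
    rng.shuffle(labels)
    return dict(zip(units, labels))


# Hypothetical example: eight plots, two treatments.
plots = [f"plot_{i}" for i in range(1, 9)]
layout = assign_treatments(plots, ["control", "fertilized"], seed=1)
for plot, treatment in layout.items():
    print(plot, treatment)
```

Recording the seed alongside the field notes lets the assignment be documented and reproduced later.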
Randomization should always be used in manipulative experiments when assigning treatments to experimental units. If some subjective procedure is used to assign treatments, the essential touchstone of statistical analysis is lost and
heterogeneous space. Figure 10.2 illustrates good and poor designs for interspersion of a simple 2-treatment field experiment. Let us look at each design briefly.
1. Completely Randomized Design: This is the simplest design recommended by many statistical texts (Fig. 10.2, A-1). Hurlbert (1984) pointed out that strict randomization can result in treatments being spatially segregated by chance, especially if only a few treatment replicates are possible. Spatial segregation will produce spurious treatment effects when there are preexisting gradients in the study area. For this reason Hurlbert (1984) recommends against this statistical design in ecological studies when treatment replicates are few, even though technically this is a perfectly acceptable statistical design to all professional statisticians.

2. Randomized Block Design: In this design the experimental units are grouped together in blocks. In ecological use the blocks may be areas of habitat, or time periods, or rooms within a greenhouse. The main point is that the blocks are relatively uniform internally, and the differences between blocks may be large or small. This is an excellent design for most field experiments because it automatically produces an interspersion of treatments (cf. Fig. 10.2, A-2) and thus reduces the effect of chance events on the results of the experiment. One additional advantage of the randomized block design is that whole blocks may be lost without compromising the experiment. If a bulldozer destroys one set of plots, all is not lost.

3. Systematic Design: This design achieves maximum interspersion of treatments at the statistical risk of errors arising from a periodic environment. Since spatially periodic environments are almost unknown in natural ecosystems, this problem is non-existent for most ecological work. Temporal periodicities are however quite common and when the treatments being applied have a time component one must be more careful to avoid systematic designs.

Figure 10.2 Schematic representation of various acceptable modes (A) of interspersing the replicates of two treatments (shaded red, unshaded) and various ways (B) in which the principle of interspersion can be violated. Design types shown: A-1 Completely randomized; A-2 Randomized block; A-3 Systematic; B-1 Simple segregation; B-2 Clumped segregation; B-3 Isolative segregation; B-4 Randomized but with interdependent replicates; B-5 No replication. The schema also labels two chambers (Chamber 1, Chamber 2). (From Hurlbert, 1984.)
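The randomized block design described above can be sketched as follows. This is a minimal illustration, assuming each block contains exactly one experimental unit per treatment; the function name and the block and unit labels are hypothetical.

```python
import random


def randomized_block_layout(blocks, treatments, seed=None):
    """Within each block, assign every treatment once, in random order."""
    rng = random.Random(seed)
    layout = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs one unit per treatment")
        order = list(treatments)
        rng.shuffle(order)  # randomization happens separately inside each block
        layout[block] = dict(zip(units, order))
    return layout


# Hypothetical example: four blocks of two plots each, two treatments.
blocks = {
    "block_1": ["u1", "u2"],
    "block_2": ["u3", "u4"],
    "block_3": ["u5", "u6"],
    "block_4": ["u7", "u8"],
}
layout = randomized_block_layout(blocks, ["control", "treated"], seed=7)
for block, assignment in layout.items():
    print(block, assignment)
```

Because every block receives every treatment, the treatments are interspersed across the study area by construction, and losing one block removes one replicate of each treatment and leaves the design balanced.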
Figure 10.3 Three experimental layouts that show partial but inadequate interspersion of treatments. (a) An experiment to compare predation rates on male vs. female floral parts (Cox 1982). (b) Impact of removals on dispersal rates of two rodent species into field plots (Joule and Cameron 1975). (c) Effects on algal growth of grazer removals in rocky intertidal areas (Slocum 1980). Shaded areas of the diagrams represent unused areas. In all these cases Hurlbert (1984) recommends using subjective assignments of treatments to even out the interspersion. (After Hurlbert 1984)
Another type of segregated experimental design is less easily seen - one in which the replicates are physically interdependent (Fig. 10.2, B-4). In this case the replicates may share a common heating duct, or a common filtration or water system, even though they are spatially separated. It is essential that all replicates of all treatments are hooked up together to the same heating, water, etc. systems, or else that each replicate has its own system.
Randomization and interspersion often conflict in field experiments. Some statistically valid designs will produce on occasion very spatially segregated treatment plots. From an ecological view the best approach is to reject these segregated layouts and go back and re-randomize until you get a layout with an acceptable amount of interspersion (Cox 1958 pp. 86-87). Segregated layouts are not usually a problem when there are many treatment replications.
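The re-randomization approach can be sketched for plots laid out along a single line. The interspersion criterion used here (no more than `max_run` adjacent plots with the same treatment) is an assumption made for illustration; the chapter does not define a numerical criterion, and the function names are hypothetical.

```python
import random


def longest_run(labels):
    """Length of the longest run of identical adjacent labels."""
    best = run = 1
    for previous, current in zip(labels, labels[1:]):
        run = run + 1 if current == previous else 1
        best = max(best, run)
    return best


def interspersed_layout(treatments, replicates, max_run=2, seed=None, max_tries=1000):
    """Shuffle repeatedly until no treatment occupies more than max_run adjacent plots."""
    rng = random.Random(seed)
    labels = list(treatments) * replicates
    for _ in range(max_tries):
        rng.shuffle(labels)
        if longest_run(labels) <= max_run:
            return list(labels)
    raise RuntimeError("no acceptable layout found; relax max_run")


# Hypothetical example: two treatments, four replicates each, along a transect.
layout = interspersed_layout(["A", "B"], replicates=4, max_run=2, seed=3)
print(layout)
```

The acceptance rule should be fixed before any layout is drawn, so that the rejection of segregated layouts does not become a subjective choice among layouts.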
Figure 10.4 The three most common types of pseudoreplication: (A) simple pseudoreplication, (B) sacrificial pseudoreplication, and (C) temporal pseudoreplication. Yellow and white boxes represent experimental units receiving two different treatments. Each dot represents an evaluation unit or sample measurement; the measurements are labelled x1 to x4 and y1 to y4, and the figure includes a time axis. Pseudoreplication is a consequence, in each example, of statistically testing for a treatment effect by means of procedures (e.g., t-test, U-test) which assume, implicitly, that the four data for each treatment have come from four independent experimental units (= treatment replicates). (From Hurlbert, 1984.)
Hurlbert (1984) introduced the useful idea of pseudoreplication to describe a statistical error of using treatment replicates from experiments which violate the principle of interspersion (Fig. 10.2, B-1 to B-5). The basic statistical problem is that in these cases "replicates" are not independent, and the first assumption of statistical inference is violated. Hurlbert (1984) reported that in two separate surveys 26% and 48% of the ecological papers surveyed showed the statistical error of pseudoreplication. Underwood (1981) found statistical errors in 78% of the papers he surveyed in marine ecology. Clearly there is a need to improve the statistical design of ecological studies.
Three types of pseudoreplication can be recognized (Fig. 10.4). The simplest and most common type of pseudoreplication occurs when there is only one replicate per treatment. For example, there may be one large burned area and one unburned area. If several 1 m² plots are measured within each area, these 1 m² plots are not replicates (they are subsamples) and they should not be used in a t-test to compare burned vs. unburned areas in general. A t-test would only answer the specific question of whether this particular burned area differed from this particular unburned area and we would not know if we were measuring an area effect or a burn effect. Sacrificial pseudoreplication occurs when there is a proper, replicated experimental design but the data for the replicates are pooled together prior to measurement or prior to statistical analysis. This is a simple statistical error and should be a recoverable problem in data analysis unless the samples were physically pooled before measurement³. Temporal pseudoreplication is also common in ecological experiments in which a time series of data are accumulated. For example, with

³ In some cases, for example plant clip plots, all the plants from the two plots might be put together in a single sample bag before taking them back to the laboratory for sorting.
Before we discuss experimental designs, we must define fixed and random classifications. The decision about whether a treatment⁴ in ANOVA is fixed or random is crucial for all hypothesis testing (see Mead 1988).
Fixed Factors:
1. All levels of the classification are in the experiment, or
2. The only levels of interest to the experimenter are in the experiment, or
3. The levels in the experiment were deliberately and not randomly chosen

Random Factors:
1. All levels in the experiment are a random sample from all possible levels
Thus sex is a fixed factor because both sexes are studied, and temperature (10°, 16°, 27°) could be a fixed factor if these are the only temperatures the experimenter is interested in or a random factor if these are a random sample of all possible temperature levels. It is important as a first step in experimental design to decide whether the factors you wish to study will be fixed or random factors since the details of statistical tests differ between fixed and random factor designs.
10.2.1 Linear Additive Models
All of the complex designs used in the analysis of variance can be described very simply by the use of linear additive models. The basic assumption underlying all of these models is additivity. The measurement obtained when a particular treatment is applied to one of the experimental units is assumed to be:
(A quantity depending only on the particular experimental unit) + (A quantity depending on the treatment applied)
The essential feature is that the treatment effect adds on to the unit term, rather than multiplying. Figure 10.5 illustrates this idea graphically. A second critical assumption is that the treatment effects are constant for all experimental units.
Finally, you must assume that the experimental units operate independently so that treatment effects do not spill over from one unit to another.

[Figure: panels (a) and (b) plot leaf area (y-axis) for a control (no CO₂ added) and a CO₂-enriched treatment. In panel (a) both annotations read "increased 3 units"; in panel (b) they read "increased 60% (3 units)" and "increased 60% (6 units)".]

⁴ Treatments are called factors in most statistics discussions of ANOVA, and each factor has several levels. For example, sex can be a factor with 2 levels, males and females.

These are the essential features of linear additive models which form the core of modern parametric statistics. Consider one simple example of a linear additive model. The density of oak seedlings was measured on a series of 6 burned and 6 unburned plots. The linear additive model is:

$$Y_{ij} = \mu + T_i + e_{ij}$$

where $Y_{ij}$ is the density measured on plot $j$ of treatment $i$, $\mu$ is the overall (grand) mean density, $T_i$ is the effect of treatment $i$, and $e_{ij}$ is the experimental error term for that plot.
Linear additive models are often written as deviations:
$$Y_{ij} - \mu = T_i + e_{ij}$$
Interest usually centers on the treatment effects which can be estimated from the observed means:
$$T_1 = \text{Effect of burning on density} = (\text{Average density in burned plots}) - (\text{Average density in all plots})$$

Note that the effects of burning in this case are being related to a hypothetical world which is half burnt and half unburnt. From the data in Table 10.2:

$$T_1 = 2.0 - 5.0 = -3.0$$

Thus burning reduces density by 3.0 trees/m². The effect of not burning is similarly:

$$T_2 = (\text{Average density on unburned plots}) - (\text{Average density on all plots}) = 8.0 - 5.0 = +3.0$$
Note that for designs like this with two levels of treatment, the measured effects are identical in absolute value but opposite in sign.
Note that treatment effects are always relative and we need to estimate effects of one treatment in comparison with the other so that we can determine the ecological significance of the treatment. For example, in this burning experiment an ecologist wants to know the difference between burnt and unburnt:
$$\text{Difference between two treatments} = T_1 - T_2$$
You can also use these treatment effects to decompose the data from each individual quadrat. For example, for plot # 5:
Since you know there were 11 trees/m² in this plot, the overall density μ is 5.0 (the grand mean), and the effect of not burning (T₂) estimated above is +3.0, the experimental error term must be 3.0 to balance the equation. Note again that the "error" term measures inherent biological variability among plots, and not "error" in the sense of "mistake".
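The arithmetic of the burning example can be checked with a short sketch. The plot values below are hypothetical: they are not the data of Table 10.2, and were chosen only so that they reproduce the figures quoted in the text (a mean of 2.0 on burned plots, 8.0 on unburned plots, and 11 trees/m² on plot 5, taken here to be the fifth unburned plot).

```python
from statistics import mean

# Hypothetical densities (trees per square metre) for 6 burned and 6 unburned plots.
burned = [1.0, 3.0, 2.0, 0.0, 4.0, 2.0]
unburned = [7.0, 9.0, 6.0, 5.0, 11.0, 10.0]

grand_mean = mean(burned + unburned)      # overall density, mu
t_burned = mean(burned) - grand_mean      # T1, effect of burning
t_unburned = mean(unburned) - grand_mean  # T2, effect of not burning
difference = t_burned - t_unburned        # burnt minus unburnt

# Decompose the fifth unburned plot: Y = mu + T2 + e, so e = Y - mu - T2.
y_plot5 = unburned[4]
error_plot5 = y_plot5 - grand_mean - t_unburned

print(grand_mean, t_burned, t_unburned, difference, error_plot5)
```

Under these assumed values the two treatment effects are equal in size and opposite in sign, and the three terms of the decomposition for plot 5 sum back to the observed density.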
Linear additive models may be made as complex as you wish, subject to the constraint of still being linear and additive. Many statistical computer packages will compute an ANOVA for any linear additive model you can specify, assuming you have adequate replication. Linear additive models are a convenient shorthand for describing many experimental designs.
10.2.2 Factorial Designs
When only one factor is of interest, the resulting statistical analysis is simple. But typically ecologists need to worry about several factors at the same time. For example, plankton samples may be collected in several lakes at different months of the year. Or rates of egg deposition may be measured at three levels of salinity and two temperatures. Two new concepts arise when one has to deal with several factors - factorials and interaction.
The concept of factorials is just the commonsense notion that all treatments of one factor should be tried with all treatments of the other factors. Thus if egg laying rates for a particular species of amphipod are measured in a laboratory study at 3 salinities and 2 temperatures, the investigator should do all 3 salinities at each of the two temperatures. The test for a factorial arrangement of treatments is simple - make a table!

                        Salinity
Temperature     None     Low     Medium     High
Low
High
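The table test can also be run on a data set directly. The sketch below is an illustration only: the factor levels follow the table above, while the function name and the observations are hypothetical. It lists every combination of levels and reports any cell with no observations.

```python
from itertools import product

salinity_levels = ["none", "low", "medium", "high"]
temperature_levels = ["low", "high"]


def missing_cells(observations, salinity_levels, temperature_levels):
    """Return the (salinity, temperature) cells that have no observations."""
    observed = {(salinity, temperature) for salinity, temperature, _ in observations}
    all_cells = product(salinity_levels, temperature_levels)
    return [cell for cell in all_cells if cell not in observed]


# Hypothetical egg-laying rates: (salinity, temperature, rate).
observations = [
    ("none", "low", 4.1), ("none", "high", 5.0),
    ("low", "low", 3.8), ("low", "high", 4.6),
    ("medium", "low", 3.1), ("medium", "high", 3.9),
    ("high", "low", 2.2),
]

empty = missing_cells(observations, salinity_levels, temperature_levels)
print(empty)
```

An empty result means every cell of the table is filled; any cell that is reported shows where the arrangement falls short of a full factorial.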
Statisticians refer to these eight boxes as cells in an ANOVA. You must have observations in each box or cell of this table to have a factorial design. In the best of all worlds, you will have equal sample sizes in each box (or cell) and thus a