






















Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
To emphasize the current interest in experimental designs, Evidence-Based Practices (EBP) in medicine, psychology and education are highlighted. Finally, ...
Typology: Summaries
1 / 30
This page cannot be seen from the preview
Don't miss anything!























Running Head: EXPERIMENTAL DESIGN
Experimental Design and Some Threats to Experimental Validity: A Primer
Susan Skidmore Texas A&M University
Paper presented at the annual meeting of the Southwest Educational Research Association, New Orleans, Louisiana, February 6, 2008.
Abstract Experimental designs are distinguished as the best method to respond to questions involving causality. The purpose of the present paper is to explicate the logic of experimental design and why it is so vital to questions that demand causal conclusions. In addition, types of internal and external validity threats are discussed. To emphasize the current interest in experimental designs, Evidence- Based Practices (EBP) in medicine, psychology and education are highlighted. Finally, cautionary statements regarding experimental designs are elucidated with examples from the literature.
dependent variable (DV). Ideally, all other variables are eliminated, controlled or distributed in such a way that a conclusion that the IV caused the DV is validly justified.
Figure 1. Diagram of an experiment.
In Figure 1 above you can see that there are two groups. One group receives some sort of manipulation that is thought (theoretically or from previous research) to have an impact on the DV. This is known as the experimental group because participants in this group receive some type of treatment that is presumed to impact the DV. The other group, which does not receive a treatment or instead receives some type of alternative treatment, provides the result of what would have happened without experimental intervention (manipulation of the IV). So how do you determine whether participants will be in the control group or the experimental group? The answer to this question is one of the characteristics that underlie the strength of true experimental designs. True experiments must have three essential characteristics: random assignment to
Outcome measured as DV
No manipulation or alternate manipulation of IV (treatment or intervention) Control Group
Manipulation of IV (treatment or intervention) Experimental Group
groups, an intervention given to at least one group and an alternate or no intervention for at least one other group, and a comparison of group performances on some post-intervention measurement (Gall, Gall, & Borg, 2005). Participants in a true experimental design are randomly allocated to either the control group or the experimental group. A caution is necessary here. Random assignment is not equivalent to random sampling. Random sampling determines who will be in the study, while random assignment determines in which groups participants will be. Random assignment makes “samples randomly similar to each other , whereas random sampling makes a sample similar to a population ” (Shadish, Cook, & Campbell, 2002, p. 248, emphasis in original). Nonetheless, random assignment is extremely important. By randomly assigning participants (or groups of participants) to either the experimental or control group, each participant (or groups of participants) is as likely to be assigned to one group as to the other (Gall et al., 2005). In other words, by giving each participant an equal probability of being a member of each group, random assignment equates the groups on all other factors, except for the intervention that is being implemented, thereby ensuring that the experiment will produce “unbiased estimates of the average treatment effect” (Rosenbaum, 1995, p. 37). To be clear, the term “unbiased estimates” describes the fact that any observed effect differences between the study results and the “true” population are due to chance (Shadish et al., 2002).
Recognizing that most people are outliers on at least some variables (Thompson, 2006), there may be some observed differences that are due simply to the variable characteristics of the members of the treatment groups. For example, let‟s say that six of the ten graduate students are chronic procrastinators, and might benefit greatly from regular scheduled visits with a graduate advisor, while four of the ten graduate students are intrinsically motivated and tend to experience increased anxiety with frequent graduate advisor inquiries. If the random assignment process distributes these six procrastinator graduate students equally among the two groups, a bias due to this characteristic will not evidence itself in the results. If instead, due to chance all four intrinsically motivated students end up in the experimental group, the results of the study may not be the same had the groups been more evenly distributed. Ridiculously small sample sizes, therefore would result in more pronounced differences between the groups that are not due to treatment effects, but instead are due to the variable characteristics of the members in the groups. If instead I have a sample of 10,000 graduate students that that I am going to randomly assign to one of two treatment groups, the law of large numbers works for me. As explained by Thompson et al. (2005), “The beauty of true experiments is that the law of large numbers creates preintervention group equivalencies on all variables, even variables that we do not realize are essential to control” (p. 183). While there is still not identical membership across treatment groups, and I still expect that the observed differences between the control group and the experimental group are going to be due to any possible treatment effects
and to the error associated with the random assignment process, the expectation of equality of groups is nevertheless reasonably approximated. In other words, I expect the ratio of procrastinators to intrinsically motivated students to be approximately the same across the two treatment groups. In fact, I expect proportions of variables I am not even aware of to be the same, on average, across treatment groups! The larger sample size has greatly decreased the error due to chance associated with the random assignment process. As you can see in Figure 2, even if both of the sample studies produce identical treatment effects, the results are not equally valid. The majority of the effect observed in the small sample size study is actually due to error associated with the random assignment process and not a result of the treatment. This effect due to error is greatly reduced in the large sample size study.
Figure 2. Observed treatment effects in two studies with different sample sizes. The white area represents the amount of the observed effect due to the error associated with the random assignment process. The grey area represents the “true” treatment effect. Three Experimental Designs When well-conducted, a randomized experiment is considered the “gold standard” in causal research (Campbell, 1957; Campbell & Stanley, 1963;
“True” treatment effects n=
error
“True” treatment effects n=10,
e r r o r
equality of groups on the variable of interest prior to the intervention. These designs are “read” left to right to correspond to the passage of time (i.e., what happens first, second). Experimental Group R O X O Control Group R O O
The second true experiment is the Posttest-Only Control Group Design. This design varies from the first in that it controls for possible confounding effects of a pretest because it does not use a pre-intervention measurement. All three characteristics of a true experimental design are present as in the previous design: random assignment, intervention implemented with experimental group only, and post-intervention measurement. Experimental Group R X O Control Group R O The third and final design is the Solomon Four-Group Design. This design is the strongest of the three. It not only corrects for the possible confounding effects of a pretest, but allows you to compare these results, to an experimental and control group that did receive a pretest. The major drawback to this design compared to the others is the obvious increase in sample size needed to meet the needs of four treatment groups as opposed to two treatment groups. Experimental Group (with pre-test) R O X O Control Group (with pre-test) R O O Experimental Group (without pre-test) R X O Control Group (without pre-test) R O
In addition to detailing these designs in their seminal work, Campbell and Stanley (1963) firmly established their explicit commitment to experiments “as
the only means for settling disputes regarding educational practice, as the only way of verifying educational improvements, and as the only way of establishing a cumulative tradition in which improvements can be introduced without the danger of a faddish discard of old wisdom in favor of inferior novelties”(Campbell & Stanley, 1963, p. 172). Validity Threats Even when these designs are used, there are differences in how rigidly they are followed as well as to what extent the researcher addresses the multiple threats to validity (see Figure 3 below). Threats to validity are important not only to research designer but also to consumers of research. An informed consumer of research wants to rule out all competing hypothesis and be firmly convinced that the evidence supports the claim that the IV caused the DV. To merit this conclusion, an evaluation of the study is necessary to determine whether threats to experimental validity were recognized and mitigated.
Figure 3. Example of a research experiment and the questions you should ask yourself about internal and external validity. Adapted from (Sani & Todman, 2006).
hypothesized effects
internal validity Are we really observing these effects or the effects of other variables on the DV (procrastination vs. intrinsically motivated)? external validity Are these effects to be found in other contexts and people, or are they specific to our experimental setting and participants?
Independent Variable Graduate Advisor Meetings
Dependent Variable Procrastination/ Motivation scale
motivation/ procrastination scale than they would have had they gotten enough sleep.
Were those students that left somehow different from the ones that remained? If so, would that difference have produced differential results than the ones we observed with the remaining participants?
the same extent) if other outcome measurements were used. For example, if a person scores highest on a test for physical strength they may not necessarily score highest on a flexibility test.
EBP in Medicine, Psychology and Education While the origins of EBP may date back to the origin of scientific reasoning, the Evidence-Based Medicine Working Group (EBMWG) brought the discussion of EBP to the forefront of medicine (1992). In 1996, Evidence-Based Medicine (EBM) was defined as “the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research” (Sackett, Rosenberg, Gray, Haynes, & Richardson, 1996, p. 71). While EBP has many supporters in medicine, EBP has caused some concerns among practitioners. Researchers have addressed concerns regarding the perception of EBM as a top down approach that results in ivory tower researchers dictating how practitioners should practice (Sackett et al., 1996) or similarly that evidence from randomized controlled trials may be valued more highly than practitioner expertise (Kübler, 2000). Yet, it is difficult to deny that there is great support for EBP considering the number of periodicals that have emerged since the years after EBMWG convened. A keyword search for “evidence-based” returns 100 serials on WorldCAT. A keyword search for “evidence-based” returns 96 serials in Ulrich’s Periodical Directory. At least 32 active periodicals, either in print form, electronic form, or both contain “evidence-based” within the title of the periodical. At least 26 of these periodicals are available electronically. See Table 1.
Table 1 (continued).
Start Year Title of Periodical
2000 Evidence-Based Gastroenterology 2000 Evidence-Based Oncology
2000 Trauma Reports: Evidence-Based Medicine for the ED 2001 Journal of Evidence-Based Dental Practice 2003 Evidence-Based Integrative Medicine 2003 Evidence-Based Midwifery 2003 Evidence-Based Preventive^ Medicine 2003 Evidence-Based Surgery 2003 (2005) International Journal of Evidence-Based Healthcare 2004 Evidence-Based Complementary and Alternative Medicine: eCAM 2004 Journal of Evidence-Based Social Work 2004 Worldviews on Evidence-Based Nursing 2005 Advances in Psychotherapy: Evidence-Based Practice 2005 Evidence-Based Ophthalmology 2005 Journal of Evidence-Based Practices for Schools 2006 Evidence-Based Child Health 2006 Evidence-Based Library and Information Practice 2007 Evidence-Based^ Communication Assessment and Intervention
Periodicals available electronically are shown in bold. Parenthetical dates indicate different start year date in WorldCAT.
The popularity of EBP is evident in psychology as well. The American Psychological Association‟s Presidential Task Force on Evidence-Based Practice specifically defined Evidence-Based Practice in Psychology (EBPP) as “the integration of the best available research with clinical expertise in the context of patient characteristics, culture, and preferences” (2006, p. 273). In addition to advocating evidence-based practices, this task force also established the two necessary components for evaluation of psychological interventions: treatment efficacy and clinical utility. Treatment efficacy specifically addresses questions such as how well a particular treatment works. This type of question lends itself to experimental investigation to draw valid causal conclusions about the effect of a particular intervention (or lack thereof) on a particular disorder (American Psychological Association, 2002). Chambless and Hollon (1998), in their review of psychological treatment literature, provide a description of variables of interest when evaluating treatment efficacy in research studies. The Task Force acknowledged that while there are other methods that may lead to causal conclusions “randomized controlled experiments represent a more stringent way to evaluate treatment efficacy because they are the most effective way to rule out threats to internal validity in a single experiment” (American Psychological Association, 2002, p. 1054). The appeals for evidence continue also in the field of education. Grover J. (Russ) Whitehurst, who directs the Education Department's Institute of Education Sciences, defined Evidence-Based Education (EBE) as “the integration of professional wisdom with the best available empirical evidence in making