Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Experimental Design 1, Summaries of Experimental Design

Gonzaga University Experimental Design

To emphasize the current interest in experimental designs, Evidence-Based Practices (EBP) in medicine, psychology and education are highlighted. Finally, ...

Typology: Summaries

2022/2023

Uploaded on 02/28/2023

christina 🇺🇸

4.6

(23)

393 documents

1 / 30

This page cannot be seen from the preview

Don't miss anything!

Experimental Design 1

Running Head: EXPERIMENTAL DESIGN

Experimental Design and Some Threats to

Experimental Validity: A Primer

Susan Skidmore

Texas A&M University

Paper presented at the annual meeting of the Southwest Educational

Research Association, New Orleans, Louisiana, February 6, 2008.

Discover Summaries of Experimental Design Gonzaga University

Partial preview of the text

Download Experimental Design 1 and more Summaries Experimental Design in PDF only on Docsity!

Running Head: EXPERIMENTAL DESIGN

Experimental Design and Some Threats to Experimental Validity: A Primer

Susan Skidmore Texas A&M University

Paper presented at the annual meeting of the Southwest Educational Research Association, New Orleans, Louisiana, February 6, 2008.

Abstract Experimental designs are distinguished as the best method to respond to questions involving causality. The purpose of the present paper is to explicate the logic of experimental design and why it is so vital to questions that demand causal conclusions. In addition, types of internal and external validity threats are discussed. To emphasize the current interest in experimental designs, Evidence- Based Practices (EBP) in medicine, psychology and education are highlighted. Finally, cautionary statements regarding experimental designs are elucidated with examples from the literature.

dependent variable (DV). Ideally, all other variables are eliminated, controlled or distributed in such a way that a conclusion that the IV caused the DV is validly justified.

Figure 1. Diagram of an experiment.

In Figure 1 above you can see that there are two groups. One group receives some sort of manipulation that is thought (theoretically or from previous research) to have an impact on the DV. This is known as the experimental group because participants in this group receive some type of treatment that is presumed to impact the DV. The other group, which does not receive a treatment or instead receives some type of alternative treatment, provides the result of what would have happened without experimental intervention (manipulation of the IV). So how do you determine whether participants will be in the control group or the experimental group? The answer to this question is one of the characteristics that underlie the strength of true experimental designs. True experiments must have three essential characteristics: random assignment to

Outcome measured as DV

No manipulation or alternate manipulation of IV (treatment or intervention) Control Group

Manipulation of IV (treatment or intervention) Experimental Group

groups, an intervention given to at least one group and an alternate or no intervention for at least one other group, and a comparison of group performances on some post-intervention measurement (Gall, Gall, & Borg, 2005). Participants in a true experimental design are randomly allocated to either the control group or the experimental group. A caution is necessary here. Random assignment is not equivalent to random sampling. Random sampling determines who will be in the study, while random assignment determines in which groups participants will be. Random assignment makes “samples randomly similar to each other , whereas random sampling makes a sample similar to a population ” (Shadish, Cook, & Campbell, 2002, p. 248, emphasis in original). Nonetheless, random assignment is extremely important. By randomly assigning participants (or groups of participants) to either the experimental or control group, each participant (or groups of participants) is as likely to be assigned to one group as to the other (Gall et al., 2005). In other words, by giving each participant an equal probability of being a member of each group, random assignment equates the groups on all other factors, except for the intervention that is being implemented, thereby ensuring that the experiment will produce “unbiased estimates of the average treatment effect” (Rosenbaum, 1995, p. 37). To be clear, the term “unbiased estimates” describes the fact that any observed effect differences between the study results and the “true” population are due to chance (Shadish et al., 2002).

Recognizing that most people are outliers on at least some variables (Thompson, 2006), there may be some observed differences that are due simply to the variable characteristics of the members of the treatment groups. For example, let‟s say that six of the ten graduate students are chronic procrastinators, and might benefit greatly from regular scheduled visits with a graduate advisor, while four of the ten graduate students are intrinsically motivated and tend to experience increased anxiety with frequent graduate advisor inquiries. If the random assignment process distributes these six procrastinator graduate students equally among the two groups, a bias due to this characteristic will not evidence itself in the results. If instead, due to chance all four intrinsically motivated students end up in the experimental group, the results of the study may not be the same had the groups been more evenly distributed. Ridiculously small sample sizes, therefore would result in more pronounced differences between the groups that are not due to treatment effects, but instead are due to the variable characteristics of the members in the groups. If instead I have a sample of 10,000 graduate students that that I am going to randomly assign to one of two treatment groups, the law of large numbers works for me. As explained by Thompson et al. (2005), “The beauty of true experiments is that the law of large numbers creates preintervention group equivalencies on all variables, even variables that we do not realize are essential to control” (p. 183). While there is still not identical membership across treatment groups, and I still expect that the observed differences between the control group and the experimental group are going to be due to any possible treatment effects

and to the error associated with the random assignment process, the expectation of equality of groups is nevertheless reasonably approximated. In other words, I expect the ratio of procrastinators to intrinsically motivated students to be approximately the same across the two treatment groups. In fact, I expect proportions of variables I am not even aware of to be the same, on average, across treatment groups! The larger sample size has greatly decreased the error due to chance associated with the random assignment process. As you can see in Figure 2, even if both of the sample studies produce identical treatment effects, the results are not equally valid. The majority of the effect observed in the small sample size study is actually due to error associated with the random assignment process and not a result of the treatment. This effect due to error is greatly reduced in the large sample size study.

Figure 2. Observed treatment effects in two studies with different sample sizes. The white area represents the amount of the observed effect due to the error associated with the random assignment process. The grey area represents the “true” treatment effect. Three Experimental Designs When well-conducted, a randomized experiment is considered the “gold standard” in causal research (Campbell, 1957; Campbell & Stanley, 1963;

“True” treatment effects n=

error

“True” treatment effects n=10,

e r r o r

equality of groups on the variable of interest prior to the intervention. These designs are “read” left to right to correspond to the passage of time (i.e., what happens first, second). Experimental Group R O X O Control Group R O O

The second true experiment is the Posttest-Only Control Group Design. This design varies from the first in that it controls for possible confounding effects of a pretest because it does not use a pre-intervention measurement. All three characteristics of a true experimental design are present as in the previous design: random assignment, intervention implemented with experimental group only, and post-intervention measurement. Experimental Group R X O Control Group R O The third and final design is the Solomon Four-Group Design. This design is the strongest of the three. It not only corrects for the possible confounding effects of a pretest, but allows you to compare these results, to an experimental and control group that did receive a pretest. The major drawback to this design compared to the others is the obvious increase in sample size needed to meet the needs of four treatment groups as opposed to two treatment groups. Experimental Group (with pre-test) R O X O Control Group (with pre-test) R O O Experimental Group (without pre-test) R X O Control Group (without pre-test) R O

In addition to detailing these designs in their seminal work, Campbell and Stanley (1963) firmly established their explicit commitment to experiments “as

the only means for settling disputes regarding educational practice, as the only way of verifying educational improvements, and as the only way of establishing a cumulative tradition in which improvements can be introduced without the danger of a faddish discard of old wisdom in favor of inferior novelties”(Campbell & Stanley, 1963, p. 172). Validity Threats Even when these designs are used, there are differences in how rigidly they are followed as well as to what extent the researcher addresses the multiple threats to validity (see Figure 3 below). Threats to validity are important not only to research designer but also to consumers of research. An informed consumer of research wants to rule out all competing hypothesis and be firmly convinced that the evidence supports the claim that the IV caused the DV. To merit this conclusion, an evaluation of the study is necessary to determine whether threats to experimental validity were recognized and mitigated.

Figure 3. Example of a research experiment and the questions you should ask yourself about internal and external validity. Adapted from (Sani & Todman, 2006).

hypothesized effects

internal validity Are we really observing these effects or the effects of other variables on the DV (procrastination vs. intrinsically motivated)? external validity Are these effects to be found in other contexts and people, or are they specific to our experimental setting and participants?

Independent Variable Graduate Advisor Meetings

Dependent Variable Procrastination/ Motivation scale

motivation/ procrastination scale than they would have had they gotten enough sleep.

Maturation: an observed change that is naturally occurring (such as aging, fatigue, hair length, number of graduate hours completed) that may be confused with the intervention effects but is really a function of the passage of time.
Statistical regression: the phenomenon that occurs when participant selection is based on extreme scores whereby the scores become less extreme, which may appear to be the intervention effect. If in our study of graduate students we purposively select students based on pretest scores of extreme procrastination, the extreme procrastinator graduate students will on the posttest not be as extreme in their procrastination tendencies. Regression toward the mean was first documented by Sir Francis Galton in the late 1800s. Galton (1886) measured the heights of fathers and sons at a World Exposition. Galton found that very tall fathers tended to have sons who were not quite as tall, and that very short fathers tended to have sons who were not quite as short. Clearly, this phenomenon is not a function of the exercise of will (i.e., fathers did not say to their wives, “Let‟s make a shorter son” or “Let‟s make a taller son”)!
Experimental mortality or attrition: a concern about a differential loss of participants, or of different types of participants from the experimental or control group that may produce an effect that appears to be due to the intervention. For example, if half of the students in the experimental group drop out of the study, but none of the control group members drop, we would likely question the results.

Were those students that left somehow different from the ones that remained? If so, would that difference have produced differential results than the ones we observed with the remaining participants?

Testing: the concern that a testing event will impact scores of a subsequent testing event. For example, if we give the graduate students the procrastination/ motivation scale prior to any graduate advisor meetings (the intervention), and then after the intervention we give them the procrastination/ motivation scale again, we may observe difference in the pre- and posttest that are due partly to familiarity with the test or the influence of the testing itself.
Instrumentation: the change in either the measurement instrument itself or the manner in which the instrument is implemented or scored that may cause changes that appear to be due to the intervention, or the failure to detect changes that actually did occur. For example, if between the first and second time that the procrastination/motivation test is given, the developers of the exam decide to remove ten of the questions, we do not know if the exclusion of those questions is responsible for differential scores or if the differences are due to treatment effects.
Additive and interactive effect of threats to internal validity: the concern that the impact of the threats may be additive or that presence of one threat may impact another. A selection-history additive effect occurs when nonequivalent groups are selected. For example, groups may be selected from two different locations, such as, rural and urban areas. The participants in the groups are nonequivalent by selection and they also have unique local histories. The

the same extent) if other outcome measurements were used. For example, if a person scores highest on a test for physical strength they may not necessarily score highest on a flexibility test.

Interactions of the causal relationship with settings: an effect that is present in a particular setting may not be present (or present to the same extent) in a different setting. For example, a particular after school character development program involving community project work may not work equally well in rural versus urban areas.
Context-dependent mediation: an explanatory mediator of a causal relationship in one context may not have the same impact in another context. For example, a study might find that a reduction in federal funding has no impact on student achievement because schools were able to turn to education foundation grants to provide them with additional resources. In another school district where schools did not have access to education foundation resources, the same causal mechanism may not be available. In addition to internal and external validity threats, there are other threats that we need to be aware of in the design and evaluation of studies. Interested readers may refer to such texts as Experimental and Quasi-Experimental Designs (Shadish et al., 2002) or Research Design: Qualitative, Quantitative, and Mixed Methods Approaches (Creswell, 2003) for information about statistical conclusion validity and construct validity concerns.

EBP in Medicine, Psychology and Education While the origins of EBP may date back to the origin of scientific reasoning, the Evidence-Based Medicine Working Group (EBMWG) brought the discussion of EBP to the forefront of medicine (1992). In 1996, Evidence-Based Medicine (EBM) was defined as “the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research” (Sackett, Rosenberg, Gray, Haynes, & Richardson, 1996, p. 71). While EBP has many supporters in medicine, EBP has caused some concerns among practitioners. Researchers have addressed concerns regarding the perception of EBM as a top down approach that results in ivory tower researchers dictating how practitioners should practice (Sackett et al., 1996) or similarly that evidence from randomized controlled trials may be valued more highly than practitioner expertise (Kübler, 2000). Yet, it is difficult to deny that there is great support for EBP considering the number of periodicals that have emerged since the years after EBMWG convened. A keyword search for “evidence-based” returns 100 serials on WorldCAT. A keyword search for “evidence-based” returns 96 serials in Ulrich’s Periodical Directory. At least 32 active periodicals, either in print form, electronic form, or both contain “evidence-based” within the title of the periodical. At least 26 of these periodicals are available electronically. See Table 1.

Table 1 (continued).

Start Year Title of Periodical

2000 Evidence-Based Gastroenterology 2000 Evidence-Based Oncology

2000 Trauma Reports: Evidence-Based Medicine for the ED 2001 Journal of Evidence-Based Dental Practice 2003 Evidence-Based Integrative Medicine 2003 Evidence-Based Midwifery 2003 Evidence-Based Preventive^ Medicine 2003 Evidence-Based Surgery 2003 (2005) International Journal of Evidence-Based Healthcare 2004 Evidence-Based Complementary and Alternative Medicine: eCAM 2004 Journal of Evidence-Based Social Work 2004 Worldviews on Evidence-Based Nursing 2005 Advances in Psychotherapy: Evidence-Based Practice 2005 Evidence-Based Ophthalmology 2005 Journal of Evidence-Based Practices for Schools 2006 Evidence-Based Child Health 2006 Evidence-Based Library and Information Practice 2007 Evidence-Based^ Communication Assessment and Intervention

Periodicals available electronically are shown in bold. Parenthetical dates indicate different start year date in WorldCAT.

The popularity of EBP is evident in psychology as well. The American Psychological Association‟s Presidential Task Force on Evidence-Based Practice specifically defined Evidence-Based Practice in Psychology (EBPP) as “the integration of the best available research with clinical expertise in the context of patient characteristics, culture, and preferences” (2006, p. 273). In addition to advocating evidence-based practices, this task force also established the two necessary components for evaluation of psychological interventions: treatment efficacy and clinical utility. Treatment efficacy specifically addresses questions such as how well a particular treatment works. This type of question lends itself to experimental investigation to draw valid causal conclusions about the effect of a particular intervention (or lack thereof) on a particular disorder (American Psychological Association, 2002). Chambless and Hollon (1998), in their review of psychological treatment literature, provide a description of variables of interest when evaluating treatment efficacy in research studies. The Task Force acknowledged that while there are other methods that may lead to causal conclusions “randomized controlled experiments represent a more stringent way to evaluate treatment efficacy because they are the most effective way to rule out threats to internal validity in a single experiment” (American Psychological Association, 2002, p. 1054). The appeals for evidence continue also in the field of education. Grover J. (Russ) Whitehurst, who directs the Education Department's Institute of Education Sciences, defined Evidence-Based Education (EBE) as “the integration of professional wisdom with the best available empirical evidence in making

Experimental Design 1, Summaries of Experimental Design

Related documents

Partial preview of the text

Download Experimental Design 1 and more Summaries Experimental Design in PDF only on Docsity!