






An in-depth analysis of internal and external validity in research, with a focus on threats and solutions. Internal validity refers to the degree to which observed changes in a dependent variable can be attributed to changes in an independent variable. External validity deals with the generalizability of research findings. The document covers various threats to internal validity, including history, maturation, testing, instrumentation, statistical regression, attrition, selection, interactions with selection, diffusion or imitation of treatments, compensatory equalization of treatments, and experimenter expectancy. It also explores the relationship between internal and external validity and offers guidelines for assessing the degree of threat to internal validity. Finally, it discusses the concept of external validity, the major types of threats to it, and the situations in which external validity should be a concern.







Advances in clinical psychology critically depend on methods researchers use for investigating the causal relationships among variables. Research questions commonly include issues about the putative mechanisms associated with various forms of psychopathology (e.g., do dysfunctional beliefs cause depression?) and questions about the effects of treatments (e.g., does cognitive therapy cause a greater reduction in eating disorder symptoms than does placebo?). How do we go about drawing objective conclusions about causality? This is a question about whether the manipulation of one variable (i.e., the independent variable) has effects on another variable (i.e., the dependent variable). The answer to this question is more complex than it might seem. Consider the following common scenarios.
Scenario 1: Dr. Smith is a well-known local advocate of a controversial form of psychotherapy. He claims that it works faster and more powerfully than all other treatments. For many years he has practiced this treatment and trained other clinicians through workshops, even though there were no scientific data on its efficacy. When challenged on this point, Dr. Smith retorted that he had “seen all the proof he needed with his own eyes.” That is, he claimed his patients almost invariably benefited from his therapy. When a treatment outcome study was conducted and published by an independent group of investigators, it was found that the psychotherapy used by Dr. Smith was no more effective than giving patients a placebo. Dr. Smith’s response was “there must have been something wrong with the research; the researchers didn’t include patients seen in the real world.”
Scenario 2: The interns and their clinical supervisors gathered in the seminar room for the weekly journal club. This week’s article, which recently appeared in a leading journal, described an experimental investigation of the effects of cognitive factors (expectations of disapproval) on social anxiety. Research participants led to expect high disapproval (from an experimental confederate) experienced more social anxiety than participants who were led to expect no disapproval. The investigators concluded that expectations of disapproval can cause social anxiety and likely play a role in clinically severe conditions such as social anxiety disorder. Despite the many methodological strengths of the study, the participants of the journal club quickly searched out the methodological weaknesses. One of the interns raised a particularly important question about the generalizability of the study. Her observation was met with nods of approval from her supervisors. By the time the journal club had finished, all the attendees had convinced themselves that the study was fatally flawed. Some attendees even left the meeting with the impression that psychological research, even research published in leading journals, is largely a waste of time.
This chapter is written largely in response to these two types of scenarios, which we have encountered time and again. The reactions illustrated in these scenarios retard the scientific progress of clinical psychology. The first scenario raises many issues regarding internal and external validity. Dr. Smith claims that all his patients benefit from his treatment. Yet his series of case observations has many problems of internal validity. His dismissal of a recent study of his psychotherapy raises the issue of external validity. The journal club scenario raises the question of what we can conclude from research that does not have “perfect” internal and external validity. The important issues raised in these scenarios are the focus of the remainder of this chapter. We will begin by defining internal validity and illustrating the various threats to it. Some studies, such as those using quasi-experimental designs, are widely used in clinical research, despite their imperfect internal validity. After reading this chapter, you should have a good understanding of why such studies are used and why they are useful. After discussing internal validity, we then examine the concept of external validity (generalizability) and consider the relationship between internal and external validity. As you will see, there are some research situations in which high external validity is vital and other situations in which it is not a priority. Finally, we will conclude with some comments about how scientific knowledge can be advanced even though most research studies have imperfect internal or external validity. Throughout this discussion we will consider a number of commonly used experimental designs that have been developed to deal with issues of internal and external validity. Our discussion of these designs will be illustrative rather than comprehensive.
Detailed discussions of experimental and quasi-experimental designs are available elsewhere (e.g., Asmundson, Norton, & Stein, 2002; Barlow & Hersen, 1984; Campbell & Stanley, 1970; Cook & Campbell, 1979; Onghena & Edgington, 2005).
Internal validity is the degree to which observed changes in a dependent variable can be attributed to changes in an independent variable. Thus, internal validity is a matter of degree (e.g., high, medium, low) rather than one of presence or absence. The researcher’s confidence in his or her findings is proportionate to the strength of internal validity of the research design (Finger & Rand, 2003). True experiments are designs that have strong internal validity; that is, participants are randomized to experimental conditions, and other means are used to ensure that changes in the dependent variable can be attributed to the experimental manipulation of the independent variable. Quasi-experimental designs have weaker internal validity, as we will illustrate later. There are several types of threat to internal validity (Cook & Campbell, 1979; Finger & Rand, 2003; Rosenthal, 2002), including history, maturation, testing, instrumentation, statistical regression, attrition, selection, interactions with selection, diffusion or imitation of treatments, compensatory equalization of treatments, and experimenter expectancy.
Each of these threats to internal validity is defined and illustrated in the following sections.
History
Description. When an extraneous event that takes place between pretest and posttest produces changes in the dependent variable, it is difficult to determine whether the results were due to the experimental manipulation (i.e., changes in the independent variable) or to the extraneous event. In some research, such as a short study of memory, this threat can be controlled by shielding participants from outside influences during the study (e.g., testing them in a quiet lab) or by choosing dependent variables that could not plausibly have been affected by
24 INTRODUCTION/OVERVIEW
farther a score is from the mean, the more extreme it is. The more extreme the score, the rarer it is and the more likely it is to have been the result of a very rare combination of factors. If these factors are temporally unstable, then statistical regression will occur (Furby, 1973). Statistical regression is always toward the population mean of the group. Its magnitude is greater when the test-retest reliability of a measure is low (indicating that scores are readily influenced by chance factors) and when a person’s score is extreme, relative to the mean of the population from which the person was chosen (Cook & Campbell, 1979). Regression effects will not be a threat if assessment methods are chosen that are virtually error free or uninfluenced by random factors (e.g., measuring a person’s height; Finger & Rand, 2003). It is important to note that statistical regression effects can be due to psychologically substantive phenomena and should not be dismissed out of hand as statistical artifacts (Taylor, 1994). Regression might be either noise or the phenomenon of interest, depending on one’s research goals.
Examples. A gambling researcher screened a large group of students in order to identify people who could be classified as heavy gamblers, as measured by a questionnaire. When these people came to the lab to participate in the experiment, they completed the questionnaire a second time. To the researcher’s chagrin, many of the participants no longer had extreme scores on the questionnaire and had to be excluded from the study. Another example concerns uncontrolled treatment studies, in which a group of patients is selected on the basis of extreme scores on some measure (e.g., scores on a perfectionism scale) and then receives an intervention (e.g., a treatment for excessive perfectionism) as well as a posttest.
Statistical regression may occur, resulting in what appears to be a treatment effect (i.e., a decline in scores from pre- to posttest). The solution to this problem is to include a control group. Note that statistical regression is unlikely to occur if participants are selected because they have persistently elevated scores, such as people with chronically high scores on a measure of anxiety (i.e., high trait anxiety). Such people are unlikely to show statistical regression because this phenomenon is due to transient factors that produce elevations in scores (e.g., a near-miss while driving to the lab to participate in an experiment would transiently increase one’s anxiety).
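The claim that regression is larger when reliability is low and selection is extreme can be checked with a small simulation. This is a sketch under an assumed classical test-theory model (observed score = stable true score + transient error, population mean 0, observed SD 1); `regression_demo` and its parameters are hypothetical names, not from the chapter. Under this model the expected retest score of someone who scored x at screening is reliability times x, so the selected group's mean drifts toward the population mean even though nothing happens between the two testings.

```python
import random
import statistics


def regression_demo(reliability, n=20_000, cutoff=1.5, seed=1):
    """Screen a population on one noisy test, retest only the extreme scorers,
    and return (mean screening score, mean retest score) for those selected.

    Assumed model, for illustration: observed = true score + transient error,
    scaled so the population mean is 0, the observed SD is 1, and test-retest
    reliability equals `reliability`. No intervention occurs between tests.
    """
    rng = random.Random(seed)
    true_sd = reliability ** 0.5
    error_sd = (1 - reliability) ** 0.5
    screen, retest = [], []
    for _ in range(n):
        true_score = rng.gauss(0, true_sd)
        first = true_score + rng.gauss(0, error_sd)
        if first > cutoff:  # keep only extreme scorers, as in the gambling example
            screen.append(first)
            retest.append(true_score + rng.gauss(0, error_sd))  # fresh transient error
    return statistics.mean(screen), statistics.mean(retest)


for r in (0.9, 0.5):
    first_mean, retest_mean = regression_demo(r)
    print(f"reliability {r}: screening mean {first_mean:.2f}, retest mean {retest_mean:.2f}")
```

With reliability 0.5 the retest mean should land near half the screening mean, while with 0.9 it barely moves. An untreated control group screened the same way would show the same drop, which is why a control group solves the problem.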
Attrition
Description. The loss of participants from a study (e.g., due to mortality or treatment dropout). This threatens internal validity if attrition is not random: for instance, if attrition is greater in one experimental condition than another, or if particular kinds of participants are more likely than others to drop out of the study. Attrition can cause serious problems for clinical researchers by introducing biases into an experiment. There are various methodological and statistical procedures for limiting, evaluating, and correcting for attrition (see Flick, 1988). However, there are circumstances in which attrition can render the results uninterpretable, as illustrated in the following example.
Example. A residential treatment center reported that 80% of patients with anorexia nervosa were “much improved” or “greatly improved” after completing the program. Unfortunately, the results were biased because 20% of patients did not complete the program, and no outcome data were available for them. Some withdrew because they benefited quickly from treatment and felt that they no longer needed to be in the program. Others dropped out because they failed to benefit. Some severely anorexic patients had died, and others had withdrawn from the clinic and been admitted to hospital. Given the large proportion of treatment dropouts and the uncertainty about whether treatment completers differed, as a group, from treatment dropouts, it was not possible to draw any legitimate conclusions from the treatment study.
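One simple way to see why the anorexia example is uninterpretable is to bound the improvement rate for everyone admitted, assuming in turn that no dropout improved and that every dropout did. This worst-case/best-case bounding is a technique chosen here for illustration, not necessarily one of the procedures reviewed by Flick (1988). The numbers are hypothetical: 100 admissions, 80 completers, and "80%" read as 80% of completers; `outcome_bounds` is an illustrative name.

```python
def outcome_bounds(n_enrolled, n_completed, n_improved_completers):
    """Worst- and best-case improvement rates for the whole enrolled sample.

    Dropouts have no outcome data, so the lower bound assumes none of them
    improved and the upper bound assumes all of them did.
    """
    n_dropouts = n_enrolled - n_completed
    lower = n_improved_completers / n_enrolled
    upper = (n_improved_completers + n_dropouts) / n_enrolled
    return lower, upper


# Hypothetical figures: 100 admissions, 80 completers, 64 (80%) of completers improved.
completer_rate = 64 / 80
low, high = outcome_bounds(n_enrolled=100, n_completed=80, n_improved_completers=64)
print(f"completers only: {completer_rate:.0%}; all admissions: {low:.0%} to {high:.0%}")
# prints: completers only: 80%; all admissions: 64% to 84%
```

The 20-point spread comes entirely from the missing dropouts, and because some dropouts left improved while others deteriorated or died, neither bound can be ruled out from the completers' data alone.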
Selection
Description. When the effects on the dependent variable arise from differences in the kinds of people in the experimental groups. Selection effects are pervasive in quasi-experimental designs (Cook & Campbell, 1979). These are among the most widely used designs in clinical psychology, in which a target group is compared with one or more control groups (e.g., a group of healthy people, a group with another psychopathology). Attempts are made to match the groups on background variables (e.g., demographics), and then they are compared on the variables of interest. However, assignment of participants to groups (e.g., target group vs. healthy control group) is, by definition, nonrandom in quasi-experimental designs. One must remember that the distinction between the target group and any control group is an observed, not an experimentally manipulated, distinction.
Example. Many studies have investigated whether people with anxiety disorders, as compared with healthy controls, tend to selectively focus their attention on sources of threat in their environment (Mogg & Bradley, 1999). Although such studies have yielded a good deal of useful information and have stimulated a great deal of research, they are prone to selection effects. That is, although the clinical (target) and control groups were matched on many background variables, there is no guarantee that the differences between the groups (e.g., the threat-focused attention effects) were due to the presence versus absence of an anxiety disorder. The effects could have been due to other factors that were not assessed in the study. In these studies, selection effects are addressed in three ways. First, the plausibility of confounding factors is taken into consideration. Anxious patients and healthy controls could differ on an almost infinite range of factors. Some of these factors could confound the study of threat-focused attention (e.g., depression), while others are less plausible (e.g., the participant’s Zodiac sign). Second, researchers try to control for all the plausible confounding factors (e.g., all participants are asked to refrain from caffeine consumption on the day of testing; testing is done under conditions of normal or corrected-to-normal vision). Finally, if another confounding factor is subsequently identified (e.g., whether or not the person is taking antianxiety medication), then the study can be replicated, controlling for this factor.
Interactions With Selection. Many of the previously mentioned threats to internal validity can interact with selection to produce effects on the dependent variable that may be confused with effects due to the independent variable (Cook & Campbell, 1979).
Examples include selection-history, selection-maturation, and selection-instrumentation effects. Selection-maturation interactions occur when the experimental groups mature at different speeds. Selection-history interactions occur when different experimental groups come from different settings, where each setting is associated with different histories. In a study designed to test the hypothesis that people with hypertension tend to be high in trait anger (i.e., anger proneness), for example, the hypertensive patients might tend to live in stressful environments, whereas normotensive controls might tend to live in relatively low-stress environments. Thus, the different histories of the groups (i.e., differences in environmental stressors) might be responsible for any group differences in anger proneness, even if anger is unrelated to hypertension.
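The hypertension example can be made concrete with a small simulation of a selection-history confound, followed by the "control for the plausible confounder" step described for selection effects: comparing the groups within each level of the confounder. Everything here is a hypothetical assumption: anger depends only on stress, 70% of hypertensive patients versus 30% of controls come from high-stress settings, and the function names are illustrative.

```python
import random
import statistics


def simulate_groups(n_per_group=5_000, seed=7):
    """Return (group, high_stress, anger) rows for two hypothetical samples.

    Assumptions for illustration only: anger depends on stress, not on blood
    pressure, and hypertensive patients are more often drawn from high-stress
    settings (70%) than normotensive controls (30%).
    """
    rng = random.Random(seed)
    rows = []
    for group, p_high_stress in (("hypertensive", 0.7), ("normotensive", 0.3)):
        for _ in range(n_per_group):
            high_stress = rng.random() < p_high_stress
            anger = 50 + (8 if high_stress else 0) + rng.gauss(0, 10)
            rows.append((group, high_stress, anger))
    return rows


def mean_anger(rows, group, high_stress=None):
    """Mean anger for one group, optionally restricted to one stress level."""
    return statistics.mean(
        anger for g, s, anger in rows
        if g == group and (high_stress is None or s == high_stress)
    )


rows = simulate_groups()
naive = mean_anger(rows, "hypertensive") - mean_anger(rows, "normotensive")
print(f"overall difference: {naive:.1f}")
for level in (True, False):
    diff = mean_anger(rows, "hypertensive", level) - mean_anger(rows, "normotensive", level)
    print(f"within high_stress={level}: {diff:.1f}")
```

The overall comparison shows hypertensive patients as angrier (about 3 points under these assumptions), while the within-stratum differences hover near zero. Stratifying only works for confounders that were measured, which is why plausibility judgments and replication remain necessary.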
Diffusion or Imitation of Treatments
Description. When participants in the different experimental conditions can communicate with one another, such that participants in one condition learn about what happens in the other condition. This can undermine the differences between the experimental manipulations in each condition.
Example. In a study of the effects of stress (in the form of electric shock) on snake phobias, snake-fearful undergraduate psychology students were randomly assigned to one of two groups. Participants in each group were tested individually. Each participant was asked to walk up to a container housing a large, harmless snake and to touch it. Participants in the experimental group received a painful electric shock at a randomly determined point as they approached the snake. Participants in the control group experienced no shock as they approached the snake. Unfortunately, the students who had completed the experiment described their experiences to students who were soon to participate in the study. This contaminated the experimental manipulation because many of the people in the control group had heard about the electric shock. As they approached the snake they worried about getting a shock. Thus, the “no shock” control condition was compromised. Possible solutions to the problem of diffusion of treatments are to ask participants not to discuss the experiment with other students (during the period in which the study is being conducted) or to use experimental designs in which diffusion is not an issue (e.g., in the snake example, one could explicitly inform the control participants that they have been allocated to a “no shock” condition).
Compensatory Equalization of Treatments
Description. When participants learn that they have been assigned to an experimental condition where they won’t receive the possible benefits received by participants in another experimental
Internal and External Validity in Clinical Research 27
cognitive-behavioral therapy (CBT) of panic disorder, we conducted a case study of an unusual presentation, in which the patient’s panic disorder appeared to arise from blood-injury reactivity (vasovagal dizziness and fainting in response to the sight of blood or injury; Anderson, Taylor, & McLean, 1996). The patient was initially treated with standard CBT for panic disorder (Taylor, 2000). Two years later, he relapsed when exposed unexpectedly to blood-injury stimuli. This led us to hypothesize that his blood-injury reactivity played a causal role in his panic disorder. To test this possibility, we provided the patient with another course of standard CBT for panic disorder. As before, he was no longer panicking after treatment. Then we asked the patient if we could expose him to blood-injury stimuli for one month (a videotape of injections and blood extractions). The thought of being exposed to such a tape stimulated blood-injury reactions (e.g., dizziness), which were followed by a relapse of his panic disorder. The next part of the case study involved treating the patient with applied tension, which is a specific treatment for blood-injury reactivity (Öst & Sterner, 1987). This treatment reduced his panic attacks and blood-injury reactivity. When he was reexposed to the videotape, he did not have any blood-injury reactions, his panic disorder did not return, and he was free of psychopathology at his four-month follow-up. This case study involved an ABABCB design, where A = the first and second courses of CBT, B = exposure to blood-injury stimuli, and C = treatment with applied tension. Because the patient’s symptoms changed each time the condition changed, this design makes it unlikely that the results are due to threats to internal validity such as history or maturation. There are many other types of single-case experimental designs, which can be used for other types of research questions (see Barlow & Hersen, 1984; Onghena & Edgington, 2005).
Studies using single-case experimental designs are useful for studying unusual cases and for conducting preliminary evaluations of new treatments. These studies are insufficient in themselves for drawing strong conclusions, but they provide some indication of whether it is useful to conduct further investigations. Early case studies of CBT for panic disorder (e.g., Clark, Salkovskis, & Chalkley, 1985) provided encouraging results, which led researchers to conduct open (uncontrolled) trials to evaluate the treatment with more patients (e.g., Sokol, Beck, Greenberg, Wright, & Berchick, 1989) and then randomized, controlled trials (which have very strong internal validity; Barlow, Gorman, Shear, & Woods, 2000) in which CBT was compared to control conditions (e.g., waiting list or placebo) and to other treatments (e.g., imipramine). Thus, even though single-case experimental designs often have far-from-perfect internal validity, they can yield valuable information and thereby can advance our understanding of psychopathology and its treatment.
Drawing inferences, whether in quasi-experiments or experiments, is a matter of ruling out rival hypotheses (e.g., hypotheses about the role of threats to internal validity) that could account for the results. Randomizing participants to experimental and control groups can overcome many of the threats to internal validity. Random selection of participants and random allocation to experimental conditions ensure, within the limits of sampling error, that the sample is representative of the target population and that the samples in the experimental groups are comparable to one another in terms of the background features of the participants, such as demographics or other variables (Cook & Campbell, 1979). Randomization doesn’t control for some threats, such as diffusion of treatments, compensatory equalization of treatments, or experimenter expectancy. These threats can be overcome by other means, such as those mentioned earlier. For quasi-experimental designs, however, there is always some degree of threat to internal validity, such as the selection threat. Cook and Campbell (1979) offer the following guidelines about how to assess the degree of threat to internal validity.
Estimating the internal validity of a relationship is a deductive process in which the investigator has to systematically think through how each of the internal validity threats may have influenced the data. Then, the investigator has to examine the data to test which relevant threats can be ruled out. In all of this process, the researcher has to be his or her own best critic, trenchantly examining all of the threats he or she can imagine. When all of the threats can be plausibly eliminated, it is possible to make confident conclusions about whether a relationship is probably causal. (p. 55)
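Random allocation, discussed above as the main safeguard against many internal validity threats, can be sketched in a few lines. This illustrates allocation only, not random selection from the target population; the participant pool, the use of age as the background variable, and the `randomize` helper are hypothetical.

```python
import random
import statistics


def randomize(participants, conditions=("experimental", "control"), seed=None):
    """Shuffle the participant list, then deal participants into conditions in turn,
    so that group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    k = len(conditions)
    return {condition: shuffled[i::k] for i, condition in enumerate(conditions)}


# Hypothetical pool of 200 participants: (id, age); age stands in for any background variable.
pool_rng = random.Random(0)
pool = [(f"P{i:03d}", pool_rng.randint(18, 65)) for i in range(200)]
groups = randomize(pool, seed=42)
for condition, members in groups.items():
    mean_age = statistics.mean(age for _, age in members)
    print(f"{condition}: n={len(members)}, mean age={mean_age:.1f}")
```

The group means on age (or any other background variable) will differ only by chance, which is what "within the limits of sampling error" means; nothing in the procedure addresses diffusion, compensatory equalization, or experimenter expectancy.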
External validity has to do with the generalizability of the research findings: to what extent can the findings of an experiment or quasi-experiment be generalized to and across various populations, settings, and epochs? In the following sections, we examine, in further detail, the major types of threats to external validity, the relationship between internal and external validity, and the situations in which we should (or shouldn’t) be concerned with threats to external validity. Threats to external validity are evaluated by tests of the extent to which one can generalize across various kinds of people, settings, and times and are, in essence, tests of statistical interactions (Cook & Campbell, 1979). The major threats include three types of interactions with the experimental condition that the participants are in. These are interactions with selection, setting, and history.
Interaction of Selection and Experimental Condition
Description. This concerns the question of whether the findings from the selected group of research participants can be generalized to other categories of people, such as people with other geographic or demographic features.
Example. A study comparing patients with severe major depression with healthy controls might seek to match the participants on demographic features. Many severely depressed patients are unable to work and are therefore unemployed, receiving welfare or disability assistance. To match the patients with the controls on demographic factors, the researcher might decide to include only unemployed control participants. While this strengthens the internal validity of the study, it raises the question of whether the results can be generalized to people from other levels of occupational functioning. If the results of the research study vary across occupational levels, then there is an interaction between selection (in this case, occupational status) and experimental condition. This interaction threatens the external validity of the study. The only way to determine whether this threat exists is to determine whether the results vary with occupational status. This means that further studies might be needed to better understand the external validity of the findings.
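Because threats to external validity are, in essence, statistical interactions, the depression example can be expressed as an interaction contrast: the patient-versus-control difference computed separately at each occupational level, and then the difference between those differences. The cell means below are hypothetical values from an imagined follow-up study that samples both levels, and `group_differences` is an illustrative helper.

```python
def group_differences(cell_means, condition_a, condition_b):
    """Difference between two experimental conditions within each subgroup.

    `cell_means` maps (subgroup, condition) to a mean score on the dependent variable.
    """
    subgroups = sorted({subgroup for subgroup, _ in cell_means})
    return {
        s: cell_means[(s, condition_a)] - cell_means[(s, condition_b)] for s in subgroups
    }


# Hypothetical cell means from a follow-up study that samples both occupational levels.
means = {
    ("unemployed", "depressed"): 30.0, ("unemployed", "control"): 18.0,
    ("employed", "depressed"): 26.0, ("employed", "control"): 20.0,
}
diffs = group_differences(means, "depressed", "control")
interaction = diffs["unemployed"] - diffs["employed"]
print(diffs, "interaction contrast:", interaction)
# prints: {'employed': 6.0, 'unemployed': 12.0} interaction contrast: 6.0
```

A nonzero contrast in cell means is only descriptive; in real data it would be tested as an interaction term (for example, in a factorial ANOVA) to judge whether it exceeds sampling error.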
A popular research strategy is to use analogue samples. For example, students may be selected because of their high scores on a measure of schizotypy for a study of variables thought to be relevant to schizophrenia. Analogue studies have the advantage of having strong internal validity (e.g., randomized assignment of schizotypal students to two or more experimental conditions). However, analogue studies may also have important problems with external validity. Can findings obtained from schizotypal students who, for example, report some degree of magical thinking and perceptual aberration be generalized to people with schizophrenia? Studies using clinical samples also may encounter problems with external validity. Some treatment outcome studies, for example, may be highly selective in the patients that are enrolled. A study of the treatment of bulimia nervosa might only include patients if they agree to suspend any other treatment they might be receiving and remain on a stable dose of any psychotropic medication they might be receiving. These research requirements have the advantage of controlling for threats to internal validity, but they do raise questions about external validity; that is, are the patients yielding these clinical findings representative of patients typically seen in clinical practice? If the patients are not representative, then the question arises as to whether the treatment findings can be generalized to clinical practice in the “real world.” These concerns with patient representativeness and the use of analogue samples were raised in the two scenarios that opened this chapter.
Even when participants belong to the target population of interest, recruitment factors might lead to threats to external validity (Cook & Campbell, 1979). A researcher, for example, who is interested in studying conversion disorder might recruit patients by placing advertisements in the local newspaper.
This process of recruitment could result in a sample of people with conversion disorder that is unrepresentative of people with this disorder in general. This threat to external validity can be examined by comparing patients recruited from the newspaper with patients recruited by other means (e.g., from physician referrals) to see whether the groups differ on relevant variables such as the type and severity of the conversion disorder.
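One common way to quantify whether two recruitment sources differ on a variable such as severity is a standardized mean difference (Cohen's d with a pooled standard deviation). The choice of statistic, the `cohens_d` helper, and the severity ratings are all hypothetical, included only to show the computation.

```python
import statistics


def cohens_d(a, b):
    """Standardized mean difference between samples a and b, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


# Hypothetical symptom-severity ratings, for illustration only.
newspaper = [14, 18, 11, 16, 13, 15, 12, 17]
referral = [19, 22, 17, 24, 20, 18, 23, 21]
print(f"d = {cohens_d(newspaper, referral):.2f}")
# prints: d = -2.45
```

A large negative d here would mean the advertisement-recruited patients are markedly less severe than referred patients, signaling that findings from the newspaper sample may not generalize to more severe cases.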
excluding the more severe patients, it is not possible to determine whether the memory results can be generalized to more severe cases of the disorder.
When Does External Validity Matter? One should not automatically assume that it is important for a study to have good external validity. We may not be so concerned with external validity if the focus of the investigation concerns what can happen, instead of what typically does happen (Mook, 1983). Thus, external validity is less of a concern if the goal of one’s research is to test predictions derived from theory or conjecture. Consider, for example, patients who report that they suddenly became aware of long-buried memories of childhood sexual abuse. The veracity of such “recovered” memories is highly controversial. Some clinicians argue that these are genuine memories that had been repressed and then retrieved. A number of researchers have argued that these are false memories, sometimes implanted by therapists using hypnosis, guided imagery, or other “memory recovery” techniques to get to the bottom of the patient’s problems (for a review of this debate, see McNally, 2003). This debate raises the following question about the mechanisms of memory, which has been evaluated in several laboratory studies: Is it possible to implant a clearly false childhood memory using memory recovery techniques? Note that this is not a question of whether it does happen but of whether it can happen. The answer is yes. Analogue research using university students has shown that it is possible to lead the participants to “recall” something that, according to their parents, never happened to them, such as being savagely mauled by a dog (e.g., Porter, Yuille, & Lehman, 1999). Although such findings have relevance to the memory recovery controversy, the primary value of this type of research is to shed light on memory processes.
To determine whether external validity is important in a given research investigation, you need to consider the conclusion that you would like to make and whether your sample and research design will enable you to reach this conclusion. The following is a sample of questions that you might ask in deciding whether the usual criteria of external validity should even be considered (Mook, 1983):
draw conclusions not about a population but about a theory that specifies what these partici- pants ought to do? Or (as in the case of false memories) would it be important if any subject does, or can be induced to do, this or that?
Evaluating and Improving External Validity. There are several ways of evaluating and improving external validity. One approach is to try to ensure that the sample is representative of the target population. The deliberate inclusion of a heterogeneous sample can be used to determine whether particular variables predict the results. If you are conducting a treatment outcome study, for example, and want to know whether the results vary with socioeconomic status (SES), then you could select patients from a range of different SES levels (using stratified random sampling) and determine whether SES predicts treatment outcome. Note that this approach often requires a large sample (e.g., n = 50 per treatment condition), so that sufficient numbers of participants from each SES level are in each treatment condition. Another approach is to conduct multiple studies across different subgroups, settings, or times. This provides a means of determining whether the findings are replicable. Benchmarking studies can also be used as a means of evaluating external validity. These are investigations in which the findings of research conducted in tightly controlled laboratory situations (which have high internal validity and may have low external validity) are compared with those of field studies (which may have good external validity but lower internal validity). A recent meta-analysis compared results from lab and field studies across a range of research domains, including clinically relevant investigations such as studies of aggression or depression (Anderson, Lindsay, & Bushman, 1999). The investigators examined the correspondence between lab- and field-based effect sizes from studies using conceptually similar dependent and independent variables. The results of field research tended to mirror the findings from lab research, suggesting that lab studies generally have good external validity.
To illustrate benchmarking research, several studies have been conducted in which treatment-outcome findings from a university-based specialty clinic (e.g., the Center for Anxiety and Related Disorders at Boston University) are compared to findings from community mental health clinics. The university-based research tended to have high internal validity, although the use of patient inclusion and exclusion criteria raised questions about the external validity of the findings. Patients are typically excluded from CBT studies if their doses of psychotropic medication are unstable or if they have particular comorbid disorders. Studies of panic disorder, for example, often exclude patients who have comorbid paranoid, schizotypal, or borderline personality disorders. Studies conducted in community clinic settings are more liberal in their inclusion criteria and more closely approximate the routine clinical treatment that patients would receive. This means that these studies have good external validity but weaker internal validity. Benchmarking studies of major depression and panic disorder indicate that the results from community clinics are similar to those obtained in university clinics and that the patients from both settings are broadly similar in their pretreatment clinical characteristics, such as the severity and duration of their disorders (e.g., Merrill, Tolbert, & Wade, 2003; Wade, Treat, & Stuart, 1998). These findings indicate that tightly controlled treatment studies from university clinics have good external validity. Such studies address the concerns of critics like Dr. Smith from Scenario 1, who claimed that treatment research findings do not generalize to patients in the “real world.”
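The stratified random sampling strategy mentioned for studying whether outcome varies with SES can be sketched as follows. This assumes equal allocation (the same number drawn from every SES level) so that a minority stratum still yields enough participants to analyze; the referral pool, its SES proportions, and `stratified_sample` are hypothetical. Random assignment to treatment conditions within each stratum would be a separate, subsequent step.

```python
import random
from collections import defaultdict


def stratified_sample(pool, stratum_of, n_per_stratum, seed=None):
    """Randomly draw `n_per_stratum` people from each stratum of `pool`.

    Equal allocation is used so that small strata (here, low SES) still
    contribute enough participants to be analyzed on their own.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in pool:
        strata[stratum_of(person)].append(person)
    sample = []
    for stratum, members in sorted(strata.items()):
        if len(members) < n_per_stratum:
            raise ValueError(f"stratum {stratum!r} has only {len(members)} people")
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


# Hypothetical referral pool in which low-SES patients are a minority.
pool_rng = random.Random(3)
pool = [
    {"id": i, "ses": pool_rng.choices(["low", "middle", "high"], weights=[1, 3, 2])[0]}
    for i in range(600)
]
sample = stratified_sample(pool, lambda p: p["ses"], n_per_stratum=30, seed=11)
print({level: sum(p["ses"] == level for p in sample) for level in ("low", "middle", "high")})
# prints: {'low': 30, 'middle': 30, 'high': 30}
```

Drawing by simple random sampling from this pool would leave the low-SES group small and variable in size; fixing the count per stratum is what makes an SES-by-treatment comparison feasible, at the cost of the sample no longer mirroring the pool's SES proportions.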
Few, if any, research studies are methodologically perfect. Some consumers of the research literature tend to throw out the baby with the bathwater; that is, if a study has a minor limitation, they dismiss it entirely. This was the case for the attendees of the journal club discussed in Scenario 2. But is it really true that “imperfect” studies are worthless? If this were the case, then scientific progress would not be possible, either in psychology or in the other sciences. But what can we legitimately conclude from imperfect investigations?
As in all areas of science, no single study in clinical psychology provides the final answer to an important research question. Science is a cumulative process, whereby different studies investigate the research questions in different ways, controlling for different factors. In other words, science progresses through the development of cumulative findings from programs of research (Lakatos & Musgrave, 1970). The overall pattern of findings that emerges across studies is the most important factor in answering important research questions.
The strength of a study's internal and external validity can help researchers evaluate the relative importance of that study in an overall program of research. If a study has very weak internal validity, then it may be given little or no weight in evaluating what the corpus of research suggests about an important research question. A study might have several strengths but also some noteworthy weaknesses. An analogue study of schizophrenia, for example, has the shortcoming of not using actual participants with the disorder. This is not a legitimate reason for dismissing the study altogether. The limitation simply raises another question to be answered in another study: If people who have features similar to schizophrenia (analogues) produce particular patterns of findings, do people with full-blown schizophrenia show the same pattern of results? The analogue study may have high internal validity and lower external validity, whereas the field study (using actual patients with schizophrenia) would probably have lower internal validity (because it is difficult to control for all confounding factors when using clinical samples) but higher external validity. Together, the two types of studies complement one another.
Internal and external validity are important issues in evaluating the merits of a study, but they are not the only considerations. Other important issues include the way the data are analyzed, the reliability and validity of the measures or manipulations used, and the statistical power of the design. Those issues are discussed elsewhere in this volume.