Validity & Reliability in Psychological Research: Understanding Validity Types, Study notes of Psychology

An overview of the concepts of validity and reliability in psychological research. It discusses various types of validity, including construct, internal, and external validity, and their importance in ensuring the accuracy and generalizability of research findings. The document also covers threats to validity, such as instrumentation, history, selection, experimenter bias, and confounding, and strategies for controlling these threats. Additionally, it touches upon the concept of reliability and its relationship to validity.

Typology: Study notes

Pre 2010

Uploaded on 07/29/2009

koofers-user-6je

VALIDITY OF MEASUREMENTS
For a measurement to be of any use in science, it must have both reliability and validity.
1. Reliability of Measurement - Property of consistency of a measurement that gives same
result on different occasions.
2. Validity of Measurement - Property of a measurement that tests what it is supposed to test.
Reliability and Validity are two different concepts.
EG: One-item intelligence test: measure hat size. Obviously lousy intelligence test because
no correlation between IQ and head size. However, measurement is reliable because
result is same every time and does not vary (size of your head). However, result is not
valid because it has nothing to do with intelligence.
EG: Different one-item test of intelligence: What did Dow-Jones stock average close at
yesterday? Test valid because it is something that is general knowledge (sort of, but
biased to people with money to invest). On average, people who know the Dow-Jones
average are more intelligent, but it is not a reliable measurement (inconsistent results
from same or another person over time). So both tests are bad, but for different
reasons: Hat-size test yields reliable but invalid data, and Dow-Jones average
test yields valid but unreliable data.
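The contrast between the two one-item tests can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population parameters, the noise levels, and the weak link between IQ and the Dow-Jones item are made up for illustration and are not taken from these notes. Reliability is estimated as the correlation between two occasions, and validity as the correlation with IQ.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_people = 2000

# Hypothetical "true" intelligence scores.
iq = [rng.gauss(100, 15) for _ in range(n_people)]

# Hat size: unrelated to IQ, almost identical on every occasion.
hat_true = [rng.gauss(57, 2) for _ in range(n_people)]
hat_day1 = [h + rng.gauss(0, 0.1) for h in hat_true]
hat_day2 = [h + rng.gauss(0, 0.1) for h in hat_true]

# Dow-Jones item: weakly tied to IQ (assumed), but mostly noise each day.
dow_day1 = [0.02 * q + rng.gauss(0, 1) for q in iq]
dow_day2 = [0.02 * q + rng.gauss(0, 1) for q in iq]

hat_reliability = pearson(hat_day1, hat_day2)  # same measure, two occasions
hat_validity = pearson(hat_day1, iq)           # measure vs. the construct
dow_reliability = pearson(dow_day1, dow_day2)
dow_validity = pearson(dow_day1, iq)

print("hat size : reliability", round(hat_reliability, 2),
      "validity", round(hat_validity, 2))
print("Dow item : reliability", round(dow_reliability, 2),
      "validity", round(dow_validity, 2))
```

Under these assumptions the hat-size measure repeats almost perfectly but is unrelated to IQ, while the Dow-Jones item relates to IQ but barely agrees with itself across occasions.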
TYPES OF VALIDITY OF MEASUREMENTS
1. Construct Validity
2. Face Validity
3. Content Validity
4. Criterion Validity
1. Construct Validity of a Test - Property of a test that measurements actually measure
constructs they are designed to measure, but no others.
EG: 1st, test must actually measure whatever theoretical construct it’s testing, and not
something else. Test of leadership ability should not actually test extraversion.
EG: 2nd, a test must measure what it intends to measure and not theoretically unrelated
constructs. Test of musical aptitude should not require too much reading ability.
EG: 3rd, a test should be useful in predicting results related to the theoretical concept. Test
of musical ability should predict who benefits from music lessons, differentiate groups
who have and have not chosen music as a career, should relate to other tests of
musical ability, etc.
2. Face Validity of a Test - A test should appear superficially to test what it is supposed to test.
EG: A test may have a high or low degree of validity regardless of its face validity. Are
Rorschach tests just a bunch of inkblots or do they have something to do with
measuring personality? Note: Face validity can have more to do with public relations
than with true validity.
3. Content Validity of a Test – concerned with assessing current performance rather than
predicting future performance. Test should sample the range of behavior represented by the
theoretical concept being tested.
EG: Intelligence test should measure general knowledge, verbal ability, spatial ability, and
quantitative skills among others. An intelligence test that measured only spatial ability
would not have sufficient content validity.

4. Criterion Validity of a Test – test should relate closely to other measures of the same theoretical construct.
EG: A valid test of intelligence should correlate highly with other intelligence tests. It should also correlate with behaviors that are considered to require intelligence, such as doing well in school. If the criterion of an intelligence test is how well a child is doing in school at the time the test is given, it is called concurrent validity. If the criterion is how well the test can predict some future performance of the child, such as graduation from college, it is called predictive validity.

VARIABILITY AND MEASUREMENT ERRORS

  • Task of research is to find relationships between the independent variable (IV) and the dependent variable (DV), i.e., find how the DV changes with changes in the IV.
  • Variability (variance) in the DV is good when it is associated with changes in the IV. When it is not, you have Error Variance (also known as Random Error): variability in the DV that is not associated with the IV (see types of measurement errors below). EG: How you stand each time on a floor scale to measure your weight.

TYPES OF MEASUREMENT ERRORS - two types

1. Systematic Error (also known as Constant Error) – measurement error that is associated with consistent bias.
EG: Changes in body weight are not considered error when they are associated with IVs: loss of water during the night, thirst induced by salt, overeating, etc. Weighing subjects in the morning as opposed to at night, or clothed as opposed to unclothed, introduces systematic error. However, this is not a bad thing as long as you keep it consistent for all groups.

2. Random Error - variability in the DV that is not associated with the IV.
EG: Precisely how a subject is weighed on a floor scale. Random error in measurement is introduced by exactly where the subject places their feet or leans on the scale. This error is a threat to reliability of measurement because it reduces the precision with which effects of the IV can be assessed.
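The difference between the two error types can be sketched numerically. The weights, the 1.5 kg clothing offset, and the 0.5 kg noise level below are hypothetical values chosen only for illustration. A constant offset applied to every group leaves the difference between groups untouched, whereas random error scatters repeated readings of the same person.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical true body weights (kg) for two groups.
group_a = [rng.gauss(70, 5) for _ in range(200)]
group_b = [rng.gauss(72, 5) for _ in range(200)]
true_difference = statistics.mean(group_b) - statistics.mean(group_a)

# Systematic (constant) error: everyone is weighed clothed.
CLOTHING_KG = 1.5  # assumed constant bias
clothed_a = [w + CLOTHING_KG for w in group_a]
clothed_b = [w + CLOTHING_KG for w in group_b]
clothed_difference = statistics.mean(clothed_b) - statistics.mean(clothed_a)

# Random error: one person weighed many times, foot placement varies.
one_person = 70.0
readings = [one_person + rng.gauss(0, 0.5) for _ in range(1000)]
reading_bias = statistics.mean(readings) - one_person  # averages out near 0
reading_spread = statistics.pstdev(readings)           # does not go away

print("group difference, unclothed:", round(true_difference, 3))
print("group difference, clothed  :", round(clothed_difference, 3))
print("random error: bias", round(reading_bias, 3),
      "spread", round(reading_spread, 3))
```

The constant bias cancels in the group comparison, which is why it is harmless when kept consistent; the random component stays as spread in the readings and lowers precision.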

RELIABILITY OF MEASURES - two types

1. Test-Retest – same result can be obtained over time.
EG: Time-dependent changes in the accuracy of a floor scale; retaking the SAT or GRE.

2. Internal Consistency - whether the various items on a test are measures of the same thing.
EG: Tests of internal consistency of DVs (Split-Half Reliability), where the items on a test are divided into two separate tests. Scores on the two halves are correlated to see how closely individuals' scores agree on both halves (good test = high split-half correlation). The Kuder-Richardson-20 (KR-20) test, used for tests with right/wrong items such as multiple-choice tests, is interpreted as an average over all possible split-half correlations.
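A minimal sketch of both internal-consistency measures follows. The response data are simulated from an assumed model (each person has one ability, each item one difficulty), and the KR-20 formula used is the standard textbook one, KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores); neither the model nor the formula appears in these notes. The split-half value is the raw correlation between odd and even items, with no correction for test length.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(2)  # fixed seed so the sketch is repeatable
n_people, n_items = 500, 20

# Assumed model: probability of a correct answer rises with ability.
ability = [rng.gauss(0, 1) for _ in range(n_people)]
difficulty = [rng.uniform(-1, 1) for _ in range(n_items)]
responses = [
    [1 if rng.random() < 1 / (1 + math.exp(-(a - d))) else 0
     for d in difficulty]
    for a in ability
]  # responses[person][item] is 1 (right) or 0 (wrong)

# Split-half: score odd and even items separately, then correlate.
odd_half = [sum(row[0::2]) for row in responses]
even_half = [sum(row[1::2]) for row in responses]
split_half_r = pearson(odd_half, even_half)

# KR-20 from item pass rates and the variance of total scores.
totals = [sum(row) for row in responses]
mean_total = sum(totals) / n_people
var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
pass_rate = [sum(row[j] for row in responses) / n_people
             for j in range(n_items)]
sum_pq = sum(p * (1 - p) for p in pass_rate)
kr20 = (n_items / (n_items - 1)) * (1 - sum_pq / var_total)

print("split-half r:", round(split_half_r, 2))
print("KR-20       :", round(kr20, 2))
```

Because all simulated items draw on the same ability, both indices come out well above zero; items measuring unrelated things would pull both toward zero.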

VALIDITY OF RESEARCH - Problems that threaten validity

  • Validity of Research - indication of accuracy in terms of the extent to which a research conclusion corresponds with reality.
EG: Simply means that conclusions are true or correct about the actual state of the world. Validity problems threaten the conclusion that cause-effect relationships exist among some variables, or the explanation of the relationship obtained.

Validity (of Research) – four types (Cook et al., 1990):

1. Internal Validity (and Confounding)

2. Construct Validity
  • Construct validity is similar to internal validity: in internal validity, you must rule out alternative variables as potential causes of behavior; in construct validity, you must rule out other possible theoretical explanations of the results.
  • For internal validity, you redesign the study to control for the source of confounding; for construct validity, you design a new study that permits a choice between two competing theoretical explanations of the results.

3. External Validity - how well findings generalize to other situations or populations (i.e., different subjects, settings, times, etc.)
EG: Which variables are trivial? 13 seniors at the University of Memphis on a rainy 20th of June 2004 in a room with gray walls and a male experimenter wearing a dress.
EG: McGinnies (1949) asked people to read words flashed on a screen. Results were interpreted as showing that a person's threshold for seeing taboo words was higher than the threshold for seeing ordinary words. Use of language in public has changed since then. It is unlikely that this experiment would yield the same results today, so the results cannot be generalized to today's world; it lacks external validity.

4. Statistical Validity - extent to which data are shown to be the result of cause-effect relationships rather than accident. Similar to internal validity.
EG: The question is whether an observed relationship between an IV and a DV was a true cause-effect relationship, or whether the result was accidental and thus caused by pure chance. Statistical tests establish only that an outcome has a low probability of happening by chance; they do not guarantee that it was the result of a true cause-effect relationship. There is no way to guarantee any of the types of validity of a research result; all methods of judging validity simply increase confidence in the conclusion that has been drawn. Nevertheless, inferential statistics is an essential tool in judging the validity of a research outcome.

THREATS TO INTERNAL VALIDITY

  • Internal Validity is extent to which a study provides evidence of a cause-effect relationship between IVs and DVs

Eight Major Threats to Internal Validity:

1. History (Subject Traits and Outside Influences)
2. Maturation
3. Testing
4. Statistical Regression
5. Selection
6. Mortality
7. Experimenter Bias
8. Instrumentation

1. History (Subject Traits and Outside Influences) – external influence on the study by learned and inherent characteristics of the subjects, or when events outside of the laboratory affect subjects in a manner that influences the results.

History can be divided into two parts:
A. Proactive History - refers to learned and inherent differences subjects bring with them to the study (height, weight, sex, etc.).
EG: Random assignment of subjects to conditions usually controls for this.
B. Retroactive History - refers to events that occur between the 1st and 2nd measurement.
EG: Subjects are given an “attitudes toward police scale” during the 1st week of a study. Between the 1st and 2nd rating sessions, subjects hear a news story about two students being killed by police. Identifying and removing subjects who have been contaminated by this event is a common way to control for this.

2. Maturation - source of error related to the amount of time between measurements. A more critical problem with children because they change more rapidly over time than adults.
EG: A study examining motor learning in children would need to take into account significant lapses of time in which changes could occur in motor coordination, knowledge, and the like that could influence results.
EG: Maturation can also be a critical feature of a study, such as attitudes toward alternative lifestyles as a function of age.

3. Testing - effects that may occur on scores when a test is repeated. Being tested influences performance in later experiments or administrations of the test. Subjects become sophisticated about the testing procedure or may learn how to take tests, so that their later behavior is changed by earlier experience (Practice Effect).
EG: General observation: students generally do better on second and later tests in a course after experience with the style of testing. The phenomenon is similar to maturation in that subjects are changed over time, but differs in that the change is caused by the testing procedure itself rather than by processes unrelated to the test.

4. Statistical Regression - operates when groups are selected on the basis of extreme scores. Tendency of subjects with extreme scores on a first measure to score closer to the mean on a second testing.
EG: The regression effect can occur when 2 different variables are correlated, such as SAT score and college GPA. It may also occur when the same variable is measured twice, such as when a student repeats the SAT. It arises when there is error associated with unreliability of the measuring device (i.e., the test itself is not a perfect measure of the construct being measured). Another example is blood pressure. A doctor may take many blood pressure readings from a patient, but knows the first reading is generally high. Most readings after this decrease to the individual's normal blood pressure. This is regression toward the mean and is typical of most subject responses. Controlled by retesting subjects or taking more than one measurement.
EG: Classic example: a teacher notices that students who scored highest on the first test usually do less well on the second, whereas those who did the worst improve. The teacher often concludes that the ones who did well the first time rested on their laurels for the second test, whereas the ones who did poorly worked harder. In reality, this is not what happened. Whenever random error exists in the measurement of a variable, individuals will deviate from their true score by chance. Solution: test them repeatedly until they bleed. On retest, errors will tend to average out, and the scores of these previously extreme individuals tend to return toward their true value, closer to the mean.

5. Selection - Except for random selection, any procedure used to choose subjects may result in a sample carrying traits that are not representative of the population as a whole.
EG: Many studies compare 2 or more groups. Neo-Nazi skinheads would not be a good group to compare with the Society to Protect Baby Seals (or would they?). But would skinheads from Detroit be a good choice to compare with skinheads from Dresden? Control by matching subjects or by making the characteristic an IV.
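The regression effect described under Statistical Regression above can be reproduced with a short simulation of the teacher example. The score scale, the size of the measurement error, and the choice of the top and bottom 10% are assumptions made for illustration. Nobody's true score changes between the two tests; only the random error is redrawn.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(3)  # fixed seed so the sketch is repeatable
n_students = 5000

# Each student has a fixed true score; each test adds fresh random error.
true_score = [rng.gauss(500, 80) for _ in range(n_students)]
test1 = [t + rng.gauss(0, 60) for t in true_score]
test2 = [t + rng.gauss(0, 60) for t in true_score]

# Select the extreme groups on the basis of the FIRST test only.
order = sorted(range(n_students), key=lambda i: test1[i])
bottom = order[: n_students // 10]
top = order[-(n_students // 10):]

grand_mean = mean(test1)
top_test1 = mean([test1[i] for i in top])
top_test2 = mean([test2[i] for i in top])
top_true = mean([true_score[i] for i in top])
bottom_test1 = mean([test1[i] for i in bottom])
bottom_test2 = mean([test2[i] for i in bottom])

print("overall mean      :", round(grand_mean))
print("top 10%   test 1 ->", round(top_test1), " test 2 ->", round(top_test2))
print("bottom 10% test 1 ->", round(bottom_test1), " test 2 ->", round(bottom_test2))
print("top 10% true score:", round(top_true))
```

In this sketch the top group drops and the bottom group rises on the second test even though no one rested on their laurels or worked harder. The retest mean sits close to the group's true-score mean, which is the sense in which errors average out on retesting.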

conducted their own experiment instead of the one you thought they were conducting. Whenever people are aware that they are participating in an experiment, their behavior may be different from their everyday behavior. Common example is reaction of people to having a movie camera pointed at them.

  • Ambiguous effect of IVs results from the fact that any psychological experiment for which a person has volunteered must be considered to be a social situation in which participant has preconceived ideas about what is expected.

These preconceived ideas lead to several subject tendencies:

A. Good-Subject Tendency: tendency of experimental participants to act according to what they think the experimenter wants. Subjects may pretend to be fooled by instructions in order to be "good subjects".
EG: Subjects may deliberately feign a naive attitude about expected results even though they can guess the true purpose of the study (heard about it or learned of similar studies elsewhere).

B. Evaluation Apprehension: also known as social desirability – concern on the subject’s part about the impression their behavior will make on the experimenter (they try to appear as socially desirable as possible).
EG: Some subjects are convinced that the experiment is a carefully disguised measure of intelligence or emotional adjustment. This expectancy gives rise to evaluation apprehension, in which participants tailor their behavior to make themselves look as normal as possible. One control is to develop attitude scales on which the various responses appear equally socially desirable, so that subjects will not damage results by concealing their true attitudes.
EG: Effects of pornography on sexual behavior. Participants are asked to keep a diary of all sexual activity for a week before and after they are shown a pornographic movie. People would hesitate to volunteer information about deviant activities. Even if they were honest, they might modify their behavior in the direction of social desirability.

THREATS TO EXTERNAL VALIDITY

  • External Validity - how well findings generalize to other situations or populations
  • Even if an experiment has internal, statistical, and construct validity, it may not be generalizable to other situations.

Three Important Threats to External Validity:

1. Other Participants - Must not assume that any animal can be substituted for any other in all situations.
EG: College students and white rats are readily accessible but are also presumed to be representative. Although we are interested primarily in human behavior, the degree to which common principles of behavior operate across species is impressive. Skinner (1956) showed that the behavior of pigeons, rats, and monkeys under certain experimental conditions is identical in all important respects. Regardless, human subjects should be chosen with attention to their representativeness relative to some larger population.

2. Other Times - Many historical trends render particular research findings invalid, whether they concern use of language, attitudes toward foreign countries, or perception of deviant groups.
EG: Perceptions of and attitudes toward sex have changed over time (less or more stringent).

3. Other Settings - How the phenomenon observed in one laboratory can be related to a similar phenomenon observed in another laboratory or in the real world.
EG: Although laboratory research ensures a higher level of control, it is sometimes not easy to decide whether a certain effect is simply a laboratory effect or whether it would survive transplantation to the outside world.

THREATS TO STATISTICAL VALIDITY

  • Statistical Validity - Extent to which data are shown to be the result of cause-effect relationships rather than accident.

1. Improper Use - Arises from an inappropriate choice of statistical method or improper use of statistics in analyzing the data.
EG: A major threat is concluding that the IV had no effect when that conclusion is erroneous because the study employed too few subjects or made too few observations. SEE Type I and Type II errors.
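The too-few-subjects problem can be illustrated by simulating many repetitions of the same experiment. Everything specific here is an assumption for illustration: a real effect of half a standard deviation, a two-group design, a significance level of .05, and a two-sample z-test with known standard deviation (chosen only because it needs nothing beyond the standard library). The rejection rate when the effect is real is the chance of avoiding a Type II error; the rejection rate when there is no effect is the Type I error rate.

```python
import random
from statistics import NormalDist

def experiment_rejects(n, effect, rng, alpha=0.05, sd=1.0):
    """Run one simulated two-group experiment; True if p < alpha."""
    control = [rng.gauss(0.0, sd) for _ in range(n)]
    treated = [rng.gauss(effect, sd) for _ in range(n)]
    diff = sum(treated) / n - sum(control) / n
    std_error = sd * (2 / n) ** 0.5          # known-sd z-test (assumed)
    z = diff / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(n, effect, trials=2000, seed=4):
    """Fraction of simulated experiments that declare an effect."""
    rng = random.Random(seed)
    hits = sum(experiment_rejects(n, effect, rng) for _ in range(trials))
    return hits / trials

power_small = rejection_rate(n=10, effect=0.5)    # real effect, few subjects
power_large = rejection_rate(n=100, effect=0.5)   # real effect, many subjects
false_alarm = rejection_rate(n=100, effect=0.0)   # no effect at all

print("detects real effect, n=10 per group :", power_small)
print("detects real effect, n=100 per group:", power_large)
print("false alarms with no effect         :", false_alarm)
```

With only 10 subjects per group, most of the simulated experiments miss an effect that is actually present, so a conclusion of "no effect" from such a study would usually be a Type II error.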

VALIDITY in a NUTSHELL (17 points)

  1. A conclusion based on research is valid when it corresponds to the actual state of the world.
  2. Four types of research validity are commonly recognized: internal validity, construct validity, external validity, and statistical validity.
  3. An investigation has internal validity if a cause-effect relationship actually exists between the independent and dependent variables.
  4. Confounding occurs when the effects of two independent variables in an experiment cannot be separately evaluated.
  5. Construct validity concerns question of whether results support the theory behind the research.
  6. Every experiment tests auxiliary hypotheses in addition to the main hypothesis. These auxiliary hypotheses are that particular conditions of the experiment are valid measures of the theoretical concepts the experiment is testing.
  7. External validity concerns whether the results of the research can be generalized to another situation: different subjects, settings, times, and so forth.
  8. Statistical validity concerns whether the observed relationship is a true cause-effect relationship or is accidental.
  9. Threats to the internal validity of an experiment include events outside the laboratory, maturation, effects of testing, regression effect, selection, and mortality.
  10. The regression effect occurs when subjects are tested on related measures and there is error in the measurement. Individuals who performed at the extremes on one measure will tend to score closer to the mean on the other.
  11. Threats to construct validity include a loose connection between theory and experiment and the ambiguous effect of independent variables.
  12. Among the problems that cause an ambiguous effect of the independent variables are tendencies for participants to interpret conditions differently from the experimenter, the good-subject tendency, and evaluation apprehension.
  13. Threats to external validity include problems from generalizing to other subjects, times, or settings.
  14. Certain threats to validity are more prominent in particular types of research than in others.
  15. Psychology experiments may be considered social situations with their own role demands that may interfere with the purpose of the study.
  16. Ways of preventing role demands from biasing experimental results include inventing a cover story that deceives the participant about the purpose of the experiment, dividing the experiment in such a way that part of the data are collected in another setting, using measures that are unlikely to be influenced by the participant's expectations, and keeping the participant unaware that an experiment is being conducted.
  17. Experimenter bias can be reduced by keeping the experimenter from knowing the conditions in the experiment or its purpose and by standardizing the procedure as much as possible.