Behavioral Observation & Coding: Method for Testing Behavior Theories, Lecture notes of Social Psychology

An overview of behavioral observation as a research method, focusing on its use in identifying behaviors worth theorizing about and testing theories of behavior. It covers various aspects of behavioral observation, including contexts for observation, recording methods, interrater agreement, reliability, validity, and data analysis. Behavioral observation is used in both naturalistic and laboratory settings to examine behaviors and their relations, and can be complemented with self-report and other data collection methods.

Typology: Lecture notes

2021/2022

Uploaded on 09/27/2022

BEHAVIORAL OBSERVATION AND CODING
Behavioral Observation and Coding
Richard E. Heyman
Michael F. Lorber
New York University
J. Mark Eddy
University of Washington
Tessa V. West
New York University
Chapter to appear in Harry T. Reis and Charles M. Judd (Eds.), Handbook of Research Methods
in Social and Personality Psychology (2nd ed.). New York: Cambridge University Press.
Author Notes
Preparation of this chapter was supported by National Institute of Dental and Craniofacial
Research grant R21DE01953701A1 and National Institute of Child Health and Human
Development grant R01 HD054880. We are indebted to Ashley Dills for her masterful
copyediting.
Contact information
Richard E. Heyman, Ph.D.
Professor
Family Translational Research Group
Department of Cariology and Comprehensive Care
New York University
345 East 24th Street, 2S-VA
New York, NY 10010
212-998-9984


Behavioral Observation and Coding

Kurt Lewin (1951, p. 169) wrote that there is “nothing so practical as a good theory.” One could add that there is nothing so practical as a good theory-testing tool. We devote this chapter to one such tool, behavioral observation, that excels at both the identification of behaviors worth theorizing about and the testing of theories of behavior. This chapter provides an overview of behavioral observation, including the contexts researchers use when observing, the forms in which they record behaviors for analysis (e.g., coding), the methods available to document that different observers coded behaviors similarly (i.e., interrater agreement, an element of reliability), the necessity of establishing other forms of reliability as well as validity, and methods of analyzing behavioral observation data.

What is Behavioral Observation?

The observation of behavior is at the center of all scientific inquiry in social and personality psychology. Although there are a wide variety of methods that researchers use when observing, the term “behavioral observation” generally refers to a researcher seeing and/or hearing, and then systematically recording, the behaviors of an individual or group of individuals within a particular social context of interest, such as the classroom, the playground, the peer group, the home, the clinic, or the workplace. Typically, individuals are observed for relatively brief periods of time, but often for multiple bouts. Sometimes observations are conducted “live.” More often, an audio or video recording is made (and sometimes transcribed into written form as well); observations are then conducted using one or more of these at the convenience of the researcher. During an observation, a researcher periodically summarizes the physical and/or verbal behaviors of the participants of interest into specific categories using a clearly defined system of “codes” that are assigned based


multiple informants into assessment batteries is relatively easy: various forms of self-report questionnaires on behaviors or behavioral patterns are readily available, or can be created relatively easily, for different reporters (e.g., parents, teachers, youth).

Although behavioral observation is quite an appealing method for some researchers, it does have its downsides. Even if an existing coding system is identified for a new study, purchasing the necessary equipment, securing private coding space, and assembling and training a team of observers (i.e., a “coding team”) can be time-consuming and expensive. Once a team is ready, collecting data in vivo, or collecting and storing video or audio records and transcribing those records, and then managing and analyzing the resulting data, can also be quite costly. Furthermore, although the focus of a typical coding team is usually on obtaining and maintaining interrater agreement (i.e., independent observers applying the same codes to a given stream of behavior), this is no guarantee that behavioral observation will generate reliable (i.e., stable) or valid (i.e., “true” measures) scores of constructs of interest in a given sample. Indicators derived from behavioral observation often are weakly correlated with self-report measures of the same constructs, and the meaning of this may be unclear. Finally, the existence of audio or video records creates ongoing human subjects issues related to the protection of confidentiality and anonymity. In short, despite their appeal, “observational data, compared with other forms of data, are unwieldy and messy” (Margolin et al., 1998, p. 29). Nevertheless, behavioral observation has been employed frequently over the past 50 years, particularly among psychologists interested in interpersonal and intergroup relations, human development, and close relationships.

Observational Settings

Observational settings exist along a continuum of researcher influence ranging from


unfettered natural environments to tightly controlled experimental situations. Purely naturalistic situations have the advantage of being high in ecological validity (see Cialdini & Levy, ch. x). Although researchers observing behavior in its natural environment still need to establish the reliability of their observations (e.g., consistency across observers, episodes, or settings), the real-world generalizability of such observations is self-evident. The more the researcher intervenes in the setting to be observed, the more has to be done to demonstrate that the setting produces externally valid results. In the sections below, we provide an overview of different degrees of researcher interventions into settings.

As with any research tool, the validity of behavioral observation is situation-dependent and can only be inferred from that tested, narrow use; it is not “proven” for all time (e.g., Haynes & O’Brien, 2000). Thus, behavioral observation cannot be said to be a valid assessment approach any more than questionnaires can be said to be a valid assessment approach.

Naturalistic Observation

Naturalistic observation has a long history in the study of animal (e.g., Lorenz, 1970, 1971) and human (e.g., Mead, 1928) behavior. Some researchers who favor this type of observation use a qualitative approach, where the coding system is not predetermined. Others use a quantitative approach, marked by the use of preset codes and precisely defined rules for their assignment.

One of the most important studies in social psychology used naturalistic observation: Festinger, Riecken, and Schachter’s (1956) When Prophecy Fails, which focused on social interactions within a doomsday cult and proposed cognitive dissonance theory. Observers ultimately were not outsiders, but rather became members of the social group being observed. There were no predetermined codes to classify behaviors. The observers who infiltrated the cult


findings related to health, Mehl (2007, p. 370) drew the same conclusion as Festinger on the banality of observing life naturalistically: “One of my first ‘aha!’ experiences when we started doing EAR research was how ordinary and mundane real life really is. The sound files we obtained from participants first and foremost documented that for most people most of real life is not thrilling, glittery, and extraordinary.”

Another recent use of naturalistic observation involved families of dual-earning parents in California (Campos, Graesch, Repetti, Bradbury, & Ochs, 2009). Because the investigators had an overwhelming 35 hours of video from two weekdays per family, Campos et al. (2009) focused only on the two minutes captured when the partners reunited after their workdays and coded these simply (i.e., positive, negative, ignoring/distracted, reporting information, checking in about logistics). The authors also presented data from the “scan sampling” of family interactions, in which, every 10 minutes, observers noted the location of each family member. They found that working couples spend almost no time together without children. In later analyses, they found that men’s, but not women’s, “neuroticism” (i.e., temperamental negativity) moderated the relationship between job stress and at-home behavior (Wang, Repetti, & Campos, 2011). For instance, men high in job stress but low in neuroticism were more socially withdrawn during their first hour home, but their interactions with their children were more intense.

Quasi-Naturalistic Observation

As implied by the Festinger and Mehl quotes, naturalistic observation often requires so much time that it is inefficient and impractical. Thus, observation typically occurs in situations that are not completely natural and uninfluenced by the investigator. When investigators use quasi-naturalistic observations, the generalizability of behavior is of the highest concern and investigators attempt to influence the situation as little as possible.


The work of the Oregon Social Learning Center (OSLC) research team (e.g., Reid et al.) is a model of the development and refinement of a quasi-naturalistic observational paradigm. Starting in the late 1960s, OSLC researchers wanted to conduct naturalistic behavior observations of families but quickly learned that the natural world was not conducive to cost-effective data collection (Patterson, 1982). Family members typically disappeared or sat transfixed in front of a video screen when observers arrived (and this was usually a solitary television screen, long before the advent of other screen-related distractions in the home, such as smartphones, iPads, computers, video games, etc.). Out of necessity, eight rules (see Table 1) were imposed on families during their in-home observation sessions. Patterson (1982) noted that the rules transformed the otherwise typical environment into something close to, but not identical to, the real world (i.e., those being observed were unnaturally constrained but otherwise acting naturally in their natural environment). The rules increase the quality of the data collected by increasing interaction but reduce generalizability slightly, which is exactly the kind of trade-off that all researchers must weigh in designing protocols.

OSLC developed its quasi-naturalistic paradigm through trial and error, guided by both the empirical literature and their theoretical model. The researchers were most interested in children’s aversive and aggressive behaviors and their parents’ responses to these behaviors. To increase the chance of observing such interactions, dinnertime was chosen as the setting to observe because earlier studies had found that mothers reported the most conflict with their children during the time surrounding meals (e.g., Goodenough, 1931). The further limitation of distractions increased the likelihood that the observational sessions would generate enough conflict behavior for hypothesis testing. Next, they tested observer influences on the data to identify whether any adjustments to their protocol were needed (e.g., they examined if


Table 2. Other approaches include providing couples with standardized topics to role play (e.g., planning a vacation) that may not relate to their own conflicts (e.g., Aron, Norman, Aron, McKenna, & Heyman, 2000) or having them reenact prior conflicts (e.g., Margolin, Burman, & John, 1989). Other researchers have set up situations to observe couples providing social support (e.g., Pasch & Bradbury, 1998), sharing exciting activities (e.g., Aron et al., 2000), or discussing situations of high import (e.g., Schmaling, Wamboldt, Telford, Newman, Hops, & Eddy, 1996).

Perhaps surprisingly, asking couples to engage in communication about conflictual topics while researchers watch tends to elicit behavior with reasonable external validity. First, observed conflict behaviors in home and laboratory settings tend to be similar, although lab conflicts are a bit less negative (e.g., Gottman, 1979; Gottman & Krokoff, 1989). Second, couples judge in-lab behavior as typical of what they do at home (Foster et al., 1997). Third, partners’ reactivity and self-consciousness while being observed are relatively low (Christensen & Hazzard, 1983; Jacob, Tennenbaum, Seilhamer, Bargiel, & Sharon, 1994). Thus, even if in-lab “conflicts on command” are not quite as negative as they are at home, they still reveal detectable differences in affect, behavior, physiology, and interactional patterns and processes (e.g., Gottman, 1979, 1994, 1999).

Experimental Manipulation

Social psychologists study behavior within controlled laboratory settings to (a) observe behaviors that are not likely to be observed in unstructured settings and/or (b) experimentally manipulate the causes of those behaviors. By controlling all aspects of a laboratory environment except that which is being manipulated, psychologists are able to isolate particular behaviors of interest and draw conclusions about the causes of behaviors, an integral step to theory development (see Smith, ch. 3). In addition, often in naturalistic settings there are multiple causes of behaviors that are interdependent, making it difficult to isolate which of several factors


actually cause the behavior. With experimental manipulation, researchers can tease apart these causes by systematically manipulating them.

There are several issues to consider when designing an experiment in which the goal is to change behavior. Whether the manipulation is minimal or large and the degree to which behaviors are “difficult or easy to influence” are important considerations (Prentice & Miller, 1992, p. 162), and they are certainly relevant for studies that intend to influence the display of dynamic, interpersonal behaviors. Minimal manipulations that have large effects on behaviors can be particularly convincing in demonstrating the strength and size of an effect. The mere exposure effect and the minimal group paradigm are classic examples of minimal manipulations that produce large effects on behavior. As a more recent example, Goff, Steele, and Davies (2008) demonstrated that White participants who were led to believe that they would discuss racial profiling with an African American participant placed their chairs farther apart from their partners’ chairs than did Whites who were led to believe that they would discuss a race-neutral topic. Goff et al.’s (2008) manipulation is minimal because the mere belief that participants would have a race-based discussion was sufficient to alter behavior.

It is also important to consider the behaviors that are manipulated and measured. It is provocative to demonstrate that an experimental manipulation affects behaviors that are “difficult to influence” (Prentice & Miller, 1992, p. 162), largely because easy-to-influence behaviors are mundane (e.g., showing that participants sit when asked to sit on arrival is of little interest). Rapport building within cross-race interactions and conformity to groups (e.g., Asch, 1951) are examples of difficult-to-influence behaviors. Manipulations that are both minimal in nature and exert effects on such behaviors are often deemed particularly impressive by social psychologists, and are therefore more likely to make a scientific impact.


on her face (which was painted on using make-up) or no birthmark. The presence of the birthmark was the experimental manipulation. Third, confederates allow for a clean standardization of the dependent behavior of interest. In Lakin et al. (2008), for example, mimicry was clearly (and simply) defined as foot-wiggling. Fourth, because confederates offer a level of experimental control that participant interaction partners do not, they allow researchers to isolate the causes of behavior to gain a better understanding of social processes.

There are a few important steps that must be taken when using confederates as social interaction partners. First, it is important to make sure that confederates are not a hidden source of variance. For pragmatic purposes, researchers often use two or more confederates in a study. These confederates might not always behave consistently with each other, so one must make sure that the effect of the experimental manipulation does not depend on which confederate participants interact with. One potential method for addressing this issue is to treat confederate (e.g., Amy the confederate versus Stacy the confederate) as a predictor of the dependent behavior of interest and as a moderator of the effect of condition on that behavior (making sure that the confederate is crossed with condition). Confederate can also be considered a source of variance in the analysis. Kenny, Mohr, and Levesque (2001) discuss methods for examining reliability of observers’ judgments of participants’ behaviors, many of which are applicable to studies that use confederates. For example, they discuss the importance of treating the observer as a source of variance, a method that can be easily adapted to treating the confederate as a source of variance.

Second, whenever possible, confederates should be blind to condition so that their behaviors are not inadvertently influenced. For example, Mendes et al. (2008) went to great lengths to ensure that the confederate did not know whether she had a birthmark painted on her face. Third, confederates may be trained to behave in a certain, consistent way across


participants, but they might engage in automatic behaviors that are outside of their awareness, especially during social interactions, and these behaviors could influence the interaction. To make sure that confederates behave consistently across participants and across conditions, researchers should record the behaviors of confederates within each interaction if possible; for example, by videotaping them and then coding their behaviors.

Sometimes confederates are used because they represent groups that are difficult to recruit to participate in research, either because they are not part of a convenience sample, or because they are a small percentage of the sample population. In these cases, confederates serve a pragmatic purpose, even when the question of interest is interpersonal. For example, many cross-race interaction studies conducted in the United States have recruited White participants who then interact with African American confederates. Although such a strategy allows the examination of cross-race encounters within the lab, this strategy limits the understanding of cross-race interactions from the African American perspective (Shelton & Richeson, 2006). As such, theories about the nature of cross-race interactions have become “one-sided” in that there is much cumulative knowledge about the attitudes and behaviors of Whites but much less knowledge about the attitudes and behaviors of African Americans. This is just one example of how the use of confederates can have direct, and potentially profound, theoretical implications.

Behavioral Observation Coding Systems

Behavioral observation coding systems tend to be one of two types. Topographical coding systems measure the occurrence of behaviors. Dimensional coding systems measure the intensity of behaviors along a dimension (e.g., warmth, engagement). The choice of a coding system depends on the specific purposes of a study. Because of the costs involved in launching a behavioral observation enterprise, a system should be only as complicated as is necessary to


revised into the Marital Interaction Coding System (Weiss & Summers, 1983), which was developed for coding couples’ problem-solving interactions in a laboratory setting. Like Latin, these two “dead coding languages” are the source for dozens of offshoots (Kerig & Baucom, 2004; Kerig & Lindahl, 2000). Thus, a first step in developing a new system is to try to find a past system that is closest to what is needed and revise from there. The advantage of using an existing coding system (or a close derivative of one) is that much psychometric work on reliability, interrater agreement, and validity has already been conducted. The disadvantage, as just noted, is that existing coding systems might not be a good match for one’s hypotheses.

Coding Units

The most fundamental property of a coding system is the sampling strategy for behavior, otherwise known as the “coding unit.” Coding units divide an observation into discrete segments, and each segment has the opportunity to be assigned a code, should one apply. The major sampling strategies employed in behavioral observation (see Table 3) are event, duration, interval, and time. Each strategy yields a different type of coding unit. Advantages and disadvantages of each of these strategies are discussed in Bakeman and Gottman (1997) and Haynes and O’Brien (2000).

With each strategy, data richness and quality (e.g., retaining the sequential unfolding of events, reliability, validity) must be weighed against practical issues (e.g., expense, time, availability or practicality of recording devices, difficulty obtaining reliability). As noted by Margolin et al. (1998), even when a coding unit is presumably clear, technical issues, such as the quality of the audio track on a video recording or the speed of turn taking in an interaction, can make detecting some units difficult. This is one reason why researchers interested in verbal communication often create written transcriptions that are used together with audio and video feeds when coding.
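
To make the difference between sampling strategies concrete, the sketch below derives three kinds of coding units from the same timestamped record. It is only an illustration under stated assumptions: the `Event` record, the function names, the sample codes, and the rule that an interval receives a code if the behavior overlaps it at any point are all hypothetical choices made here, not definitions taken from Table 3 or from any published coding system.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One coded behavior (hypothetical record); times are in seconds."""
    code: str
    onset: float
    offset: float


def event_frequencies(events):
    """Event sampling: count how many times each code occurred."""
    counts = {}
    for e in events:
        counts[e.code] = counts.get(e.code, 0) + 1
    return counts


def total_durations(events):
    """Duration sampling: total seconds during which each code was 'on'."""
    totals = {}
    for e in events:
        totals[e.code] = totals.get(e.code, 0.0) + (e.offset - e.onset)
    return totals


def interval_codes(events, session_length, interval_length):
    """Interval sampling: for each fixed-length interval, the set of codes
    that overlapped it at any point (assumes events have positive duration)."""
    n_intervals = math.ceil(session_length / interval_length)
    intervals = [set() for _ in range(n_intervals)]
    for e in events:
        for i in range(n_intervals):
            start = i * interval_length
            end = start + interval_length
            if e.onset < end and e.offset > start:
                intervals[i].add(e.code)
    return intervals


# Hypothetical 30-second observation coded with two illustrative codes.
sample_events = [
    Event("criticize", 2.0, 5.0),
    Event("whine", 12.0, 14.0),
    Event("criticize", 19.0, 23.0),
]

frequencies = event_frequencies(sample_events)
durations = total_durations(sample_events)
by_interval = interval_codes(sample_events, session_length=30, interval_length=10)
```

In this toy record the second "criticize" spans two 10-second intervals, so interval sampling credits it twice while event sampling counts it once, which is one way the choice of coding unit changes the resulting scores.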


Molar Versus Molecular Approach

Another key property of a coding system is how often codes are recorded. In molar, or “global,” coding systems (e.g., Rapid Couples Interaction Scoring System; Krokoff, Gottman, & Haas, 1989) summary ratings are made for each code over a large number of potential coding units (e.g., every three minutes in a 15-minute observation, or once at the end of the observation). Codes tend to be few, representing behavioral classes (e.g., negativity, attentiveness, escalation, reciprocation). Thus, numerous examples of the codes of interest may occur within multiple potential coding units, but only one summary score is given, usually indicating the frequency with which a code appeared throughout the observation period.

In contrast, molecular, or “microbehavioral,” systems code behavior as it unfolds over time and tend to have many fine-grained codes (e.g., eye contact, criticize, whine, withdraw) that are given within each coding unit. The large number of codes in many microbehavioral systems may make them inefficient to use, even with highly trained coders. This is because (a) coders can almost never get or maintain adequate interrater agreement on such a large number of codes; and (b) the codes occur too infrequently in a limited observational period to make them all useful even if they were reliably coded. Thus, researchers often resort to grouping codes, often condensing down a large system into positive, negative, and neutral classes for analytic purposes (see review in Heyman, 2001). Imagine spending the extreme time and expense required to train coders on 40 codes, only to end up analyzing only positive, negative, and neutral!

Microbehavioral systems tend to be topographical; global systems can be either topographical or dimensional, though dimensional coding, especially on a behavior-by-behavior basis, is less common.
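
The grouping of fine-grained codes into positive, negative, and neutral classes described above can be sketched in a few lines. The mapping below is purely hypothetical (the code names and their class assignments are invented for illustration and do not come from any particular coding system); in practice the assignment of each code to a class would follow the coding manual and the study's hypotheses.

```python
from collections import Counter

# Hypothetical mapping from microbehavioral codes to broad classes.
CODE_CLASS = {
    "approve": "positive",
    "agree": "positive",
    "describe": "neutral",
    "question": "neutral",
    "criticize": "negative",
    "whine": "negative",
    "withdraw": "negative",
}


def collapse_codes(codes, mapping=CODE_CLASS):
    """Replace each fine-grained code with its broad class, preserving order.

    Raises KeyError for any code missing from the mapping, so that an
    unmapped code is noticed rather than silently dropped.
    """
    return [mapping[code] for code in codes]


def class_proportions(codes, mapping=CODE_CLASS):
    """Proportion of coding units falling into each broad class."""
    collapsed = collapse_codes(codes, mapping)
    counts = Counter(collapsed)
    return {cls: counts[cls] / len(collapsed) for cls in counts}


# Hypothetical stream of codes from one observation.
stream = ["approve", "criticize", "whine", "describe",
          "approve", "withdraw", "criticize", "describe"]

collapsed_stream = collapse_codes(stream)
proportions = class_proportions(stream)
```

Because the collapsed stream keeps the original order, sequential information survives the grouping even though the distinctions among the individual codes are lost.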
Given that many theoretical models of interest have implicit or explicit intensity × time predictions (e.g., Patterson’s [1982] Coercive Family Process model posits that


have been paired, usually by asking coders to make global impression ratings after coding microanalytically (e.g., Patterson, Reid, & Dishion, 1992).

Multiple Dimensions

Another property of a coding system is how many different dimensions of an interaction are coded, and how many different codes are included within each dimension. For example, some coding systems record information about the general context within which a behavior is occurring (e.g., in a system focused on child behavior at school, the location of an interaction, such as on the playground, in the lunchroom, or in the classroom), as well as the specific behaviors of interest. Other systems might also include a code describing the quality of the behavior, such as whether it was delivered with negative, positive, or neutral affect.

The choice of how many dimensions to code depends on the specific hypothesis of interest, but issues can get confused in no small part because of the high cost of conducting observational work. Once data have been collected and a team has been assembled, it may seem appealing to collect as much information as possible while coding so that a variety of tasks can be accomplished, from hypothesis testing to hypothesis development. The most obvious risk in such an approach is increased difficulty in reaching an acceptable level of interrater agreement, but the approach may also overburden coders and compromise even more important qualities, such as the reliability and/or validity of the observation. This can only be known if other types of data (from multiple sessions, from multiple informants, through multiple methods) are collected to aid in understanding the observational data that are collected.

Example

An example of a mature coding system is the Interpersonal Process Code (IPC; Rusby, Estes, & Dishion, 1991), a distant tributary of the aforementioned Family Interaction Coding


System. In the IPC, a target individual is chosen as the focus of an observation, and everything that individual does, and has done to him/her, is coded. The coding unit is a codeable behavior, which can continue even when the behaviors of others are also taking place (e.g., a target child starts to hum and continues to hum, even though the child he is playing with is yelling at him). When no codeable behavior is occurring, a Stop code is entered. When an individual cannot be fully heard or seen, an Out of View code is given. The IPC is coded on a handheld or stationary computer in real time and has been used to code both live and videotaped sessions.

Three dimensions are coded simultaneously in the IPC: Activity, Content, and Valence. Activity refers to the general context within which a social interaction is taking place and varies depending on the study. Examples of Activity codes used in prior studies are Work, Play, Read, Eat, Attend, and Unspecified. Activity codes are given in priority so that if a code with theoretically higher priority occurs, it is given (e.g., Work trumps Play, Play trumps Read, etc.). Content refers to specific behaviors of interest. Thirteen Content codes constitute the IPC and include positive, neutral, and negative verbal, non-verbal, and physical codes. For example, the code Positive Interpersonal is assigned to “verbal expressions of approval of another’s behavior, appearance or state” (p. 17). Valence refers to the emotional tone accompanying the delivery of content (i.e., Happy, Caring, Neutral, Distress, Aversive, and Sad). In addition, who displayed the behavior (the Initiator), and whom the behavior was directed toward (the Recipient), are also coded.

Training Observers

The careful training of observers (i.e., “coders”) is essential to behavioral observation. People who may have very different perceptions of behavior must, through the training process, come to be interchangeable with one another.
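
One way a coding team might check how interchangeable two trainees are is to compare their codes for the same coding units. The passage above does not name a specific statistic, so the choice of Cohen's kappa here (raw agreement corrected for the agreement expected by chance) is an assumption, and the function name, labels, and data are hypothetical. This is a minimal sketch for two coders and nominal codes, not a full reliability analysis.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one nominal code
    to the same sequence of coding units."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("Both coders must code the same, non-empty set of units.")
    n = len(coder_a)

    # Observed agreement: proportion of units given the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: for each code, the product of the two coders'
    # marginal proportions, summed over codes.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical code throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes from two trainees for the same ten coding units.
trainee_1 = ["positive", "positive", "negative", "negative", "neutral",
             "positive", "negative", "negative", "positive", "neutral"]
trainee_2 = ["positive", "negative", "negative", "negative", "neutral",
             "positive", "negative", "positive", "positive", "neutral"]

kappa = cohens_kappa(trainee_1, trainee_2)
```

For these hypothetical data the trainees agree on 8 of 10 units (0.80), chance agreement is 0.36, and kappa is (0.80 - 0.36) / (1 - 0.36) = 0.6875. What counts as an acceptable value depends on the coding system and the conventions of the research area.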
Moreover, they must maintain consistency over