Psychological Review
1988, Vol. 95, No. 2, 163-182
Copyright 1988 by the American Psychological Association, Inc.
0033-295X/88/$00.75

The Role of Knowledge in Discourse Comprehension: A Construction-Integration Model

Walter Kintsch
University of Colorado

In contrast to expectation-based, predictive views of discourse comprehension, a model is developed in which the initial processing is strictly bottom-up. Word meanings are activated, propositions are formed, and inferences and elaborations are produced without regard to the discourse context. However, a network of interrelated items is created in this manner, which can be integrated into a coherent structure through a spreading activation process. Data concerning the time course of word identification in a discourse context are examined. A simulation of arithmetic word-problem understanding provides a plausible account for some well-known phenomena in this area.

Discourse comprehension, from the viewpoint of a computational theory, involves constructing a representation of a discourse upon which various computations can be performed, the outcomes of which are commonly taken as evidence for comprehension. Thus, after comprehending a text, one might reasonably expect to be able to answer questions about it, recall or summarize it, verify statements about it, paraphrase it, and so on. To achieve these goals, current theories use representations with several mutually constraining layers.
Thus, there is typically a linguistic level of representation, conceptual levels to represent both the local and global meaning and structure of a text (e.g., the micro- and macrostructure, constituting the text base in van Dijk & Kintsch, 1983), and a level at which the text itself has lost its individuality and its information content has become integrated into some larger structure (e.g., van Dijk & Kintsch's situation model).

Many different processes are involved in constructing these representations. To mention just a few, there is word identification, where, say, a written word like bank must somehow provide access to what we know about banks, money, and overdrafts. There is a parser that turns phrases like the old men and women into propositions such as AND[OLD[MEN],OLD[WOMEN]]. There is an inference mechanism that concludes from the phrase The hikers saw the bear that they were scared. There are macro-operators that extract the gist of a passage. There are processes that generate spatial imagery from a verbal description of a place.

This research was supported by Grant MH 15872 from the National Institute of Mental Health. The work on word arithmetic problems was supported by Grant BNS 8741 from the National Science Foundation. Correspondence concerning this article should be addressed to Walter Kintsch, Department of Psychology, University of Colorado, Boulder, Colorado 80309-0345.

It is one thing for a theorist to provide some formal description (e.g., a simulation model) for how such processes can occur and for what the computational steps were that led to a particular word identification, inference, or situation model. It is quite another to control construction processes in such a way that at each point in the process exactly the right step is taken. Part of the problem has to do with the characteristic ambiguity of language: How do we make sure that we access the financial meaning of bank, and not the meaning of riverbank?
Why did we parse the old men and women as we did--maybe the women were not old at all. Why did we infer that the hikers were scared rather than that they had their eyes open, or a myriad of other irrelevancies? Of all the many ways macro-operators could be applied, how did we get just the right sequence to reach a plausible gist without making the wrong generalizations? The number of possible alternative steps is distressingly large in constructing discourse representations, and without firm guidance, a computational model could not function properly for long. That is where knowledge comes in.

General knowledge about words, syntax, the world, spatial relations--in short, general knowledge about anything--constrains the construction of discourse representations at all levels. Indeed, this is what makes it possible to construct these representations. There is a striking unanimity among current theories about how this is done.

Our conceptions about knowledge use in discourse comprehension are dominated by the notions of top-down effects and expectation-driven processing. Knowledge provides part of the context within which a discourse is interpreted. The context is thought of as a kind of filter through which people perceive the world. At the level of word recognition and parsing, it lets through only the appropriate meaning of an ambiguous word or phrase and suppresses the inappropriate one. Through semantic priming, the feature counter of the logogen for bank as a financial institution will be incremented and will reach its threshold before that of riverbank in the right context (Morton, 1969). Parsing a sentence is often thought of as predicting each successive constituent from those already analyzed on the basis of syntactic rules (Winograd, 1983).
Scripts, frames, and schemata constrain the inferences an understander makes (as in Schank & Abelson, 1977), thereby preventing the process from being swamped in a flood of irrelevancies and redundancies. Arithmetic strategies generate just the right hypothesis in solving a word problem and preclude the wrong ones (Kintsch & Greeno, 1985). In a word, knowledge makes understanding processes smart: It keeps them on the right track and avoids exploring blind alleys. People understand correctly because they sort of know what is going to come. This program of research is well expressed by the following quotation from Schank (1978, p. 94), which served as a motto for Sharkey's (1986) model of text comprehension:

We would claim that in natural language understanding, a simple rule is followed. Analysis proceeds in a top-down predictive manner. Understanding is expectation based. It is only when the expectations are useless or wrong that bottom-up processing begins.

Empirically, this position is questionable: Even fluent readers densely sample the words of a text, as indicated by their eye fixations (Just & Carpenter, 1980), making the bottom-up mode appear the rule rather than the exception. Computationally, it is not an easy idea to make work. It is difficult to make a system smart enough so that it will make the right decisions, yet keep it flexible enough so that it will perform well in a broad range of situations. On the one hand, one needs to make sure that exactly the right thing (word meaning, proposition, inference) will be constructed; for that purpose one needs powerful, smart rules that react sensitively to subtle cues. On the other hand, humans comprehend well in ever-changing contexts and adapt easily to new and unforeseen situations; for that purpose one needs robust and general construction rules.
Scripts and frames, as they were first conceived, are simply not workable: If they are powerful enough, they are too inflexible, and if they are general enough, they fail in their constraining function. This dilemma has long been recognized (e.g., Schank, 1982; van Dijk & Kintsch, 1983), and efforts have been undertaken to make expectation-driven processes sufficiently flexible (e.g., Schank's memory organization packets, or MOPs). In this article, an alternative solution to this problem will be explored.

Construction of Discourse Representations

The traditional approach to modeling knowledge use in comprehension has been to design powerful rules to ensure that the right elements are generated in the right context. The problem is that it is very difficult to design a production system powerful enough to yield the right results but flexible enough to work in an environment characterized by almost infinite variability. The approach taken here is to design a much weaker production system that generates a whole set of elements. These rules need to be just powerful enough so that the right element is likely to be among those generated, even though others will also be generated that are irrelevant or outright inappropriate. An integration process will then be used to strengthen the contextually appropriate elements and inhibit unrelated and inappropriate ones. Weak productions can operate in many different contexts because they do not have to yield precise outputs; on the other hand, a context-sensitive integration process is then required to select among the outputs generated. The integration phase is the price the model pays for the necessary flexibility in the construction process. The model proposed here has been termed a construction-integration model to emphasize its most salient feature.
It combines a construction process in which a text base is constructed from the linguistic input as well as from the comprehender's knowledge base, with an integration phase, in which this text base is integrated into a coherent whole. The knowledge base is conceptualized as an associative network. The construction process is modeled as a production system. Indeed, it is a generalization of the production system used in earlier work, such as the simulation-of-comprehension processes developed by Fletcher (1985) and Dellarosa (1986) after the model of Kintsch and Greeno (1985). The main difference is that instead of precise inference rules, sloppy ones are used, resulting in an incoherent, potentially contradictory output. However, this output structure is itself in the form of an associative net, which can be shaped into a coherent text base via relaxation procedures in the connectionist manner (e.g., Rumelhart & McClelland, 1986). Thus, the model represents a symbiosis of production systems and connectionist approaches.1

Certain limitations of the present article are worth noting at this point, for it does not offer a solution to all the problems in discourse understanding. Thus, it is not primarily concerned with the specific strategies (or rules) for the construction of text propositions or inferencing. Instead, it relies in this respect on what is available in the literature as well as on whatever future researchers will be able to come up with. The only point it makes is that whatever these strategies or rules are, they will be easier to formulate within the present framework, which allows them to be both weaker and more general. Thus, one need not worry about constructing just the right inference, but can be content with a much sloppier rule.
Sometimes, of course, even the latter type of rule may be hard to come by, whereas in other cases (e.g., in the word problems discussed later) promiscuous hypothesis generation is straightforward (while selecting just the right one can be tricky).

Knowledge Representation

The process of constructing a discourse representation relies heavily on knowledge. To understand how it operates, one must first have an idea of how the to-be-used knowledge is organized. Typically, theorists have tried to create knowledge structures to support smart processes: semantic nets, frames, scripts, and schemata. As has been argued elsewhere (Kintsch, in press), such fixed structures are too inflexible and cannot adapt readily enough to the demands imposed by the ever-changing context of the environment. Instead, a minimally organized knowledge system is assumed here in which structure is not prestored, but generated in the context of the task for which it is needed. An associative net with positive as well as negative interconnections serves this purpose. Knowledge is represented as an associative net, the nodes of

1 Conceivably, a purer connectionist model might be constructed. In the present model, an associative knowledge net is used to build a text-base net, which is then integrated. McClelland (1985) has put forth the idea of a connection information distributor, which is a subnetwork in which the units are not dedicated and connections are not hardwired. Instead, this subnetwork is programmable by inputs from the central network where the knowledge that controls processing in the subnetwork is stored. One could say that the production rules in the present model have the function of programming such a subnetwork.

On each retrieval attempt, an item among the associates of i is selected according to Equation 1. A sampling-with-replacement process is assumed so that dominant associates may be retrieved more than once.
The number of retrieval attempts with item i as the cue is assumed to be fixed and is a parameter of the model, k. In the examples that follow, k was chosen to be 2 or 3, mostly to reduce the complexity of these examples. However, one may speculate that the most realistic value of k would not be much higher, perhaps between 5 and 7.

Consider some simple examples.

1. Suppose the word bank is presented as part of a text. It will activate the lexical nodes BANK1 (financial institution) as well as BANK2 (riverbank), plus some of their associates; for example, the construction process might pick from Figure 1: BANK1, MONEY, FIRST-NATIONAL-BANK, BANK2, RIVER, OVERFLOW[RIVER,BANK2].

2. Suppose the sentence Lucy persuaded Mary to bake a cake is presented as part of a text. The parser should provide a phrase structure tree as output, from which the proposition PERSUADE[LUCY,MARY,BAKE[MARY,CAKE]] is constructed. Each text proposition activates propositions closely related to it in the general knowledge net, regardless of the discourse context. For instance, in the case of BAKE[MARY,CAKE] we might thus obtain LIKE[MARY,EAT[MARY,CAKE]], PUT[MARY,CAKE,IN-OVEN], RESULT[BAKE[MARY,CAKE],HOT[CAKE]], PREPARE[MARY,DINNER]. These propositions are all closely associated with baking a cake (Figure 2). Note, however, that elaborating the text base in this way is not just a question of retrieving associated propositions from the knowledge net. The arguments of these retrieved propositions must be treated as variables that are to be bound to the values specified by the retrieval cue. Thus, because MARY is the agent of the text proposition, MARY is made the agent in the knowledge propositions it brings into the text representation, instead of PERSON in Figure 2. Similarly, although the informality of the present notation hides this, CAKE now is the particular one MARY bakes, not the generic one in Figure 2. These knowledge propositions function as potential inferences.
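The random retrieval scheme used in these examples--k samples drawn with replacement, with probability proportional to associative strength--can be sketched as follows. The cue, the associates, and the numerical strengths in this sketch are hypothetical illustrations, not values from the model; the actual sampling probabilities are defined by Equation 1.

```python
import random

def sample_associates(strengths, k=3, seed=1):
    """Retrieve k associates of a cue by sampling with replacement,
    with probability proportional to associative strength. Because
    sampling is with replacement, dominant associates may be
    retrieved more than once."""
    rng = random.Random(seed)
    items = list(strengths)
    weights = [strengths[item] for item in items]
    return rng.choices(items, weights=weights, k=k)

# Hypothetical strengths for some associates of the cue BANK1
strengths = {"MONEY": 0.6, "FIRST-NATIONAL-BANK": 0.3, "OVERDRAFT": 0.1}
retrieved = sample_associates(strengths, k=3)
```

With a small k, a strong associate such as MONEY is likely, but not guaranteed, to be among the retrieved items, which is exactly the kind of sloppy but generally adequate construction rule the model calls for.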
Out of context there is no way of determining which of them are relevant: Maybe Mary really likes to eat cake, but perhaps she is in the process of cooking dinner, in which case PREPARE[MARY,DINNER] might become a macroproposition (what van Dijk, 1980, calls a construction). But it is also possible that next she will burn her fingers when she takes the cake out of the oven, making HOT, which plays no role at all in the other contexts, the relevant inference. At this point, the construction process lacks guidance and intelligence; it simply produces potential inferences, in the hope that some of them might turn out to be useful.

3. In the third example, if the proposition SEND[LAWYER,DEFENDANT,PRISON] has been formed, the knowledge net contributes nothing, because one presumably does not know anything about lawyers sending defendants to prison. (Of course, LAWYER, DEFENDANT, and PRISON would each be associatively elaborated separately.) If, however, JUDGE rather than LAWYER were the agent of SEND, the elaboration process would contribute the information that this implies that the judge is sentencing the defendant and so forth.

Step C in the construction process, the generation of additional inferences, is necessary because not all inferences that are required for comprehension will, in general, be obtained by the random elaboration mechanism described earlier. In some cases more focused problem-solving activity is necessary to generate the desired inferences. Exactly how this is to be done is, however, beyond the scope of this article. I merely wish to point out here that in addition to the undirected elaboration which results from Step B of the construction process, there is still a need for controlled, specific inferences. Two types of such inferences are of particular importance in comprehension.

Figure 3. Connections between BANK1 and BANK2 and their associates.
Bridging inferences (Haviland & Clark, 1974; Kintsch, 1974) are necessary whenever the text base being constructed is incoherent (i.e., whenever either the original text base itself or the elaborated text base remains incoherent by the criteria discussed in van Dijk and Kintsch, 1983, chapter 5). Second, macropropositions have to be inferred (as discussed in general terms in chapter 6 of van Dijk & Kintsch, 1983, and operationalized as a production system by Turner, McCutchen, & Kintsch, 1986). Macropropositions are also elaborated associatively, as described in Step B for micropropositions.

What has been constructed so far is a set of propositions containing the (micro)propositions directly derived from the text, a randomly selected set of associates for each of these, the macropropositions generated from the text, and their associates. The final Step D of the construction process involves the specification of the interconnections between all of these elements. There are two ways in which elements are interconnected. (a) The propositions directly derived from the text (hence referred to as "text propositions") are positively interconnected with strength values proportional to their proximity in the text base. Specific realizations of this principle are described in the discussion of Figure 4. (b) If propositions i and j are connected in the general knowledge net with the strength value s(i,j), -1 < s(i,j) < 1, and if i and j become members of a text base, the strength of their connection in the text base is s(i,j). In other words, propositions in the text base inherit their interconnections from the general knowledge net. Strength values are additive, up to a maximum of 1, in those cases in which an inherited strength value combines with a text-base-determined connection.

Consider, for instance, the portion of a network that is generated when the word bank activates both BANK1 and BANK2, as well as the associations MONEY and RIVER.
A possible pattern of connections is shown in Figure 3, where for simplicity, connection strengths have been limited to ±.5 or ±1. Alternatively, the graph shown in Figure 3 can be expressed in matrix form as shown in Table 1. BANK1 is associated with MONEY, BANK2 with RIVER, but inhibitory connections exist between MONEY and BANK2 and between RIVER and BANK1.

Table 1
Connectivity Matrix for the Graph Shown in Figure 3

Proposition      1      2      3      4
1. MONEY        --     0.5   -0.5    0.0
2. BANK1        0.5    --    -1.0   -0.5
3. BANK2       -0.5   -1.0    --     0.5
4. RIVER        0.0   -0.5    0.5    --

An example of text propositions that are interconnected via their positions in the text base is shown in Figure 4. LUCY is connected most strongly to WEED[LUCY,GARDEN], and least strongly to VEGETABLE[GARDEN]. Although there are many possible ways to assign numerical connection strengths to express this pattern of connectivity, the one chosen here results in the matrix shown in Table 2.

Table 2
Connectivity Matrix for the Graph Shown in Figure 4

Proposition      1      2      3      4
1. LUCY         --     0.9    0.7    0.4
2. WEED         0.9    --     0.9    0.7
3. GARDEN       0.7    0.9    --     0.9
4. VEGETABLE    0.4    0.7    0.9    --

Inferences inherit positive and negative interconnections from the general knowledge net, as seen in Figure 5. The result of the construction process is, therefore, a network expressible as a connectivity matrix, consisting of all the lexical nodes accessed, all the propositions that have been formed, plus all the inferences and elaborations that were made at both the local and global level and their interconnections.

Integration

The network that has been constructed so far is not yet a suitable text representation. It was carelessly constructed and is therefore incoherent and inconsistent. At all levels of the representation, components associated with the text elements were included without regard to the discourse context, and many of them are inappropriate.
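The proximity-based connection scheme among text propositions described above (cf. Table 2: weights of .9, .7, and .4 for propositions one, two, and three steps apart in the text base, and 0 beyond) is mechanical enough to sketch in a few lines of code. The function name and the assumption that the propositions form a simple chain are mine, introduced only for illustration.

```python
def proximity_matrix(n, weights=(0.9, 0.7, 0.4)):
    """Connection strengths between n text propositions arranged in
    a chain: 0.9 for propositions one step apart in the text base,
    0.7 and 0.4 for two and three steps, and 0 for anything farther
    apart (the assignment used in Table 2)."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = abs(i - j)
            if 1 <= d <= len(weights):
                m[i][j] = weights[d - 1]
    return m

# For the chain LUCY - WEED - GARDEN - VEGETABLE
# this reproduces the pattern of Table 2.
table2 = proximity_matrix(4)
```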
An integration process in the connectionist manner can be used to exclude these unwanted elements from the text representation (e.g., see Rumelhart & McClelland, 1986, and Waltz & Pollack, 1985, for discourse). Text comprehension is assumed to be organized in cycles, roughly corresponding to short sentences or phrases (for further detail, see Kintsch & van Dijk, 1978; Miller & Kintsch, 1980). In each cycle a new net is constructed, including whatever is carried over in the short-term buffer from the previous cycle.6 Once the net is constructed, the integration process takes over: Activation is spread around until the system stabilizes. More specifically, an activation vector representing the initial activation values of all nodes in the net is postmultiplied repeatedly with the connectivity matrix. After each multiplication the activation values are renormalized: Negative values are set to zero, and each of the positive activation values is divided by the sum of all activation values, so that the total activation on each cycle remains at a value of one (e.g., Rumelhart & McClelland, 1986). Usually, the system finds a stable state fairly rapidly; if the integration process fails, however, new constructions are added to the net, and integration is attempted again. Thus, there is a basic, automatic construction-plus-integration process that normally is sufficient for comprehension. This process is more like perception than problem solving, but when it fails, rather extensive problem-solving activity might be required to bring it back on track. These processes will not be considered further here.

Figure 4. The text base for Lucy weeded the vegetable garden.

The result of the integration process is a new activation vector, indicating high activation values for some of the nodes in the net and low or zero values for many others.
The highly activated nodes constitute the discourse representation formed on each processing cycle. In principle, it includes information at many levels: lexical nodes, text propositions, knowledge-based elaborations (i.e., various types of inferences), as well as macropropositions.

A few simple examples will illustrate what is at issue here. Consider Lucy persuaded Mary to bake a cake, which was discussed earlier. The PERSUADE proposition will pull in related knowledge items, just as was shown for BAKE. However, out of context the integration process will not yield any striking results. In the context of Lucy made tomato soup and sauteed some porkchops with herbs. She set the table and persuaded Mary to bake a cake, the integration process has very different results: PREPARE[LUCY,DINNER] emerges as the dominant proposition (macroproposition) because most of the other propositions in the text base contribute to its activation value. That the cake was hot, or that she put it into the oven, disappears from the representation with activation values around zero.

Next, consider the example just discussed, where a perfectly good propositional strategy led to a wrong result. For The linguists knew the solution of the problem would not be easy, the text base that was constructed is shown in Figure 6. It corresponds to the connectivity matrix exhibited in Table 3 if connection strengths are assigned as in Table 2. (KNOW[SOLUTION] and NOT[EASY] are connected positively via KNOW[S] but negatively via EASY, which adds up to 0.) The activation vector (.25,

6 That integration occurs at the end of each processing cycle is proposed here merely as a simplifying assumption. Although there is clearly something going on at the end of sentences (e.g., Aaronson & Scarborough, 1977), integration does not need to wait for a sentence boundary (see the evidence for the "immediacy assumption"; Just & Carpenter, 1980; Sanford & Garrod, 1981).
It would be quite possible to apply the relaxation procedure outlined here repeatedly in each cycle, as propositions are being constructed. This would allow for the disambiguation of word senses before the end of a cycle. Because inferences and macropropositions are usually not available before the end of a processing cycle, end-of-cycle integration plays an especially important role.

Figure 5. Inferences generated from WEED[LUCY,GARDEN] and their interconnections.

Table 3
Connectivity Matrix for the Graph Shown in Figure 6

Proposition      1      2      3      4
1. KNOW[S]      --     0.9    0.7    0.9
2. KNOW[SOL]    0.9    --    -1.0    0.0
3. EASY         0.7   -1.0    --     0.9
4. NOT          0.9    0.0    0.9    --

.25, .25, .25) corresponding to the assumption that all text propositions are equally activated initially is repeatedly multiplied with this matrix, renormalizing the obtained activation values after each multiplication as described earlier. To decide when the activation vector has stabilized, the following criterion was established: A stable state is reached when the average change in the activation values after a multiplication is less than .001. Although this is an arbitrary criterion, even large changes (by one order of magnitude in either direction) make only minor differences in the final activation values obtained in this and many other cases. In the present case, this criterion is reached after 10 operations, yielding the final activation vector (.325, .000, .325, .350)--that is, the wrong KNOW[LINGUISTS,SOLUTION], which does not fit into the text base, has been deactivated.

The integration process similarly resolves the problem of multiple pronoun referents. For The lawyer discussed the case with the judge. He said "I shall send the defendant to prison," propositions were constructed for both lawyer and judge as referents of he.
However, the process of associative elaboration generated some additional information for SEND[JUDGE,DEFENDANT,PRISON], but not for SEND[LAWYER,DEFENDANT,PRISON]. The resulting text base is shown in Figure 7. To obtain the corresponding connectivity matrix (see Table 4), connection strengths among text base propositions were assigned as in Table 2, and among associates as in Table 3 (other assignments result in different numerical values for the final activation vector, but its pattern remains the same as long as the essential features of the matrix are preserved--for example, which connections are positive, negative, and zero). Assume an initial activation vector of (.25, .25, .25, .25, .25, 0, 0), reflecting the fact that only the text propositions themselves are activated initially. After 19 multiplications with the connectivity matrix, the two propositions in which he had been identified as the lawyer have activation values of 0, whereas the corresponding judge propositions have activation values of .261 and .283, respectively. Just a little knowledge was enough to choose the correct referent.

After this general description of the construction-plus-integration model, two specific applications will be discussed in more detail: how words are identified in a discourse context, and how a propositional text base and situation model are constructed when comprehension depends heavily on activating a rich knowledge set. For that purpose, arithmetic word problems were chosen as the example, because the knowledge that needs to be activated is particularly well defined in that domain, and unambiguous criteria of understanding exist--a solution is either right or wrong.
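The integration computations used in the preceding examples can be sketched as code: repeated postmultiplication of the activation vector with the connectivity matrix, clamping of negative activations to zero, renormalization to a total activation of one, and the average-change stopping criterion of .001. The matrix below is the one of Table 3, with the blank diagonal entries taken as 0; this sketch is not guaranteed to reproduce the exact iteration counts reported in the text, but it converges to the same qualitative pattern.

```python
def integrate(W, a, eps=0.001, max_iter=1000):
    """Spread activation until the net stabilizes: postmultiply the
    activation vector a with the connectivity matrix W, set negative
    activations to zero, renormalize so total activation is 1, and
    stop when the average change per node falls below eps."""
    n = len(a)
    for _ in range(max_iter):
        new = [sum(a[i] * W[i][j] for i in range(n)) for j in range(n)]
        new = [max(0.0, x) for x in new]
        total = sum(new)
        if total > 0:
            new = [x / total for x in new]
        if sum(abs(x - y) for x, y in zip(new, a)) / n < eps:
            return new
        a = new
    return a

# Connectivity matrix of Table 3 (diagonal entries taken as 0)
W = [[0.0,  0.9,  0.7, 0.9],
     [0.9,  0.0, -1.0, 0.0],
     [0.7, -1.0,  0.0, 0.9],
     [0.9,  0.0,  0.9, 0.0]]
a = integrate(W, [0.25, 0.25, 0.25, 0.25])
# The second component, KNOW[SOLUTION], is driven toward zero:
# the contextually inappropriate reading is deactivated.
```

Note how little machinery is involved: the "intelligence" of the disambiguation resides entirely in the pattern of positive and negative connections, not in the iteration itself.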
The purpose of these examples is twofold: to show how the general framework proposed can be elaborated into specific models in these experimental situations, and to compare the performance of these models with empirical observations and experimental results as a first test of the psychological adequacy of these models.

Word Identification in Discourse

The first problem to be considered in detail is how knowledge is used in understanding the meaning of words in a discourse. The previously sketched model implies that word meanings have to be created anew in each context, that this is initially strictly a bottom-up process with context having its effects in the integration phase, and that this construction-plus-integration process takes time, with different factors influencing successive phases of the process.

Context effects in word recognition are ubiquitous in the experimental literature, and the explanation of these context effects has been a primary goal of theories of word recognition. Typically, it is taken for granted in these theories that because

Figure 6. The strategic construction of a text base: SOLUTION-OF-THE-PROBLEM is first assigned to KNOW, then to EASY. (The dollar sign is a placeholder.)

Figure 7. The strategic construction of a text base: The pronoun he is identified with two potential, mutually exclusive referents. (Instead of writing out whole propositions, the abbreviation [.] is used for the arguments of a proposition when they can be readily inferred.)

Figure 8. Context effects as indexed by the reaction time difference to context-inappropriate and appropriate associates or inferences as a function of processing time, after Till, Mross, and Kintsch (1988).

context of each word by itself.
This stage of sense activation, however, is quickly followed by a process of sense selection in which the discourse context becomes effective: By 500 ms, context-inappropriate associates are deactivated (see also Seidenberg et al., 1982, and Swinney, 1979). If given more time, context effects grow even stronger: By 1,000 ms, contextually appropriate inference words are strongly and reliably primed even in the absence of associative connections (similarly for recognition, see McKoon & Ratcliff, 1986).

Clearly, this pattern of results is in excellent agreement qualitatively with the model of knowledge use in discourse presented earlier. Right after a word is perceived, it activates its whole associative neighborhood in a context-independent way, with the consequence that strong associates of a word are likely to be represented in working memory and hence will be primed in a lexical decision task, whether they are context appropriate or not. The knowledge-integration process then results in the deactivation of material that does not fit into the overall discourse context (such as context-inappropriate associates). Note that in order to disambiguate words on-line, the integration phase cannot be delayed until the end of a processing cycle; word senses are disambiguated before that. In the model, therefore, as soon as a text proposition is constructed and its associates have been generated, they will be integrated into whatever context exists at that time in working memory. Thus, each processing cycle involves many integrations, and the single integration operation performed at the end of each cycle in many of the examples discussed here is merely a simplification, adopted whenever one is not concerned with the on-line generation of word meanings.
Finally, contextual inferences should require the most time to become activated on the average because although they sometimes result from the initial knowledge sampling, in other cases repeated sampling or, further, strategic elaboration might be required.

Earlier, an example was given of one of the texts used in the Till et al. (in press) study. The predictions of the model will be illustrated by means of this example. The aforementioned text (The townspeople were amazed to find that all the buildings had collapsed except the mint) has the following propositional representation:

1. TOWNSPEOPLE
2. AMAZED[TOWNSPEOPLE,P3]
3. COLLAPSE[P4]
4. ALL-BUT[BUILDING,MINT]
5. BUILDING
6. MINT

Connection strengths of .9, .7, .4, and 0 were assigned to text propositions one, two, three, or more steps apart in the text base (e.g., P1 is two steps away from P3, connected via P2). Next, each text proposition was allowed to access at random two of its neighbors in the long-term associative net. This process was simulated by having an informant provide free associations to phrases based on each of these six propositions. For instance, the phrase all buildings but the mint elicited the associations many buildings and mint is a building. Of course, MONEY and CANDY were chosen as the associates of MINT. Each text proposition was connected by a value of .5 to its associates, yielding an 18 × 18 connectivity matrix. Activation was then allowed to spread from the text propositions to the knowledge elaborations. Specifically, an initial activation vector with 1/6's corresponding to the text propositions and zeros otherwise was multiplied with the connectivity matrix until the pattern of activation stabilized. As a result, text propositions achieved activation values between .0987 and
1612, depending on how closely they ROLE OF KNOWLEDGE IN DISCOURSE COMPREHENSION 173 were tied into the text base, and the knowledge elaborations had much lower activation values, between .0142 and .0239, with both MONEY and CANDY having a value of.0186. Thus, at this stage of processing, MONEY and CANDY are equally activated. Activation continues to spread, however, and differences be- gin to emerge among the activation values for the various knowledge elaborations that have been added to the text base. The reason for this is that the knowledge elaborations are con- nected not only to the text propositions that had pulled them into the net but also to other text propositions as well as to each other. To approximate these interrelations, a connection value of .5 was assigned to any two propositions sharing a common argument. Because the homophone mint contributed associa- tions to the subnet that refers to both of its senses, an inhibiting connection of - .5 was assigned to MINT/CANDY and BUILDING, whereas CANDY and MONEY themselves were connected by a - 1. Continued multiplication of the activation vector with this connectivity matrix yielded a stable pattern (average change < .001) after 11 operations. At this point text propositions had activation values ranging between. 1091 and .0584. Several of the knowledge elaborations reached values in this range, for ex- ample, .0742 for both ISA[MINT,BUILDING] and MONEY and .0708 for KILL/BUILDING,TOWNSPEOPLE], whereas others had faded away by this time; for example, MAN, which entered the subnet as an associate of TOWNSPEOPLE, had an activation value of.0070 and, most significantly, .0000 for CANDY. This stage of processing corresponds to the 400- and 500-ms points in Figure 8" MINT is now clearly embedded in its context as a kind of building, and the inappropriate association CANDY is no longer activated. The next processing stage involves the construction of a topi- cal inference--what is the sentence about? 
While the exact operations involved in the construction of such inferences are beyond the scope of this article, van Dijk and Kintsch (1983, chapter 6) have discussed some of the mechanisms involved, such as a strategy of looking for causal explanations, which is what actual subjects appear to use predominantly in the following case. If given enough time, the modal response of human readers is that the sentence is about an earthquake that destroyed a town. Thus, the (empirically determined) propositions EARTHQUAKE and CAUSE[EARTHQUAKE,P3] were added to the text base and connected with the text-base propositions from which they were derived by a value of .5. The two new propositions were given initial activation values of zero, and the integration process was resumed; that is, activation now spread from the previously stabilized subnet into the newly constructed part of the net. Nine more integration cycles were required before the expanded net stabilized. As one would expect, the two new inferences did not alter the pattern of activation much, but both of them became fairly strongly activated (thereby diminishing activation values in the already existing portion of the net). The topical inferences EARTHQUAKE and CAUSE[EARTHQUAKE,P3] ended up with activation values of .0463 and .0546, respectively, among the most strongly activated inferences in the net. At this point, the process appears to coincide with the time interval between 1,000 and 1,500 ms shown in Figure 8.

The construction-integration model thus accounts for the data in Figure 8 by means of an intricate interplay between construction and integration phases: the construction of the text base and the context-free, associative knowledge elaboration during the first 350 ms of processing; the establishment of a coherent text base, which appears to be complete by 400 ms; and finally, an inference phase, involving new construction and new integration and requiring more than 500 ms of processing under the conditions of the Till et al. study. The model does not account for the time values cited here, but it describes a processing sequence in accordance with the empirically determined time sequence.

In many models of word identification, the problem is thought to be "How do we get from a certain (acoustic or visual) stimulus pattern to the place in the mental lexicon where the meaning of this word is stored?" In the present model, word identification is much more deeply embedded in the process of discourse understanding. The lexical node itself provides just one entry point into the comprehender's long-term memory store of knowledge and experiences, and what eventually becomes activated from that store depends on the discourse context. In conceptions of the lexicon like that of Mudersbach (1982), the meaning of a word is given by its "neighborhood" in the associative network into which it is embedded. Neighborhoods may be defined narrowly or broadly (nodes one link away vs. nodes several links away). In the present model, the meaning of a word is also given by its neighborhood, narrowly or broadly defined, not in the long-term memory net as a whole, but in the subnet that has been constructed as the mental representation of the discourse of which the word is a part. Because that representation changes as processing proceeds, word meanings change with it.

Figure 9. The changing meaning of MINT. (The activation values of all propositions directly connected to MINT at the beginning and at the end of the process. The [.] notation is used as an abbreviation for the arguments of a proposition.)

Figure 9 depicts the changing meaning of MINT in our example. MINT is directly linked to nine propositions in the network; indirectly it is linked to the whole net, of course.
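The distance-based connection strengths used in this example (.9, .7, and .4 for propositions one, two, and three steps apart, 0 otherwise) can be computed mechanically. The sketch below encodes the argument links among the six propositions of the mint sentence and derives the strengths with a breadth-first search; the function names and link table are illustrative, not part of the original simulation.

```python
from collections import deque

# Text-base links for "The townspeople were amazed to find that all the
# buildings had collapsed except the mint" (propositions P1-P6 as above).
links = {
    "P1": ["P2"],              # TOWNSPEOPLE is an argument of AMAZED
    "P2": ["P1", "P3"],        # AMAZED[TOWNSPEOPLE,P3]
    "P3": ["P2", "P4"],        # COLLAPSE[P4]
    "P4": ["P3", "P5", "P6"],  # ALL-BUT[BUILDING,MINT]
    "P5": ["P4"],              # BUILDING
    "P6": ["P4"],              # MINT
}
STRENGTH = {1: 0.9, 2: 0.7, 3: 0.4}  # strength by distance; 0 beyond three steps

def distances(start):
    """Breadth-first search: number of steps between propositions."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for q in links[p]:
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist

def strength(p, q):
    return STRENGTH.get(distances(p).get(q, 99), 0.0)

print(strength("P1", "P3"))  # two steps apart (via P2): 0.7
print(strength("P1", "P5"))  # four steps apart: 0.0
```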
If one takes as its contextual meaning only its immediate neighbors, one finds at the beginning of processing mostly closely related propositions from the text base plus three weakly activated knowledge elaborations that in part do not fit into the context at all (CANDY). At the end of the process, however, the context-inappropriate association has dropped out, other inferences have been added, and the activation is more evenly distributed among text propositions and knowledge elaborations. Thus, textual information becomes part of the contextual meaning of a word, in contrast to most traditional conceptions of "meaning."

This example is, of course, no more than an illustration. Parameters in our calculations could be changed. For example, more than just two associates could be sampled initially in the process of knowledge elaboration. In this case the neighborhood of MINT would contain many more knowledge elaborations than are shown in Figure 9, where there is a strong predominance of text propositions. Not enough is known at present to set some of these parameters with confidence. But Figure 9 does reflect certain aspects of the data correctly: the equal initial activation of MONEY and CANDY, and the later emergence of the topical inference EARTHQUAKE. Although much more research is needed to produce a more adequate picture of how the contextual meaning of words is constructed during discourse comprehension, here is a technique that at least may help us to do so.

Arithmetic Word Problems

How children understand and solve simple word arithmetic problems provides an excellent domain in which to try out the construction-plus-integration model. Unlike with many other types of discourse, there are clear-cut criteria for when a problem is solved correctly, and the formal knowledge of arithmetic that is necessary for its solution is easily defined.
However, word problems, like all other texts, share the ambiguity and fuzziness of all natural language. Not only formal arithmetic knowledge is involved in understanding these problems, but all kinds of linguistic and situational knowledge. What makes word problems hard (and interesting) are often not their formal properties, but the way a problem is expressed linguistically and the way formal arithmetic relations map into the situations being described. Thus, word problems are ideal from the standpoint of knowledge integration because it is precisely the integration of formal arithmetic knowledge with linguistic and situational understanding that is at issue here.

Another reason for choosing the domain of word problems is that there already exist alternative formal models of how children solve simple word arithmetic problems (Briars & Larkin, 1984; Kintsch & Greeno, 1985). Specifically, the work of Kintsch and Greeno will be taken as a starting point here. Their model represents a union of the work on problem solving in arithmetic by Riley, Greeno, and Heller (1983) on the one hand, and that on discourse understanding by van Dijk and Kintsch (1983) on the other. Kintsch and Greeno (1985) added to the discourse-comprehension strategies of the van Dijk and Kintsch model some special-purpose strategies for solving word arithmetic problems, which they named the arithmetic strategies. For instance, if the model encounters a quantity proposition, such as "six marbles," it forms a set and tries to fill in the various slots of the set schema: what the objects are, the cardinality of the set, a specification of the objects (e.g., that the marbles are owned by Fred), and the relation between the present set and other sets in the problem (the six marbles were given to Fred by Tom, which might identify them as a "transfer set").
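The set schema just described can be pictured as a frame with slots to be filled. The sketch below is a hypothetical rendering of those slots; the class and field names are mine, not code from the Kintsch and Greeno (1985) model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SetSchema:
    """One set of objects in a word problem, with the slots the
    arithmetic strategies try to fill in."""
    objects: str                    # what the objects are
    quantity: Optional[int] = None  # cardinality; "some" leaves it unknown
    specification: str = ""         # e.g., who owns the objects
    role: Optional[str] = None      # relation to other sets: TRANSFER, PART, WHOLE

# "Six marbles were given to Fred by Tom" might yield:
transfer_set = SetSchema(objects="marbles", quantity=6,
                         specification="owned by Fred", role="TRANSFER")
print(transfer_set.role)  # TRANSFER
```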
Thus, the Kintsch and Greeno model for word problems builds a text base in quite the same way as in the van Dijk and Kintsch general theory of text comprehension, but it then forms a very specialized situation or problem model in terms of sets of objects and their interrelations. It solves a problem by recognizing a particular pattern of relations among sets (such as TRANSFER-IN or SUPERSET) and then using a stored solution procedure appropriate to that case.7 Thus, in terms of the foregoing discussion about knowledge use in discourse, the Kintsch and Greeno model is a "smart" model: Production rules are formulated in such a way that in each situation exactly the right arithmetic strategy is fired.

The Kintsch and Greeno model of solving arithmetic word problems is useful in several ways. The model identifies different classes of errors, such as errors caused by a lack of arithmetic knowledge, errors caused by linguistic misunderstandings, and errors that do not reflect a lack of knowledge at all but result from resource limitations. Certain formulations of word problems overload the resources of the comprehender, especially short-term memory, leading to a breakdown in processing. As Kintsch and Greeno have shown, within each arithmetic problem type there exists a strong correlation between the frequency of errors made in solving a problem and the memory load imposed by it, even though there are no differences within problem types in either the arithmetic or linguistic knowledge required for solution. The model distinguishes between linguistic and arithmetic errors and helps us to investigate to what extent errors made by second- and third-grade pupils are caused by a failure to understand properly the text of the word problem, rather than by a faulty knowledge of arithmetic (e.g., Dellarosa, 1986; Dellarosa, Kintsch, Reusser, & Weimer, in press; Kintsch, 1987).
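The short-term memory load at issue stems from a limited buffer that carries only the most strongly activated propositions from one processing cycle to the next. A minimal sketch follows; the buffer size of four and the selection rule are assumptions made for illustration, not fixed properties of the model.

```python
def carry_over(activations, k=4):
    """Keep the k most strongly activated propositions for the next cycle.

    `activations` maps proposition labels to stabilized activation values;
    anything not carried over must later be reinstated from long-term
    memory if it is needed again.
    """
    ranked = sorted(activations, key=activations.get, reverse=True)
    return set(ranked[:k])

buffer = carry_over({"P1": 0.11, "P2": 0.09, "P3": 0.10,
                     "P4": 0.06, "P5": 0.07}, k=4)
print(sorted(buffer))  # ['P1', 'P2', 'P3', 'P5']
```

When a problem statement is long, the proposition needed to answer the question may no longer be among the k survivors, which is one way resource limitations produce errors without any lack of knowledge.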
If certain linguistic misunderstandings about the meanings of such key words as have more than, have altogether, or some are built into the knowledge base of the model, the model produces a pattern of wrong answers and misrecall of the problem statements that strikingly parallels some of the main types of errors that experimental subjects make. This is a good example of how much can be achieved even with the use of knowledge-poor representations in studies of discourse processing. The Kintsch and Greeno model knows about arithmetic (its arithmetic strategies), and it knows about the meaning of words (its lexicon; a semantic net in Dellarosa, 1986). However, it has no general world knowledge that would allow it to understand the situation described in a word problem. It merely picks out the crucial arithmetic information from the discourse and builds a propositional text base for it. This is good enough for some purposes (e.g., the investigation of resource limitations or linguistic factors in understanding as mentioned earlier, or to predict recall, summarization, or readability as in Kintsch & van Dijk, 1978, and related work), but it is not good enough for other purposes.

7 Computer simulations of this model have been developed by Fletcher (1985) and Dellarosa (1986) and are available from the author.

Figure 11. The result of the integration process for the three sentences in the Manolita problem. (Propositions are indicated by single words; inferences are marked by an asterisk; their arrangement in the figure is approximate. The ordinate shows the activation values of each proposition after the process has stabilized. Propositions carried over from one processing cycle to the next are connected by arrows.)

Figure 12. The elaborated text base for the second sentence of the Manolita problem. (Four propositions were carried over from the previous cycle in the short-term memory buffer. Solid lines connect text propositions, broken lines inferences; nonarithmetic inferences are indicated by asterisks only.)

Thus, whatever strength each arithmetic hypothesis gathers from the text is fed into the superordinate arithmetic schemata consistent with it. These schemata are mutually exclusive and inhibit each other with connection values of -1. Note that only at this final level is inhibition among arithmetic hypotheses used: The hypotheses that a particular set of objects plays the role of WHOLE or PART set are also mutually exclusive, but they are not allowed to inhibit each other; they merely collect more or less positive evidence, which they then transmit to the superordinate stage where a selection among alternatives is made.

The resulting connectivity matrix then becomes the multiplier of the activation-state vector for the 28 propositions participating in this second processing cycle. Initially, these activation values are positive for the text-derived propositions, and zero otherwise, except for the propositions carried over in the buffer, which retain the activation values they reached in the last cycle. In this case, the activation vector stabilizes already after seven operations. The results are shown in the second panel of Figure 11. (If the activation process is extended to twice the number of cycles, the activation values for the arithmetic hypotheses, measured to four decimal places, do not change at all.) All text-derived propositions remain strongly activated, while none of the textual inferences (e.g., MUNDOZA is a NAME of a MALE, TULIPS are FLOWERS, RED, and GROW-IN-HOLLAND) reach a high level of activation. This is intuitively quite plausible. As far as the arithmetic is concerned, the problem is at this point understood correctly and practically solved: WHOLE[14] is more strongly activated than its alternative, PART[14]. Similarly, PART[6] is stronger than WHOLE[6]. The correct hypothesis, WPP, is the most strongly activated of the three alternative superschemata.

Note that the text propositions and inferences are, in general, much more strongly activated than the arithmetic hypotheses. Therefore, the activation values of the latter must be considered separately, relative to each other, rather than in relation to the text propositions when it comes to selecting propositions to be maintained in the short-term memory buffer. This imbalance is required for the model to work. If the arithmetic hypotheses are weighted more heavily, they draw the activation away from the text itself, and the system cannot stabilize: It will flip-flop between alternative, mutually contradictory arithmetic schemata. The arithmetic hypotheses have to be anchored in a stable text representation.

For the third and final sentence, the short-term memory buffer needs to carry over both text propositions, to establish textual coherence, and arithmetic hypotheses, to take advantage of the understanding of the problem that has been achieved so far. It has been assumed here that the four strongest text propositions as well as the four strongest arithmetic hypotheses are carried over in the buffer, as shown in Figure 13. (There are, of course, other plausible alternatives.) The three text propositions generated on the basis of this sentence bring with them into the net six knowledge propositions, one of which is NOT[CONTAIN[GARDEN,TULIP]], which turns out to be crucial for the solution of the problem. In addition, new hypotheses about the question set are formed, and the schemata PPW and PWP, which were lost after the second cycle, are reconstructed. Because the child knows about weeding gardens, the tulips that were pulled out are identified as a part of those that were in the garden in the beginning. Hence, a connection that favors the PART hypothesis over the WHOLE hypothesis is formed between the inference NOT[CONTAIN[GARDEN,TULIP]] and PART[?]. It completes the pattern that is the condition for the use of a LOCATION strategy: some tulips at one place in the past, then some not there, now some are left.

The new net requires 43 operations to stabilize. The knowledge-based inference NOT[CONTAIN[GARDEN,TULIP]] achieves an activation level above the range of the text propositions (Figure 11, third panel). The picture is completely clear as far as the arithmetic is concerned: All the correct hypotheses are strongly activated, and all incorrect alternatives have low or zero activation values. The final steps in the solution of the problem are procedural. From information associated with the WPP pattern the equation 14 = 6 + ? is generated, which is then used to obtain the correct answer.

A lot of mountains had to be moved to achieve a very simple result! The Manolita problem was solved without problem solving. The basic comprehension operations were sufficient; that is, the model produced the inference that the pulled-out tulips are not in the garden, which was required for the application of the LOCATION strategy. However, this is not always the case. In many, not necessarily difficult, problems, more focused problem-solving operations are required because the random inference-generation process described earlier fails to generate the required inference. Consider the following "thinking problem":

Mrs. Nosho was telling Mark about the two huge aquariums she kept when she was a little girl. "There were 30 fish in one and 40 fish in the other, so you can tell how many fish I had." How many fish did Mrs. Nosho have?

In a simulation run of this problem the model failed because it did not come up with the transitive inference HAVE[X,Y] & CONTAIN[Y,Z] implies HAVE[X,Z]. At this point, the process needs to go into a problem-solving mode in which the information in the text is elaborated in a more focused manner than is possible with the automatic-comprehension mechanisms discussed here.

Context Effects

Problems embedded in a familiar situational context are much easier to solve than problems that must be solved without this situational support (e.g., Hudson, 1983). Thus, birds catching worms present a concrete, understandable situation that makes it clear what is the whole and what are the parts, whereas abstract, ill-constrained problems do not. For a model that relies on key words and special-purpose strategies, all depends on whether the right arithmetic strategy is used; the situation is of no help. In the worm-and-bird problem, the text provides a situational constraint for the interpretation of the problem that has very little to do with arithmetic per se. It is the knowledge about birds eating worms that matters. The birds trying to catch the worm are understood as the WHOLE set, with the birds catching worms as one PART, and the birds unable to get a worm as the other PART. This understanding was achieved not because a certain key phrase, like how many more, was parsed correctly but on the basis of general world knowledge. If there are birds, some of whom catch and some of whom do not catch a worm, what is the WHOLE set and what are the PARTS is given by general world knowledge that is not specific to arithmetic. The arithmetic can hardly go wrong here because the well-known situation guarantees the right interpretation of the problem. It is this aspect that the present model deals with most effectively.

Context, however, does not always facilitate problem solution; it may also interfere with it. Consider this typical school problem, with its highly impoverished context:

Fred has four Chevies and three Fords. (a) How many cars does he have altogether? (b) How many more Chevies does he have than Fords?
Context is no help with this problem; it must be solved on the basis of specialized arithmetic strategies, triggered by the key words have altogether for Question A and have more than for Question B. Of course, children are much more familiar with the former (e.g., Riley et al., 1983), but if the right strategies are available, both problems will be solved. In the model, too, the altogether in Question A will be connected with the HOW-MANY/WHOLE hypothesis, and the have more than will be connected with the HOW-MANY/PART hypothesis in Question B, and both questions will be answered equally well. After the first sentence, PART and WHOLE hypotheses are established for both the Chevies and the Fords, but there is not much to distinguish them; the superordinate schemata PPW, PWP, and WPP are only weakly activated and hardly differentiated. Question A, on the other hand, correctly activates the PPW hypothesis, and Question B yields the WPP result. Thus, if the arithmetic knowledge is available, it makes very little difference which question follows the problem statement.

In contrast, if the problem is only slightly contextualized, the model can be biased in favor of one of the questions, and it actually fails when it gets the wrong one. Suppose the foregoing problem is changed to read

Fred has a nice collection of antique cars. Four of his cars are Chevies, and three are Fords.

Collection, like some, is constructed as a quantity proposition, and hence PART and WHOLE hypotheses for a set of cars with unspecified quantity are established in the first processing cycle. They are both activated equally, however, at this point. This changes dramatically with the second sentence: The four Chevies and three Fords are both identified as PART sets because of the phrase of his. In consequence, the model begins to favor the WPP hypotheses.
When it receives Question A, the WPP hypothesis is decisively strengthened, and the problem is solved correctly. On the other hand, if it is given Question B, the model becomes confused between the WPP and PWP hypotheses, which are both equally activated, and fails to solve the problem. Thus, we have here an example where the problem context interferes with the solution of a problem. It biases the problem in favor of one particular interpretation, so that when another interpretation is required, the whole process fails. It is important, however, to analyze exactly why the model failed to answer Question B correctly: After processing the second sentence, it was so strongly convinced that the four Chevies and three Fords were both PART sets that it did not carry over the corresponding WHOLE set hypotheses and therefore had no way of using the information in the have-more-than question in support of the CHEVIES/WHOLE hypothesis. Thus, rather special circumstances prevented the model from answering Question B. In slightly different circumstances, it could have done so: (a) if the buffer were large enough, the CHEVY/WHOLE hypothesis would not have been lost, or (b) if the model had been allowed to reread the problem statement.

Question Specificity

The final example illustrates some different aspects of word-problem solving; namely, the complex role that redundant specifications of sets may have. On the one hand, overspecifying a set can be helpful because it provides more than one way to refer to it. On the other hand, redundant specifications increase the length of the text and thus the likelihood that some important piece of information is no longer in active memory when it is required. In the following problem, three versions of the question are possible:

Joe had a collection of nine marbles. He started his collection with some beautiful red marbles. Then Lucy added six pink marbles to his collection as a present.
(a) How many beautiful red marbles did he start his collection with? (b) How many marbles did he start his collection with? (c) How many beautiful red marbles did he have?

The first processing cycle results in undifferentiated hypotheses about the nine marbles. The set constructed in the second cycle, on the other hand, is clearly a PART set, as is the one constructed in the third cycle. Indeed, at the end of the third cycle, the model understands the problem essentially correctly, with the WPP schema greatly exceeding alternative hypotheses in activation value. To understand what happens next, it is necessary to know which text propositions were maintained in the buffer at the end of the third cycle: Only propositions from the third sentence are carried over, while the propositions from the second sentence are no longer held in active memory at this point. This has nontrivial consequences when the question is asked. In Versions A and B everything is all right, because the question itself identifies the question set as a PART set; starting a collection serves this function, just as it did in Sentence 2. Version C of the question, on the other hand, does not yield a correct solution. The question itself does not indicate the role of the question set, and there is no information from the second sentence still available in active memory that would help to identify its role either. Because there are already several strong PART hypotheses around, the model tends toward the hypothesis that the question set has the role of a WHOLE; the PWP schema thus becomes more activated than the correct WPP schema.

However, this is far from an unequivocal prediction of failure for Version C of the question. With a slightly larger buffer, or with a little less irrelevant material intervening (pink marbles, as a present), the critical information from the second sentence could have been maintained in the buffer and used to solve the problem.
Or, even more obviously, the problem solver could reread the problem or perform a reinstatement search (Kintsch & van Dijk, 1978; Miller & Kintsch, 1980) to activate the required information from long-term memory. Rather, the prediction is that children, like the model, would have more trouble with Question C, and fail more frequently, than with either A or B. Thus, the more specific the question, the better. But how irrelevant or redundant material will affect the difficulty of a word problem is a more complex story. It may be quite harmless, or may even facilitate problem solving, if the question exploits a redundancy in the specification of a set. But it may be a source of difficulty and even a cause of failure when the question is asked in an unhelpful way. The present model has the flexibility to handle these complex effects of context: Many small effects are allowed to add up and pull the model one way or another. The "smart" models of Kintsch and Greeno (1985) and Briars and Larkin (1984) have no ready way to cope with these subtle contextual demands: Either the right strategy is used or it is not.

Discussion

How people recall relevant knowledge when they read a text is reminiscent of another experimental paradigm that has been studied extensively in psychological laboratories: how people recall lists of words. A widely used explanation for the recall of word lists is based on the generation-recognition principle. Some words are recalled directly, perhaps from a short-term memory buffer, and these words are then used to generate other semantically or contextually related, plausible recall candidates. Words that have actually appeared in the to-be-learned list will be recognized among these candidates and recalled, whereas intrusions will tend to be rejected. Generation-recognition theories have had their detractors, and in their most primitive form they are certainly inadequate to account for the