Understanding Visual Load in Lexical Processing: A Computational Model, Exams of Psychology

This paper examines visual load, an estimate of a lexical item's ability to evoke mental images of the concept it represents. The authors investigate how visual load can be assessed and which features underlie it, and propose a computational model to encapsulate the notion. Visual load is relevant to memory, learning, and comprehension, and is discussed through its effects on brain activity and its use in Natural Language Processing.

Typology: Exams

2021/2022

Uploaded on 08/01/2022


On Mental Imagery in Lexical Processing:

Computational Modeling of the Visual Load Associated to Concepts

Daniele P. Radicioniχ, Francesca Garbariniψ, Fabrizio Calzavariniφ

Monica Biggioψ, Antonio Lietoχ, Katiuscia Saccoψ, Diego Marconiφ

χ Department of Computer Science, Turin University – Turin, Italy

φ Department of Philosophy, Turin University – Turin, Italy

ψ Department of Psychology, Turin University – Turin, Italy

Abstract

This paper investigates the notion of visual load, an estimate for a lexical item’s efficacy in activating mental images associated with the concept it refers to. We elaborate on the centrality of this notion which is deeply and variously connected to lexical processing. A computational model of the visual load is introduced that builds on few low level features and on the dependency structure of sentences. The system implementing the proposed model has been experimentally assessed and shown to reasonably approximate human response.

Keywords: Visual imagery; Computational modeling; Natural Language Processing.

Introduction

Ordinary experience suggests that lexical competence, i.e. the ability to use words, includes both the ability to relate words to the external world as accessed through perception (referential tasks) and the ability to relate words to other words in inferential tasks of several kinds (Marconi, 1997). There is evidence from both traditional neuropsychology and more recent neuroimaging research that the two aspects of lexical competence may be implemented by partly different brain processes. However, some very recent experiments appear to show that typically visual areas are also engaged by purely inferential tasks, not involving visual perception of objects or pictures (Marconi et al., 2013). The present work can be considered a preliminary investigation aimed at verifying this main hypothesis, by investigating the following issues: i) to what extent the visual load associated with concepts can be assessed, and what degree of agreement exists among humans about it; ii) which features underlie the visual load associated with concepts; and iii) whether the notion of visual load can be grasped and encapsulated into a computational model.

As is widely acknowledged, one main visual correlate of language is imageability, that is, the property of a particular word or sentence to produce an experience of imagery. In the following, we focus on visual imagery (thus disregarding acoustic, olfactory and tactile imagery), which we denote as visual load. The visual load is related to the ease of producing visual imagery when an external linguistic stimulus is processed. Intuitively, words like ‘dog’ or ‘apple’ refer to concrete entities and are associated with a high visual load, implying that these terms immediately generate a mental image. Conversely, words like ‘algebra’ or ‘idempotence’ are hardly accompanied by the production of vivid images. Although the construct of visual load is closely related to that of concreteness, the two can clearly dissociate, in that i) some concrete nouns have been rated low in visual load (Paivio, Yuille, & Madigan, 1968); and, conversely, ii) some abstract words, such as ‘bisection’, are associated with a high visual load.

The notion of visual load is relevant to many disciplines, in that it helps shed light on a wide variety of cognitive and linguistic tasks and helps explain a plethora of phenomena observed in both impaired and normal subjects. In the next Section we survey a multidisciplinary literature showing how mental imagery affects memory, learning and comprehension; we consider how imagery is characterized at the neural level; and we show how visual information is exploited in state-of-the-art Natural Language Processing research. In the subsequent Section we illustrate the proposed computational model for providing concepts with their visual load characterization. We then describe the experiments designed to assess the model through an implemented system, and report and discuss the obtained results. The Conclusion summarizes the work done and provides an outlook on future work.

Related Work

As regards linguistic competence, it is generally accepted that visual load facilitates cognitive performance (Bergen, Lindsay, Matlock, & Narayanan, 2007), leading to faster lexical decisions than for non-visually loaded concepts (Cortese & Khanna, 2007). For example, nouns with high visual load ratings are remembered better than those with low visual load ratings in long-term memory tests (Paivio et al., 1968). Moreover, visually loaded terms are easier to recognize for subjects with deep dyslexia, and individuals respond more quickly and accurately when making judgments about visually loaded sentences (Kiran & Tuchtenhagen, 2005). Neuropsychological research has shown that many aphasic patients perform better with linguistic items that more easily elicit visual imagery (Coltheart, 1980), although the opposite pattern has also been documented (Cipolotti & Warrington, 1995).

Visual imageability of concepts evoked by words and sentences is commonly known to affect brain activity. While visuosemantic processing regions, such as the left inferior temporal gyrus and fusiform gyrus, show greater involvement during the comprehension of highly imageable words and sentences (Bookheimer et al., 1998; Mellet, Tzourio, Denis, & Mazoyer, 1998), other semantic brain regions (i.e., superior and middle temporal cortex) are selectively activated by low-imageable sentences (Mellet et al., 1998; Just, Newman, Keller, McEleney, & Carpenter, 2004). Furthermore, a growing number of studies suggest that words encoding different visual properties (such as color, shape, motion, etc.) are processed in cortical areas that overlap with some of the areas that are activated during visual perception of those properties (Kemmerer, 2010).

Figure 1: The (simplified) dependency tree corresponding to the sentence ‘L’animale che mangia banane su un albero è la scimmia’ (‘The animal that eats bananas on a tree is the monkey’).
Investigating the visual features associated with linguistic input can be useful to build semantic resources designed to deal with Natural Language Processing (NLP) problems, such as identifying verb subcategorization frames (Bergsma & Goebel, 2011), or enriching the traditional extraction of distributional semantics from text with a multimodal approach that integrates textual features with visual ones (Bruni, Tran, & Baroni, 2014). Finally, visual attributes are at the base of the development of annotated corpora and resources that can be used to extend text-based distributional semantics by grounding word meanings on visual features as well (Silberer, Ferrari, & Lapata, 2013).

Model

Although much work has been invested in different areas for investigating imageability in general and visual imagery in particular, to the best of our knowledge no attempt has been made to formally characterize visual load, and no computational model has been devised to compute how visually loaded sentences and the lexicalized concepts therein are. We propose a model that relies on a simple hypothesis: a few low-level features are combined additively, and the result is refined by exploiting syntactic information.

The notion of visual load, in fact, is used in the literature with different meanings, thus giving rise to different levels of ambiguity. We define visual load as a direct indicator (a numeric value) of the efficacy of a lexical item in activating mental images associated with the concept referred to by the lexical item. We expect that visual load also represents an indirect measure of the probability of activation of the brain areas devoted to visual processing.

We conjecture that the visual load is primarily associated with concepts, although lexical phenomena like term availability (implying that the most frequently used terms are easier to recognize than those seen less often (Tversky & Kahneman, 1973)) can also affect it.
Based on the work by Kemmerer (2010), we explore the hypothesis that a limited number of primitive elements can be used to characterize and evaluate the visual load associated with concepts. Namely, Kemmerer’s Simulation Framework makes it possible to capture information about a wide variety of concepts and properties used to denote objects, events and spatial relations. Three main visual semantic components have been identified that, in our opinion, are also suitable to be used as different dimensions along which to characterize the concept of visual load. They are: color properties, shape properties, and motion properties. The perception of these properties is expected to occur in an immediate way, such that “during our ordinary observation of the world, these three attributes of objects are tightly bound together in unified conscious images” (Kemmerer, 2010). We added a further perceptual component related to size. More precisely, our assumption is that information about the size of a given concept can also contribute, as an adjoint factor and not as a primitive one, to the computation of a visual load value for the considered concept.

In this setting, we represent each concept/property as a record containing its lemma, morphological information on its POS (part of speech), and a boolean-valued vector of four elements encoding whether the considered concept/property conveys information about color, shape, motion and size.^1 For example, this piece of information

table,Noun,1,1,0,1    (1)

can be used to indicate that the concept table (associated with a Noun, and differing, e.g., from that associated with a Verb) conveys information about color, shape and size, but not about motion. In the following, these are [...]

^1 We adopt here a simplification, since we are assuming that the pair ⟨lemma, POS⟩ is sufficient to identify a concept/property, and that in general we can access items by disregarding the word sense disambiguation problem, which is known to be an open problem in the field of NLP.

[...] weighting scheme $\vec{w}$, is then computed as follows:

$$VL(d, \vec{w}) = \sum_{c \in d} VL(c) \qquad (4)$$

$$VL(T, \vec{w}) = VL(T). \qquad (5)$$

The whole pipeline from the input parsing to computation of the VL for the considered stimulus has been implemented as a computer program; its main steps include the parsing of the stimulus, the extraction of the (lexicalized) concepts by exploiting the output of the morphological analysis, and the tree traversal of the dependency structure resulting from the parsing step. The morphological analyzer has been preliminarily fed with the whole set of stimuli, and its output has been annotated with the visual features and stored into a dictionary. At run time, the dictionary is accessed based on morphological information, then used to retrieve the values of the features associated with the concepts in the stimulus. The output obtained by the proposed model has been compared with the results obtained in a behavioral experimentation as described below.
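The dictionary lookup and the summation in Eq. (4) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the per-concept score is assumed to be a weighted sum of the four boolean features, the uniform weights are hypothetical, the annotation for 'algebra' is an assumed example, and the syntactic refinement based on the dependency tree is omitted because its definition is not part of this excerpt.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]             # (lemma, POS)
Flags = Tuple[int, int, int, int]  # color, shape, motion, size

# Hypothetical feature dictionary. The 'table' entry follows example (1);
# the 'algebra' entry is an assumed annotation used only for illustration.
DICTIONARY: Dict[Key, Flags] = {
    ("table", "Noun"): (1, 1, 0, 1),
    ("algebra", "Noun"): (0, 0, 0, 0),
}

# Hypothetical uniform weighting scheme; the actual weights are not given here.
WEIGHTS: Tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)


def concept_vl(key: Key, weights=WEIGHTS, dictionary=DICTIONARY) -> float:
    """Assumed per-concept score: weighted sum of the boolean features.

    Concepts missing from the dictionary contribute 0.
    """
    flags = dictionary.get(key, (0, 0, 0, 0))
    return sum(w * f for w, f in zip(weights, flags))


def definition_vl(concepts: Iterable[Key], weights=WEIGHTS,
                  dictionary=DICTIONARY) -> float:
    """Eq. (4): the VL of a definition is the sum of the VL of its concepts."""
    return sum(concept_vl(c, weights, dictionary) for c in concepts)


def target_vl(target: Key, weights=WEIGHTS, dictionary=DICTIONARY) -> float:
    """Eq. (5): the VL of the target is the VL of the single target concept."""
    return concept_vl(target, weights, dictionary)


print(definition_vl([("table", "Noun"), ("algebra", "Noun")]))  # 3.0
```

With uniform weights the score of a definition reduces to a count of the visual features conveyed by its concepts; a non-uniform weighting scheme would only change the tuple passed as `weights`.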

Experimentation

Materials and Methods

Thirty healthy volunteers, native Italian speakers (16 females and 14 males), 19–52 years of age (mean ± SD = 25.7 ± 5.1), were recruited for the experiment. None of the subjects had a history of psychiatric or neurological disorders. All participants gave their written informed consent before participating in the experimental procedure, which was approved by the ethical committee of the University of Turin, in accordance with the Declaration of Helsinki (World Medical Association, 1991). Participants were all naïve to the experimental procedure and to the aims of the study.

Experimental design and procedure. Participants were asked to perform an inferential task, “Naming from definition”. During the task a sentence was pronounced and the subjects were instructed to listen to the stimulus given in the headphones and to overtly name, as accurately and as fast as possible, the target word corresponding to the definition, using a microphone connected to a response box. Auditory stimuli were presented through the E-Prime software, which was also used to record data on accuracy and reaction times. Furthermore, at the end of the experimental session, the subjects were administered a questionnaire: they had to rate on a 1–7 Likert scale the intensity of the visual load they perceived as related to each target and to each definition.

The factorial design of the study included two within-subjects factors, in which the visual load of both target and definition was manipulated. The resulting four experimental conditions were as follows:

VV: Visual Target, Visual Definition (e.g., ‘The bird of prey with great wings flying over the mountains is the... eagle’);

VNV: Visual Target, Non-Visual Definition (e.g., ‘The hottest of the four elements of the ancients is... fire’);

NVV: Non-Visual Target, Visual Definition (e.g., ‘The nose of Pinocchio stretched when he told a... lie’);

NVNV: Non-Visual Target, Non-Visual Definition (e.g., ‘The quality of people that easily solve difficult problems is said... intelligence’).

For each condition, there were 48 sentences, 192 sentences overall. Each trial lasted about 30 minutes. The number of words (nouns and adjectives), their balancing across stimuli, and the (syntactic dependency) structure of the considered sentences were uniform within conditions, so that the most relevant variables were controlled. The same set of stimuli used for the human experiment was given in input to the system implementing the computational model.

Data analysis

The participants’ performance in the “Naming from definition” task was evaluated by recording, for each response, the reaction time RT, in milliseconds, and the accuracy AC, computed as the percentage of correct answers. The answers were considered correct if the target word was plausibly matched with the definition. Then, for each subject, both RT and AC were combined in the Inverse Efficiency Score (IES), by using the formula IES = (RT/AC) · 100. IES is a metric commonly used to aggregate reaction time and accuracy, and to summarize them (Townsend & Ashby, 1978). The mean IES value was used as the dependent variable and entered in a 2 × 2 repeated measures ANOVA with ‘target’ (two levels: ‘visual’ and ‘non-visual’) and ‘definition’ (two levels: ‘visual’ and ‘non-visual’) as within-subjects factors. Post hoc comparisons were performed by using the Duncan test.

The scores obtained by the participants in the visual load questionnaire were analyzed by using unpaired t-tests, two-tailed. Two comparisons were performed for visual and non-visual targets, and for visual and non-visual definitions. The computational model results were analyzed by using unpaired t-tests, two-tailed. Two comparisons were performed for visual and non-visual targets and for visual and non-visual definitions.

Correlations between IES, computational model and visual load questionnaire. We also explored the existence of correlations between IES, the visual load questionnaire and the computational model output by using linear regressions. For both the IES values and the questionnaire scores, we computed for each item the mean of the 30 subjects’ responses. In a first model, we used the visual load questionnaire scores as independent variable to predict the participants’ performance (with IES as the dependent variable); in a second model, we used the computational data as independent variable to predict the participants’ visual load evaluation (with the questionnaire scores as the dependent variable). In order to verify the consistency of the correlation effects, we also performed linear regressions where we controlled for three covariates: the number of words, their balancing across stimuli and the syntactic dependency structure.

Figure 3: The graph shows, for each condition, the mean IES with standard error.
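The IES formula given in the Data analysis section, IES = (RT/AC) · 100, can be illustrated with a short Python sketch. The function name `inverse_efficiency_score` and the input values below are hypothetical and chosen only to show the arithmetic; they are not data from the experiment.

```python
def inverse_efficiency_score(mean_rt_ms: float, accuracy_pct: float) -> float:
    """Combine reaction time and accuracy as IES = (RT / AC) * 100.

    mean_rt_ms:   mean reaction time in milliseconds
    accuracy_pct: accuracy as a percentage of correct answers (0-100)
    """
    if not 0 < accuracy_pct <= 100:
        raise ValueError("accuracy must be a percentage in (0, 100]")
    return (mean_rt_ms / accuracy_pct) * 100


# Hypothetical values: the same reaction time at two accuracy levels.
print(inverse_efficiency_score(1200.0, 100.0))  # 1200.0
print(inverse_efficiency_score(1200.0, 80.0))   # 1500.0
```

At 100% accuracy the IES equals the reaction time; lower accuracy inflates the score, which is why lower IES values indicate faster and more accurate performance.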

Results

The ANOVA showed a significant effect of the within-subject factors “target” (F(1, 29) = 14.4; p < 0.001), suggesting that the IES values were significantly lower for the visual than for the non-visual targets, and “definition” (F(1, 29) = 32.78; p < 0.001), suggesting that the IES values were significantly lower for the visual than for the non-visual definitions. This means that, for both the target and the definition, the participants’ performance was significantly faster and more accurate in the visual than in the non-visual condition. We also found a significant interaction “target*definition” (F(1, 29) = 7.54; p = 0.01). Based on the Duncan post hoc comparison, we verified that this interaction was explained by the effect of the visual definitions of the visual targets (VV condition), in which the participants’ performance was significantly faster and more accurate than in all the other conditions (VNV; NVV; NVNV), as shown in Figure 3.

By comparing the questionnaire scores for visual (mean ± SD = 5.69 ± 0.55) and non-visual (mean ± SD = 1.73 ± 0.71) definitions we found a significant difference (p < 0.001; unpaired t-test, two-tailed). By comparing the questionnaire scores for visual (mean ± SD = 2.32 ± 0.4) and non-visual (mean ± SD = 4.23 ± 0.9) targets we found a significant difference (p < 0.001). This suggests that our arbitrary categorization of each sentence within the four conditions was supported by the general agreement of the subjects.

By comparing the computational model scores for visual (mean ± SD = 4.0 ± 2.4) and non-visual (mean ± SD = 2.9 ± 2.0) definitions we found a significant difference (p < 0.001; unpaired t-test, two-tailed). By comparing the computational model scores for visual (mean ± SD = 2.53 ± 1.29) and non-visual (mean ± SD = 0.26 ± 0.64) targets we found a significant difference (p < 0.001). This suggests that we were able to computationally model the visual load of both targets and descriptions, describing it as a linear combination of different low-level features: color, shape, motion and size.

Results correlations. By using the visual load questionnaire scores as independent variable we were able to significantly (R² = 0.4; p < 0.001) predict the participants’ performance (that is, their IES values), as illustrated in Figure 4. This means that the higher the participants’ visual score for a definition, the better the participants’ performance in giving the correct response (or, alternatively, the lower the IES value).

Figure 4: Linear regression “Inverse Efficiency Score (IES) by Visual Load Questionnaire”. The mean score in the Visual Load Questionnaire, reported on a 1–7 Likert scale, was used as an independent variable to predict the subjects’ performance, as quantified by the IES.

By using the computational data as independent variable we were able to significantly (R² = 0.44; p < 0.001) predict the participants’ visual load evaluation (their questionnaire scores), as shown in Figure 5. This means that a correlation exists between the computational prediction about the visual load of the definitions and the participants’ visual load evaluation: the higher the computational model result, the higher the participants’ visual score in the questionnaire. We also found that these effects were still significant in the regression models where the number of words, their balancing across stimuli and the syntactic dependency structure were controlled for.