Understanding Complex Information-Processing Systems: A Case Study from Vision, Study notes of Theory of Computation

Massachusetts Institute of Technology (MIT), Theory of Computation

This working paper from the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology argues that complex information-processing systems must be understood at multiple levels of description, using vision research as a case study. It describes the four levels of description, gives examples of theories at each level, and briefly reviews the prospects for the field.

Typology: Study notes

2021/2022

Uploaded on 05/11/2023

prindhorn 🇺🇸



MASSACHUSETTS INSTITUTE OF TECHNOLOGY
ARTIFICIAL INTELLIGENCE LABORATORY

Working Paper 131, August 1976

FROM COMPUTATIONAL THEORY TO PSYCHOLOGY AND NEUROPHYSIOLOGY -- a case study from vision

by D. Marr

SUMMARY: The CNS needs to be understood at four nearly independent levels of description: (1) that at which the nature of a computation is expressed; (2) that at which the algorithms that implement a computation are characterised; (3) that at which an algorithm is committed to particular mechanisms; and (4) that at which the mechanisms are realised in hardware. In general, the nature of a computation is determined by the problem to be solved, the mechanisms that are used depend upon the available hardware, and the particular algorithms chosen depend on the problem and on the available mechanisms. Examples are given of theories at each level from current research in vision, and a brief review of the immediate prospects for the field is given.

This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. Support for the laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract N00014-75-C-0643.

Working papers are informal papers intended for internal use.

2 VISUAL INFORMATION PROCESSING

Introduction

Modern neurophysiology has learned much about the operation of the individual neuron, but deceivingly little about the meaning of the circuits they compose. The reason for this can be attributed, at least in part, to a failure to recognise what it means to understand a complex information-processing system. Complex systems cannot be understood as a simple extrapolation of the properties of their elementary components. One does not formulate a description of thermodynamical effects using a large set of wave equations, one for each of the particles involved. One describes such effects at their own level, and tries to show that in principle, the microscopic and macroscopic descriptions are consistent with one another.

The core of the problem is that a system as complex as a nervous system or a developing embryo must be analyzed and understood at several different levels. For a system that solves an information processing problem, we may distinguish four important levels of description. At the lowest, there is basic component and circuit analysis -- how do transistors, neurons, diodes and synapses work? The second level is the study of particular mechanisms: adders, multipliers, and memories accessed by address or by content. The third level is that of the algorithm, and the top level contains the theory of the overall computation.

For example, take the case of Fourier analysis. The computational theory of the Fourier transform is well understood, and is expressed independently of the particular way in which it is computed. One level down, there are several algorithms for computing a Fourier transform: the Fast Fourier Transform (Cooley & Tukey 1965), which is a serial algorithm, and the parallel "spatial" algorithm that is based on the mechanisms of laser optics. All these algorithms carry out the same computation, and the choice of which one to use depends upon the particular mechanisms that are available. If one has fast digital memory, adders and multipliers, one will use the FFT, and if one has a laser and photographic plates, one will use an "optical" algorithm. In general, mechanisms are strongly determined by hardware, the nature of the computation is determined by the problem, and the algorithms are determined by the computation and the available mechanisms.

Each of these four levels of description has its place in the eventual understanding of perceptual information processing, and it is important to keep them separate. Of course, there are logical and causal relationships among them, but the important point is that these levels of description are only loosely related. Too often in attempts to relate psychophysical problems to physiology there is confusion about the level at which a problem arises -- is it related mainly to biophysics (like after-images) or primarily to information processing (like the ambiguity of the Necker cube)? More disturbingly, although the top level is the most neglected, it is also the most important. This is because the structure of the computations that underlie perception depends more upon the computational problems that have to be solved than on the particular hardware in which their solutions are implemented. There is an analog of this in physics, where a thermodynamical approach represented, at least historically, the first stage in the study of matter. A description in terms of mechanisms or elementary components usually appears afterwards.
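The Fourier example can be made concrete with a small sketch. The two functions below are illustrative only (their names and the sample signal are assumptions, not part of the paper): one evaluates the discrete Fourier transform directly from its definition, the other uses a recursive radix-2 procedure in the style of Cooley & Tukey. They are different algorithms for one and the same computation, so their outputs should agree to within rounding error.

```python
import cmath

def dft_direct(x):
    """Evaluate the discrete Fourier transform straight from its definition (O(n^2))."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def fft_radix2(x):
    """Recursive radix-2 FFT; the input length must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])   # transform of the even-indexed samples
    odd = fft_radix2(x[1::2])    # transform of the odd-indexed samples
    out = [0j] * n
    for m in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * m / n) * odd[m]
        out[m] = even[m] + twiddle
        out[m + n // 2] = even[m] - twiddle
    return out

def same_computation(a, b, tol=1e-9):
    """True when two output sequences agree element by element within tol."""
    return len(a) == len(b) and all(abs(p - q) <= tol for p, q in zip(a, b))

# Hypothetical eight-sample signal used only for the comparison.
signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 1.5]
print(same_computation(dft_direct(signal), fft_radix2(signal)))
```

Which of the two one would actually run depends on the available mechanisms (here, only on how much arithmetic one is willing to spend), while the specification of what is computed stays the same.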

Figure 1. Examples of pairs of perspective line drawings presented to the subjects of Shepard & Metzler's (1971) experiments on mental rotation. (A) A "same" pair, which differs by an 80 degree rotation in the picture plane; (B) a "same" pair, which differs by an 80 degree rotation in depth; (C) a "different" pair, which cannot be brought into congruence by any rotation. The time taken to decide whether a pair is the "same" varies linearly with the (3-D) angle by which one must be rotated to be brought into correspondence with the other. (Reconstructed from a figure in Shepard & Metzler, 1971.)


different approach. In 1971, R. N. Shepard and J. Metzler (1971) made line drawings of simple objects, which differed from one another either by a 3-D rotation relative to the viewer, or by a rotation plus a reflection (see figure 1). They asked how long it took to decide whether two depicted objects differed by a rotation and reflection, or merely a rotation. They found that the time taken depended on the 3-D angle of rotation necessary to bring the two objects into correspondence, not the 2-D angle between their images; and that it varied linearly with this angle. Similar findings have been reported in many subsequent investigations, and have led to the resurgence of ideas about mental imagery, and to analogies between visual recognition and computer graphics systems (Shepard 1975).

Interesting and important though these findings are, one must sometimes be allowed the luxury of pausing to reflect upon the overall trends that they represent, in order to take stock of the kind of knowledge that is accessible to these techniques. This proposal is itself an attempt at examining the link between various current approaches, including those of neurophysiology and psychophysics. We would also like to know what the limitations of these approaches are, and how one can compensate for their deficiencies. Perhaps the most striking feature of these disciplines at present is their phenomenological character. They describe the behavior of cells or of subjects, but do not explain it. What is area 17 actually doing? What are the problems in doing it that need explaining, and at what level of description should such explanations be sought?

In trying to come to grips with these problems, D. Marr and his students at the M. I. T. Artificial Intelligence Laboratory have adopted a point of view that regards visual perception as a problem primarily in information processing. The problem commences with a large, gray-level intensity array, and it culminates in a description that depends on that array, and on the purpose that the viewer brings to it. Viewed in this light, a theory of visual information processing will exhibit the four levels of description that, as we saw in the introduction, are attached to any device that solves an information processing problem; and the first task of a theory of vision is to examine the top level. What exactly is the underlying nature of the computations being performed during visual perception?

A computational approach to vision

The empirical findings of the last 20 years, together with related anatomical (Allman 1972, 1973, 1974a, b & c, Zeki 1971) and clinical (e.g. Luria 1970, Critchley 1953, Vinken & Bruyn 1969) experience, have strengthened a view for which widespread indirect evidence previously existed, namely that the cerebral cortex is divided into many different areas that are distinguished structurally, functionally and by their anatomical connections. This suggests that, to a first approximation, visual information processing can be thought of as having a modular structure, a view which is strongly supported by evolutionary arguments. If this is true, the task of a top-level theory of vision is clear: what are the modules, what does each do, and how? The approach of the M. I. T. Artificial Intelligence Laboratory to the vision problem rests on these assumptions. We believe that the principal problems at present are (a) to formulate the likely modularization, and (b) to understand the computational problems each module presents. Unlike simpler systems like the fly

[Figure 2 diagram, stages from top to bottom: REPRESENTATION OF 3-D STRUCTURE; AXES FOUND IN IMAGE; 2½-D LABELLING OF CONTOURS; FIGURE-GROUND; STEREO, LIGHTNESS, AGGREGATION, TEXTURE, MOTION; LEFT & RIGHT PRIMAL SKETCHES; LEFT & RIGHT IMAGES]

Figure 2. This diagram summarises our overall view of the visual recognition problem, and it embodies several points that our approach takes as assumptions. The first is that the recognition process decomposes to a set of modules that are to a first approximation independent. The simplified subdivision shown here consists of four main stages, each of which may contain several modules. (1) The translation of the image into a primitive description called the primal sketch (Marr 1976b); (2) The division of the primal sketch into regions or forms, through the action of various grouping processes ranging in scope from the very local to global predicates like a rough type of connectedness; (3) The assignment of an axis-based description to each form (see figure 4); and (4) The construction of a 3-D model for the viewed shape, based initially on the axes delivered by (3). The relation between the 3-D model representation of a shape and the image of that shape is found and maintained with the help of the image-space processor. Finally, the representation of the geometry of a shape is separate from the representation of the shape's use or purpose (Warrington & Taylor 1973).

Figure 3. 3a shows the image of a toy bear, printed in a font with 16 grey levels. In 3b, the intensity at each point is represented along the z-axis. 3c illustrates the spatial component of the raw primal sketch as obtained from this image. Associated with each line segment are measures of contrast, type and extent of the intensity change, position and orientation. This image is so simple that purely local grouping processes suffice to extract the major forms from the primal sketch. These forms are exhibited in 3d,


Figure 4. The geometry of constraints on the computation of binocular disparity. 4a illustrates the constraints for the case of a one-dimensional image. Lx and Ly represent the positions of descriptive elements from the left and right views, and the horizontal and vertical lines indicate the range of disparity values that can be assigned to left-eye and right-eye elements. The uniqueness condition states that only one disparity value may be assigned to each descriptive element. That is, only one disparity value may be "on" along each horizontal or vertical line. The continuity condition states that we seek solutions in which disparity values vary smoothly almost everywhere. That is, solutions tend to spread along the dotted diagonals, which are lines of constant disparity, and between adjacent diagonals. 4b shows how this geometry appears at each intersection point. The constraints may be implemented by a network with positive and negative interactions that obey this geometry, because the stable states of such a network are precisely the states that satisfy the constraints on the computation. 4c shows the constraint geometry for a 2-dimensional image. The negative interactions remain essentially unchanged, but the positive ones now extend over a small 2-dimensional neighbourhood. A network with this geometry was used to perform the computation exhibited in figure 8.
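The one-dimensional constraint geometry of 4a can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the network used in the paper: the binary states, the synchronous update, the parameter values (`radius`, `eps`, `theta`), the sign convention that a left element at `x` matches a right element at `x + d`, and the simplification that only cells that began as candidate matches can stay on. Positive interactions link cells of equal disparity (continuity); negative interactions link cells that share a left-eye or right-eye line of sight (uniqueness).

```python
def candidate_matches(left, right, max_disp):
    """c0[x][d] = 1 where left[x] and right[x + d] are both 'on' (a possible match)."""
    n = len(left)
    return [[1 if left[x] and x + d < n and right[x + d] else 0
             for d in range(max_disp + 1)]
            for x in range(n)]

def cooperative_step(c, c0, radius, eps, theta):
    """One synchronous update of every (position, disparity) cell."""
    n, nd = len(c), len(c[0])
    new = [[0] * nd for _ in range(n)]
    for x in range(n):
        for d in range(nd):
            # Positive interactions: neighbours with the same disparity (continuity).
            exc = sum(c[x2][d]
                      for x2 in range(max(0, x - radius), min(n, x + radius + 1))
                      if x2 != x)
            # Negative interactions: other disparities on the same left-eye line (same x)...
            inh = sum(c[x][d2] for d2 in range(nd) if d2 != d)
            # ...and on the same right-eye line (same x + d).
            inh += sum(c[x + d - d2][d2]
                       for d2 in range(nd)
                       if d2 != d and 0 <= x + d - d2 < n)
            # Simplification: a cell stays on only if it was a candidate match.
            new[x][d] = 1 if c0[x][d] and exc - eps * inh >= theta else 0
    return new

def cooperative_stereo_1d(left, right, max_disp, radius=2, eps=0.4, theta=0.0, iters=10):
    """Iterate the network from the set of all candidate matches."""
    c0 = candidate_matches(left, right, max_disp)
    c = [row[:] for row in c0]
    for _ in range(iters):
        c = cooperative_step(c, c0, radius, eps, theta)
    return c

def disparity_map(c):
    """Disparity chosen at each left-eye position, or None if none or several are on."""
    result = []
    for row in c:
        on = [d for d, v in enumerate(row) if v]
        result.append(on[0] if len(on) == 1 else None)
    return result

# Three adjacent dots; the right view is the left view shifted by one position.
left = [0, 1, 1, 1, 0, 0, 0]
right = [0, 0, 1, 1, 1, 0, 0]
print(disparity_map(cooperative_stereo_1d(left, right, max_disp=2)))
```

In this small case seven candidate matches compete initially, and the mutually supporting cells along the disparity-1 diagonal are the ones that remain.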


three-dimensional shapes. One component deals with the nature of the representation system that is used, and the other with how to obtain it from the types of description that can be delivered from the primal sketch. The key ingredients of the representation system are:

(a) The deep structure of the three-dimensional representation of an object consists of a stick figure, where in formal terms each stick represents one or more axes in the object's generalized cone representation, as illustrated in figure 5. In fact, a hierarchy of stick figures exists, that allows one to describe an object on various scales with varying degrees of detail.

(b) Each stick figure is defined by a propositional database called a 3-D model. The geometrical structure of a 3-D model is specified by storing the relative orientations of pairs of connecting axes. This specification is local rather than global, and it contrasts with schemes in which the position of each axis is specified in isolation, using some circumscribing frame of reference. (See legend to figure 5.)

(c) When a 3-D model is being used to interpret an image, the geometrical relationships in the model are interpreted by a computationally simple mechanism called the image-space processor, which may be thought of as a device for representing the positions of two vectors in 3-space, and for computing their projections onto the image.

(d) During recognition, a sophisticated interaction takes place between the image, the 3-D model, and the image-space processor. This interaction gradually relaxes the stored 3-D model onto the axes computed from the image. Some facets of this process resemble the computation of a 3-D rotation, but a simple computer graphics metaphor is misleading. In fact, the rotations take place on abstract vectors (the axes) that are not even present in the original image; and at any moment, only two such vectors are explicitly represented.

The essence of this part of the theory is a method for representing the spatial disposition of the parts of an object and their relation to the viewer.

6: 2 1/2 - dimensional analysis of an image (Marr 1976c, Marr & Vatan in preparation)

In simple images, the forms delivered from the primal sketch correspond to the contours of physical objects. Finally, therefore, we need to bridge the gap between such forms and the beginning of the 3-D analysis described in the previous paragraph. We call this 2 1/2 - dimensional analysis, and it consists largely of assigning to contours labels that reflect aspects of their 3-dimensional configuration, before that configuration has been made explicit. The most powerful single idea here is the distinction between convex and concave edges and contour segments. One can show that these distinctions are preserved by orthogonal projections, and can be made the basis of a segmenting technique that decomposes a figure into 2-D regions that correspond to the appropriate 3-D decomposition for a wide range of viewing angles (see figure 6). Marr (1976c) has proved that the assumptions that are implicit in the use of the convex-concave distinction to analyze a contour are equivalent to assuming that the viewed shapes are composed of generalized cones. This adds support for using the stick-figure scheme based on generalized


Figure 5. Examples of 3-D models, and their arrangement into the 3-D model representation of a human shape. A 3-D model consists of a model axis (a) and component axes (b) that consist of a principal axis (the torso) and several auxiliary axes (the head and limbs) whose positions are described relative to the principal axis. The complete human 3-D model is enclosed in a rectangle (c). The 3-D model representation is obtained by concatenating 3-D models for different parts at different levels of detail. This is achieved by allowing a component axis of one 3-D model to be the model axis of another. Here, for example, the arm auxiliary axis in the human 3-D model acts as the model axis for the arm 3-D model, which itself has two component axes, the upper and lower arms. The figure shows how this scheme extends downwards as far as the fingers.
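The nesting described in this legend can be sketched as a small data structure. The class names, the `relative_pose` field and its three-number encoding, and the particular parts listed below are all hypothetical choices for illustration; the legend only states that a component axis of one 3-D model may serve as the model axis of another, and that positions are stored relative to a model's own axes rather than in a global frame.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Axis:
    name: str
    # Hypothetical encoding of the pose relative to the enclosing model's axis:
    # (inclination in degrees, angle around the model axis, position along it in 0..1).
    relative_pose: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    # When set, this component axis acts as the model axis of a finer 3-D model.
    submodel: Optional["Model3D"] = None

@dataclass
class Model3D:
    model_axis: str
    components: List[Axis] = field(default_factory=list)

    def depth(self) -> int:
        """Number of nested levels of detail below and including this model."""
        return 1 + max((a.submodel.depth() for a in self.components if a.submodel),
                       default=0)

    def path_to(self, name: str) -> Optional[List[str]]:
        """Chain of model axes leading from this model down to the named axis."""
        for axis in self.components:
            if axis.name == name:
                return [self.model_axis, axis.name]
            if axis.submodel is not None:
                sub = axis.submodel.path_to(name)
                if sub is not None:
                    return [self.model_axis] + sub
        return None

# Illustrative hierarchy: human -> arm -> lower arm -> hand -> finger.
hand = Model3D("hand", [Axis("palm"), Axis("finger", (30.0, 0.0, 1.0))])
lower_arm = Model3D("lower arm", [Axis("forearm"), Axis("hand", (0.0, 0.0, 1.0), hand)])
arm = Model3D("arm", [Axis("upper arm"), Axis("lower arm", (20.0, 0.0, 0.5), lower_arm)])
human = Model3D("human", [Axis("torso"), Axis("head", (0.0, 0.0, 1.0)),
                          Axis("arm", (160.0, 90.0, 0.9), arm), Axis("leg")])

print(human.path_to("finger"))
print(human.depth())
```

Because every pose is stored relative to the enclosing model axis, moving the arm changes one entry in the human model and leaves the arm, hand and finger models untouched.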


Figure 6. Analysis of a contour from Vatan and Marr (1976). The outline (a) was obtained by applying local grouping operations to a primal sketch, as in figure 4. It is then smoothed, and divided into convex and concave components (b). The outline is searched for deeply concave points or components, which correspond to strong segmentation points. One such point is marked with an open circle in (c). There are usually several possible matching points for each strong segmentation point, and the candidates for the marked point are shown here by filled circles (c). The correct mates for each segmentation point can usually be found by eliminating relatively poor candidates. The result of doing this here is the segmentation shown in (d). Once these segments have been defined, their corresponding axes (thick lines) are easy to obtain (e). They do not usually connect, but may be related to one another by intermediate lines which are called embedding relations (thin lines in f). According to the 3-D representation theory, the resulting stick figure (f) is the deep structure on which interpretation of this image is based.
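The first step of this analysis, dividing an outline into convex and concave parts, can be sketched for the simplest possible case. The code below is an assumption-laden illustration, not the procedure of Vatan and Marr: it treats the outline as a polygon listed counter-clockwise, skips the smoothing step, and labels each vertex by the sign of the turn made there. Concave vertices are then the candidate segmentation points.

```python
def label_vertices(polygon):
    """Label each vertex of a counter-clockwise polygon as convex, concave or straight."""
    n = len(polygon)
    labels = []
    for i in range(n):
        x0, y0 = polygon[i - 1]          # previous vertex (wraps around at i = 0)
        x1, y1 = polygon[i]              # current vertex
        x2, y2 = polygon[(i + 1) % n]    # next vertex
        # z-component of the cross product of the incoming and outgoing edges:
        # positive for a left turn (convex), negative for a right turn (concave).
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross > 0:
            labels.append("convex")
        elif cross < 0:
            labels.append("concave")
        else:
            labels.append("straight")
    return labels

def segmentation_candidates(polygon):
    """Indices of concave vertices, the candidate segmentation points."""
    return [i for i, label in enumerate(label_vertices(polygon)) if label == "concave"]

# Hypothetical L-shaped outline, listed counter-clockwise.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(label_vertices(l_shape))
print(segmentation_candidates(l_shape))
```

For the L-shaped outline the single concave vertex sits at the inner corner, which is where one would cut the shape into its two arms.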


cones to represent 3-D shapes. The theory assigns many alternating figure effects like the Necker cube to the existence of alternative self-consistent labellings computed at this stage.

It is perhaps worth mentioning one interesting point that has emerged from this way of recognising and representing 3-D shapes. Warrington & Taylor (1973) described patients with right parietal lesions who had difficulty in recognising objects seen in "unconventional" views -- like the view of a water pail seen from above (see figure 7). They did not attempt to define what makes a view unconventional. According to our theory, the most troublesome views of an object will be those in which its stick-figure axes cannot easily be recovered from the image. The theory therefore predicts that unconventional views in the Warrington & Taylor sense will correspond to those views in which an important axis in the object's generalised cylinder representation is foreshortened. Such views are by no means uncommon -- if a 35mm camera is directed towards you, you are seeing an unconventional view of it, since the axis of its lens is foreshortened.
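The notion of a foreshortened axis can be given a rough quantitative form. The sketch below assumes orthographic projection along a viewing direction and measures how much of an axis's true length survives in the image; the function names and the example vectors are illustrative assumptions, not quantities defined in the paper.

```python
import math

def projected_length(axis, view_dir):
    """Length of a 3-D axis vector after orthographic projection along view_dir."""
    norm = math.sqrt(sum(v * v for v in view_dir))
    unit = [v / norm for v in view_dir]
    # Component of the axis along the line of sight, which the projection discards.
    along = sum(a * u for a, u in zip(axis, unit))
    in_image = [a - along * u for a, u in zip(axis, unit)]
    return math.sqrt(sum(p * p for p in in_image))

def foreshortening(axis, view_dir):
    """Ratio of projected to true length: 1.0 seen side-on, 0.0 seen end-on."""
    true_length = math.sqrt(sum(a * a for a in axis))
    return projected_length(axis, view_dir) / true_length

lens_axis = (0.0, 0.0, 1.0)                           # hypothetical camera lens axis
print(foreshortening(lens_axis, (1.0, 0.0, 0.0)))     # viewed from the side
print(foreshortening(lens_axis, (0.0, 0.0, 1.0)))     # camera pointed at the viewer
```

On this measure a view would count as unconventional when the ratio for an important axis falls close to zero, as it does for the lens axis of a camera pointed straight at the viewer.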

Examples of algorithms and mechanisms

Between the top and bottom of our four levels lie descriptions of algorithms and descriptions of mechanisms. The distinction between these two levels is rather subtle, since they are often closely related. The form of a specific algorithm can impose strong constraints on the mechanisms, and conversely. Let us consider three examples.

1: "Simple" algorithms

An algorithm operates on some kind of input and yields a corresponding output. In formal terms, an algorithm can be thought of as a mapping between the input and the output space. Perhaps the simplest of all nonlinear operators on a linear space are the so-called polynomial operators. They encompass a broad spectrum of applications including all linear problems, and they approximate all sufficiently smooth, nonlinear operators. For this particular class of "simple" algorithms (i.e. representable through a "smooth" operator) polynomial representations provide a canonical decomposition in a series of simpler, multilinear operators. Figure 8 shows this decomposition in terms of interactions or "graphs" of various orders: in this way an algorithm, or its network implementation, may be decomposed into an additive sequence of simple, canonical terms, just as in another context, a function can be conveniently characterized by its various Fourier terms. Moreover, functional and computational properties can be associated with interactions of a given order and type.

Poggio & Reichardt (1976) used the polynomial representation of functionals to classify the algorithms underlying movement, position and figure-ground computation in the fly's visual system. The idea was to identify which terms, among the diversity of the possible ones, are implied by the experimental data. Figure 8 shows the graphs that play a significant role in the fly's control of flight and, in this sense, characterize the algorithms involved. The notion that seems to capture best the "computational complexity" of these simple, smooth mappings is the notion of p-order (perceptron-order, see Poggio and Reichardt, 1976). Movement computation in the fly is of

Separation of the three types of interactions in the fly

Movement computation: corresponding to r; homogeneously distributed in the eye (no strong dependence on angular position); no "age" dependence (?); light intensity threshold at about 10^-4 candela/m^2 (Eckert, 1973); present in the Drosophila mutant S 129 (Heisenberg, pers. comm.).

Position ("attractiveness") computation: corresponding to D(ψ); mostly in the lower part of the eye (D(ψ) and L(M) dependence); "age" dependence; light intensity threshold (of fixation!) at about 10^-… cd/m^2 (Reichardt, 1973; Wehrhahn, 1976); disturbed in the Drosophila mutant S 129 (Heisenberg, pers. comm.).

Correction to superposition rule: mostly in the lower part of the eye; light intensity threshold (?); Drosophila mutant S 129 (?).

Figure 8. Graphical representation (a) of the decomposition of a "simple" nonlinear, n-input "algorithm" into a sum of interactions of various order. The functional representation $S\{x_1(t), \ldots, x_n(t)\} = L_0 + \sum_i L_1\{x_i(t)\} + \sum_{i,j} L_2\{x_i(t), x_j(t)\} + \cdots$, where $L_k$ is a k-linear mapping, can be read from an appropriate sequence of such elementary graphs. Fig. 8b shows the graphs that implement the fly's orientation behavior, studied by Reichardt and Poggio. Several findings suggest that they may correspond to separate physiological modules. Characteristic functional and computational properties can be associated with each interaction type. (From Poggio and Reichardt, 1976.)
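A decomposition into interactions of increasing order can be illustrated with a deliberately reduced sketch. It assumes a memoryless operator truncated at second order, with the zeroth, first and second order interactions given as a constant `l0`, a weight list `l1` and a weight matrix `l2`; these names and the numbers used are hypothetical, and the time dependence of the inputs in the caption is ignored.

```python
def interaction_terms(x, l0, l1, l2):
    """Return the order-0, order-1 and order-2 contributions for input vector x."""
    n = len(x)
    order0 = l0                                             # constant term
    order1 = sum(l1[i] * x[i] for i in range(n))            # linear interactions
    order2 = sum(l2[i][j] * x[i] * x[j]                     # pairwise interactions
                 for i in range(n) for j in range(n))
    return order0, order1, order2

def polynomial_operator(x, l0, l1, l2):
    """Output of the truncated operator: the sum of its interaction terms."""
    return sum(interaction_terms(x, l0, l1, l2))

# Hypothetical two-input example with one symmetric cross-interaction.
l0 = 0.5
l1 = [1.0, -1.0]
l2 = [[0.0, 1.0],
      [1.0, 0.0]]
x = [1.0, 2.0]
print(interaction_terms(x, l0, l1, l2))
print(polynomial_operator(x, l0, l1, l2))
```

Doubling the input doubles the first-order term and quadruples the second-order term, which is one way of telling the orders apart from input-output measurements alone.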
