Llibre del professor (Monographs, Essays; Audiovisual Systems Engineering)

Universitat Pompeu Fabra (UPF), Audiovisual Systems Engineering

Course: Percepció i cognició Audiovisual; Degree: Enginyeria en Sistemes Audiovisuals; University: UPF

Type: Monographs, Essays

2014/2015

Uploaded on 26/03/2015 by pau01-1


INTRODUCTION TO THE THEORY OF NEURAL COMPUTATION

John Hertz, NORDITA
Anders Krogh, Niels Bohr Institute
Richard G. Palmer, Duke University and the Santa Fe Institute

Lecture Notes Volume I
Santa Fe Institute Studies in the Sciences of Complexity

Addison-Wesley Publishing Company
The Advanced Book Program
Redwood City, California; Menlo Park, California; Reading, Massachusetts; New York; Don Mills, Ontario; Wokingham, United Kingdom; Amsterdam; Bonn; Sydney; Singapore; Tokyo; Madrid; San Juan

Contents

Series Foreword by L. M. Simmons, Jr.
Foreword by Jack Cowan
Foreword by Christof Koch
Preface

ONE Introduction
  1.1 Inspiration from Neuroscience
  1.2 History
  1.3 The Issues

TWO The Hopfield Model
  2.1 The Associative Memory Problem
  2.2 The Model
  2.3 Statistical Mechanics of Magnetic Systems
  2.4 Stochastic Networks
  2.5 Capacity of the Stochastic Network

THREE Extensions of the Hopfield Model
  3.1 Variations on the Hopfield Model
  3.2 Correlated Patterns
  3.3 Continuous-Valued Units
  3.4 Hardware Implementations
  3.5 Temporal Sequences of Patterns

FOUR Optimization Problems
  4.1 The Weighted Matching Problem
  4.2 The Travelling Salesman Problem
  4.3 Graph Bipartitioning
  4.4 Optimization Problems in Image Processing

FIVE Simple Perceptrons
  5.1 Feed-Forward Networks
  5.2 Threshold Units
  5.3 Proof of Convergence of the Perceptron Learning Rule
  5.4 Linear Units
  5.5 Nonlinear Units
  5.6 Stochastic Units
  5.7 Capacity of the Simple Perceptron

SIX Multi-Layer Networks
  6.1 Back-Propagation
  6.2 Variations on Back-Propagation
  6.3 Examples and Applications
  6.4 Performance of Multi-Layer Feed-Forward Networks
  6.5 A Theoretical Framework for Generalization
  6.6 Optimal Network Architectures

SEVEN Recurrent Networks
  7.1 Boltzmann Machines
  7.2 Recurrent Back-Propagation
  7.3 Learning Time Sequences
  7.4 Reinforcement Learning

EIGHT Unsupervised Hebbian Learning
  8.1 Unsupervised Learning
  8.2 One Linear Unit
  8.3 Principal Component Analysis
  8.4 Self-Organizing Feature Extraction

NINE Unsupervised Competitive Learning
  9.1 Simple Competitive Learning
  9.2 Examples and Applications of Competitive Learning
  9.3 Adaptive Resonance Theory
  9.4 Feature Mapping
  9.5 Theory of Feature Mapping
  9.6 The Travelling Salesman Problem
  9.7 Hybrid Learning Schemes

TEN Formal Statistical Mechanics of Neural Networks
  10.1 The Hopfield Model
  10.2 Gardner Theory of the Connections

APPENDIX Statistical Mechanics
  A.1 The Boltzmann-Gibbs Distribution
  A.2 Free Energy and Entropy
  A.3 Stochastic Dynamics

Bibliography
Subject Index
Author Index

Series Foreword

[...] extend modeling techniques to incorporate realistic detailed models of human behavior. Thus, this Series is intended to range broadly across many fields of intellectual endeavor incorporating work in all the areas listed above.
The apparently disparate topics, however, share common themes that relate them to the emergent sciences of complexity.

The Santa Fe Institute, and hence this Series, would not exist without the support of farsighted individuals in government funding agencies and private foundations who have recognized the promise of the new approaches to complex systems research being fostered here. It is a pleasure to acknowledge the broad research grants received by the Institute from the Department of Energy, the John D. and Catherine T. MacArthur Foundation, and the National Science Foundation that, together with numerous other grants, have made possible the work of the Institute.

L. M. Simmons, Jr.
Santa Fe, New Mexico
October 1, 1990

Foreword

The past decade has seen an explosive growth in studies of neural networks. In part this was the result of technological advances in personal and main-frame computing, enabling neural network investigators to simulate and test ideas in ways not readily available before 1980. Another major impulse was provided by Hopfield's work on neural networks with symmetric connections. Such networks had previously been dismissed as not brain-like and therefore not worth studying. I myself fell into this trap some twenty-five years ago when I formulated what are now termed the standard equations for studying neural networks, those using the so-called squashing or logistic function. It was to Hopfield's credit that he "stepped back from biological reality", as Toulouse has put it, and uncovered an interesting set of properties and uses for symmetric networks. What followed is an interesting episode in the sociology of science.
Hopfield's papers triggered an explosion, particularly in the statistical physics community, leading to a whole series of dramatic advances in the understanding of symmetric networks and their properties, especially in respect of their utility as distributed memory stores, and as solvers of constrained optimization problems, e.g., small versions of the famous Traveling Salesman Problem.

At more-or-less the same time, other developments in neural networks, possibly even more important, were taking place, culminating in the publication by Rumelhart, Hinton, and Williams of the now well-known "Back-Propagation Algorithm" for solving the fundamental problem of training neural networks to compute desired functions, a problem first formulated by Rosenblatt in the late 1950's in his now-classical work on Perceptrons. Again this paper triggered a massive explosion of work on trainable neural networks which continues to this day.

The authors of this book, Palmer, Krogh, and Hertz, are statistical physicists who have experienced these developments. They have sought to provide an introduction to the theory behind all the hoopla, and to summarize the current state [...]

Foreword

[...] This monograph succinctly captures these trends and summarizes the current state of the art by way of highlighting the analogies to statistical mechanics and electric circuit theory as well as by discussing various practical applications. This is done without overdue emphasis on a formal mathematical treatment, appealing rather to the intuition of the reader. Throughout the book, the emphasis is on those features of neural networks relevant to information processing, storage and recall, that is to computation and function, linking physics to computing machines.
The Computation and Neural Systems Series. Over the past 600 million years, biology has solved the problem of processing massive amounts of noisy and highly redundant information in a constantly changing environment by evolving networks of billions of highly interconnected nerve cells. It is the task of scientists (be they mathematicians, physicists, biologists, psychologists, or computer scientists) to understand the principles underlying information processing in these complex structures. At the same time, researchers in machine vision, pattern recognition, speech understanding, robotics, and other areas of artificial intelligence can profit from understanding features of existing nervous systems. Thus, a new field is emerging: the study of how computations can be carried out in extensive networks of heavily interconnected processing elements, whether these networks are carbon- or silicon-based. Addison-Wesley's new "Computation and Neural Systems" series will reflect the diversity of this field with textbooks, course materials, and monographs on topics ranging from the biophysical modeling of dendrites and neurons, to computational theories of vision and motor control, to the implementation of neural networks using VLSI or optics technology, to the study of highly parallel computational architectures.

Christof Koch
Pasadena, California
September 21, 1990

Preface

We generally like our titles shorter than Introduction to the Theory of Neural Computation, but all those words are important in understanding our purpose:

Neural Computation. Our subject matter is computation by artificial neural networks. The adjective "neural" is used because much of the inspiration for such networks comes from neuroscience, not because we are concerned with networks of real neurons. Brain modeling is a different field and, though we sometimes describe biological analogies, our prime concern is with what the artificial networks can do, and why.
It is arguable that "neural" should be purged from the vocabulary of this field (perhaps Network Computation would have been more accurate in our title), but at present it is firmly ensconced. We do however avoid most other biological terms in non-biological contexts, including "neuron" (unit) and "synapse" (connection).

Theory. We emphasize the theoretical aspects of neural computation. Thus we provide little or no coverage of applications in engineering or computer science; implementations in hardware or software; or implications for cognitive science or artificial intelligence. There are recent books on all these topics and we prefer to complement rather than to compete. On the other hand, we feel that even those whose interest in the subject is completely practical may benefit from a broad theoretical perspective. We are no doubt biased by the fact that we are theorists by trade, but in our own experience we found this background to be essential in using neural networks for practical applications (not described in this book).

[...] and cognitive sciences as well as computer science, engineering, and physics. Later versions were used as the basis of summer school lectures at Santa Fe in June 1988 [Palmer, 1989], and for a one-semester course for physics and computer science students at the University of Copenhagen in the fall of 1989. We thank all of the students in all of these courses for constructive feedback that led to successive improvements.

We owe a debt of gratitude to many of our colleagues in Durham, Copenhagen, and Santa Fe who encouraged, supported and helped us in this work. These include Ajay, Alan, Benny, Corinna, Dave, Ingi, David, Frank, Gevene, Jack, John, Jun, Kurt, Lars, Marjorie, Mike, Per, Ronda, Søren, Sta, Termas, and Xiang. Two of us (JH and AK) also thank the Physics Department at Duke for their hospitality in the spring of 1988, when this whole enterprise got started. AK thanks the Carlsberg Foundation for generous financial support.
Finally, we reserve our deepest appreciation for our wives and families. It is a hackneyed theme to thank loved ones for patience and understanding while a book was being written; but now we know why, and do give heartfelt thanks.

Richard Palmer
Anders Krogh
John Hertz
Durham and Copenhagen, August 1990

Electronic mail addresses for the authors: [...]

ONE Introduction

Anyone can see that the human brain is superior to a digital computer at many tasks. A good example is the processing of visual information: a one-year-old baby is much better and faster at recognising objects, faces, and so on than even the most advanced AI system running on the fastest supercomputer.

The brain has many other features that would be desirable in artificial systems:

- It is robust and fault tolerant. Nerve cells in the brain die every day without affecting its performance significantly.
- It is flexible. It can easily adjust to a new environment by "learning"; it does not have to be programmed in Pascal, Fortran or C.
- It can deal with information that is fuzzy, probabilistic, noisy, or inconsistent.
- It is highly parallel.
- It is small, compact, and dissipates very little power.

Only in tasks based primarily on simple arithmetic does the computer outperform the brain!

This is the real motivation for studying neural computation. It is an alternative computational paradigm to the usual one (based on a programmed instruction sequence), which was introduced by von Neumann and has been used as the basis of almost all machine computation to date. It is inspired by knowledge from neuroscience, though it does not try to be biologically realistic in detail. It draws its methods in large degree from statistical physics, and that is why the lectures on which this book is based originally formed part of a physics course. Its potential applications lie of course mainly in computer science and engineering. In addition it
may be of value as a modeling paradigm in neuroscience and in sensory and cognitive psychology.

FIGURE 1.1 Schematic drawing of a typical neuron (labelled parts: synapse, axon, nucleus, cell body, dendrites).

The field is also known as neural networks, neurocomputation, associative networks, collective computation, connectionism, and probably many other things. We will use all these terms freely.

1.1 Inspiration from Neuroscience

Today's research in neural computation is largely motivated by the possibility of making artificial computing networks. Yet, as the term "neural network" implies, it was originally aimed more towards modelling networks of real neurons in the brain. The models are extremely simplified when seen from a neurophysiological point of view, though we believe that they are still valuable for gaining insight into the principles of biological "computation." Just as most of the details of the separate parts of a large ship are unimportant in understanding the behavior of the ship (e.g., that it floats, or transports cargo), so many details of single nerve cells may be unimportant in understanding the collective behavior of a network of cells.

Neurons

The brain is composed of about $10^{11}$ neurons (nerve cells) of many different types. Figure 1.1 is a schematic drawing of a single neuron. Tree-like networks of nerve fiber called dendrites are connected to the cell body or soma, where the cell nucleus is located. Extending from the cell body is a single long fiber called the axon, which eventually branches or arborizes into strands and substrands. At the ends of these are the transmitting ends of the synaptic junctions, or synapses, to other neurons. The receiving ends of these junctions on other cells can be found both on the dendrites and on the cell bodies themselves. The axon of a typical neuron makes a few thousand synapses with other neurons.

The transmission of a signal from one cell to another at a synapse is a complex chemical process in which specific transmitter substances are released from the sending side of the junction. The effect is to raise or lower the electrical potential inside the body of the receiving cell. If this potential reaches a threshold, a pulse or action potential of fixed strength and duration is sent down the axon. We then say that the cell has "fired". The pulse branches out through the axonal arborization to synaptic junctions to other cells. After firing, the cell has to wait for a time called the refractory period before it can fire again.

McCulloch and Pitts [1943] proposed a simple model of a neuron as a binary threshold unit. Specifically, the model neuron computes a weighted sum of its inputs from other units, and outputs a one or a zero according to whether this sum is above or below a certain threshold:

$$ n_i(t+1) = \Theta\Big(\sum_j w_{ij}\, n_j(t) - \mu_i\Big). \tag{1.1} $$

See Fig. 1.2. Here $n_i$ is either 1 or 0, and represents the state of neuron $i$ as firing or not firing respectively. Time $t$ is taken as discrete, with one time unit elapsing per processing step. $\Theta(x)$ is the unit step function, or Heaviside function:

$$ \Theta(x) = \begin{cases} 1 & \text{if } x \ge 0; \\ 0 & \text{otherwise.} \end{cases} \tag{1.2} $$

FIGURE 1.2 Schematic diagram of a McCulloch-Pitts neuron. The unit fires if the weighted sum $\sum_j w_{ij} n_j$ of the inputs reaches or exceeds the threshold $\mu_i$.

The weight $w_{ij}$ represents the strength of the synapse connecting neuron $j$ to neuron $i$. It can be positive or negative, corresponding to an excitatory or inhibitory synapse respectively. It is zero if there is no synapse between $i$ and $j$. The cell-specific parameter $\mu_i$ is the threshold value for unit $i$; the weighted sum of inputs must reach or exceed the threshold for the neuron to fire.

Though simple, a McCulloch-Pitts neuron is computationally a powerful device.
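To make the threshold rule concrete, here is a minimal Python sketch of a single McCulloch-Pitts unit: a weighted sum of 0/1 inputs compared against a threshold, with the step function returning 1 when its argument is zero or positive. The helper names (`step`, `mp_unit`) and the particular weights and thresholds chosen for the logic gates are illustrative assumptions, not taken from the text.

```python
import numpy as np

def step(x):
    """Unit step (Heaviside) function: 1 where x >= 0, else 0."""
    return np.where(np.asarray(x) >= 0, 1, 0)

def mp_unit(weights, inputs, threshold):
    """Output of one threshold unit: step(sum_j w_j * n_j - mu)."""
    return int(step(np.dot(weights, inputs) - threshold))

# Hypothetical weight/threshold choices that realise simple logic gates.
def and_gate(a, b):
    return mp_unit([1, 1], [a, b], 2)   # fires only if both inputs fire

def or_gate(a, b):
    return mp_unit([1, 1], [a, b], 1)   # fires if at least one input fires

def not_gate(a):
    return mp_unit([-1], [a], 0)        # inhibitory weight inverts the input

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b))
    print("NOT 0:", not_gate(0), "NOT 1:", not_gate(1))
```

Composing such gates is one informal way to see why assemblies of these units can carry out general logical computation, although the sketch itself only demonstrates the three gates.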
McCulloch and Pitts proved that a synchronous assembly of such neurons is capable in principle of universal computation for suitably chosen weights $w_{ij}$. This means that it can perform any computation that an ordinary digital computer can, though not necessarily so rapidly or conveniently.

Real neurons involve many complications omitted from this simple description. The most significant ones include: [...]

[...] system which could operate in parallel like this but with switching times of current semiconductor devices!

1.2 History

The history of these sorts of ideas in psychology originates with Aristotle. Yet as a basis for computational or neural modelling we can trace them to the paper of McCulloch and Pitts [1943], which introduced the model described above.

During the next fifteen years there was considerable work on the detailed logic of threshold networks. They were realized to be capable of universal computation and were analyzed as finite-state machines; see Minsky [1967]. The problem of making a reliable network with unreliable parts was solved by the use of redundancy [von Neumann, 1956], leading later to distributed redundant representations [Winograd and Cowan, 1963].

At the opposite extreme to detailed logic, continuum theories were also developed. Known as neurodynamics or neural field theory, this approach used differential equations to describe activity patterns in bulk neural matter [Rashevsky, 1938; Wiener, 1948; Beurle, 1956; Wilson and Cowan, 1973; Amari, 1977].

Around 1960 there was a wave of activity centered around the group of Frank Rosenblatt, focusing on the problem of how to find appropriate weights $w_{ij}$ for particular computational tasks. They concentrated on networks called perceptrons, in which the units were organized into layers with feed-forward connections between one layer and the next. An example is shown in Fig. 1.3.

FIGURE 1.3 A two-layer perceptron.
Very similar networks called adalines were invented around the same time by Widrow and Hoff [1960; Widrow, 1962].

For the simplest class of perceptrons without any intermediate layers, Rosenblatt [1962] was able to prove the convergence of a learning algorithm, a way to change the weights iteratively so that a desired computation was performed. Many people expressed a great deal of enthusiasm and hope that such machines could be a basis for artificial intelligence.

There was however a catch to the learning theorem, forcefully pointed out by Minsky and Papert [1969] in their book Perceptrons: the theorem obviously applies only to those problems which the structure is capable of computing. Minsky and Papert showed that some rather elementary computations could not be done by Rosenblatt's one-layer¹ perceptron. The simplest example is the exclusive or (XOR) problem: a single output unit is required to turn on (+1) if one or the other of two input lines is on, but not when neither or both inputs are on.

Rosenblatt had also studied structures with more layers of units and believed that they could overcome the limitations of the simple perceptrons. However, there was no learning algorithm known which could determine the weights necessary to implement a given calculation. Minsky and Papert doubted that one could be found and thought it more profitable to explore other approaches to artificial intelligence. With this most of the computer science community left the neural network paradigm for almost 20 years.

Still, there were a number of people who continued to develop neural network theory in the 1970's. A major theme was associative content-addressable memory, in which different input patterns become associated with one another (i.e., trigger the same response) if sufficiently similar. These had actually
been proposed much earlier [Taylor, 1956; Steinbuch, 1961], and were later revived or rediscovered by Anderson [1968, 1970; Anderson and Mozer, 1981], Willshaw et al. [1969], Marr [1969, 1971], and Kohonen [1974-1989]. Grossberg [1967-1987] made a comprehensive reformulation of the general problem of learning in networks. Marr [1969, 1970, 1971] developed network theories of the cerebellum, cerebral neocortex, and hippocampus, assigning specific functions to each type of neuron. A number of people, including Marr [1982], von der Malsburg [1973], and Cooper [1973; Nass and Cooper, 1975], studied the development and functioning of the visual system.

Another thread of development can be traced to Cragg and Temperley [1954, 1955]. They reformulated the McCulloch-Pitts network as a spin (magnetic) system of the sort familiar in physics. Memory was believed to reside in the hysteresis of the domain patterns expected for such a system. Caianiello [1961] then constructed a statistical theory, using ideas from statistical mechanics, and incorporated learning in a way which drew on the ideas of Hebb [1949] about learning in the brain. The same theme was taken up in the 1970's by Little [1974; Little and Shaw, 1975, 1978] and again in 1982 by Hopfield [1982]. Hopfield was able to add some helpful physical insight by introducing an energy function, and by emphasizing the notion of memories as dynamically stable attractors. Hinton and Sejnowski [1983, 1986] and Peretto [1984] constructed formulations using stochastic units which follow the dynamics (1.1) or (1.3) only approximately, making "mistakes" with a certain probability analogous to temperature in statistical mechanics. The real power of statistical mechanics was then brought to bear on the stochastic network problem by Amit et al. [1985a, b; Amit, 1989], using methods developed in the theory of random magnetic systems called spin glasses.

¹ We never count input lines as units in numbering layers. Figure 1.3 is thus a two-layer network. Until recently this would often have been called a three-layer network, but the convention is changing.

Perhaps the most influential development in this decade, however, takes up the old thread of Rosenblatt's perceptrons where it was cut 20 years ago. Various people have developed an algorithm which works quite well for adjusting the weights connecting units in successive layers of multi-layer perceptrons. Known as back-propagation, it appears to have been found first by Werbos [1974] in the mid-70's, and then independently rediscovered around 1985 by Rumelhart, Hinton, and Williams [1986a, b], and by Parker [1985]. Le Cun [1985] also proposed a related algorithm. Though not yet the holy grail of a completely general algorithm able to teach an arbitrary computational task to a network, it can solve many problems (such as XOR) which the simple one-layer perceptrons could not. Much current activity is centered on back-propagation and its extensions.

Many of the important early papers have been collected in Anderson and Rosenfeld [1988], including many of those mentioned here. This is an excellent collection for those interested in the history of neural networks. We also recommend the review article by Cowan and Sharp [1988a, b], which we drew on for this section.

1.3 The Issues

Massive parallelism in computational networks is extremely attractive in principle. But in practice there are many issues to be decided before a successful implementation can be achieved for a given problem:

- What is the best architecture? Should the units be divided into layers, or not? How many connections should be made between units, and how should they be organized? What sort of activation functions g(x) should be used?
  What type of updating should be used: synchronous or asynchronous, deterministic or stochastic? How many units are needed for a given task?
- How can a network be programmed? Can it learn a task or must it be predesigned? If it can learn a task, how many examples are needed for good performance? How many times must it go through the examples? Does it need the right answers during training, or can it learn from correct/incorrect reinforcement? Can it learn in real-time while functioning, or must the training phase be separated from the performance phase?
- What can the various types of network do? How many different tasks can they learn? How well? How fast? How robust are they to missing information, incorrect data, and unit removal or malfunction? Can they generalize from known tasks or examples to unknown ones? What classes of input-to-output functions can they represent?
- How can a network be built in hardware? What are the advantages and disadvantages of different hardware implementations, and how do they compare to simulation in software?

These questions are obviously coupled and cannot be answered independently. The architecture, for instance, strongly influences what the network can do, and what hardware options are available.

Much of this book will be concerned with refining and answering the above questions. However we will generally approach them from a theoretical point of view, rather than from a design one. That is, we will attempt to understand the behavior of networks as a function of their architecture, and only rarely raise the question of designing networks to fulfill particular goals. But of course the two viewpoints are not independent, and a strong understanding of principles is invaluable for good design.

Three of the issues raised above deserve a little more comment here, as general background before we become involved in details.
Hardware

Almost everything in the field of neural computation has been done by simulating the networks on serial computers, or by theoretical analysis. Neural network VLSI chips are far behind the models, as is natural at this point. The main problem with making neural network chips is that one needs a lot of connections, often some fraction of the square of the number of units. The space taken up by the connections is usually the limiting factor for the size of a network. The neural chips made so far contain of the order of 100 units, which is too few for most practical applications.

Potential alternatives to integrated circuit chips include optical computers. The field is very young, but electro-optical and optical associative memories have already been proposed or built.

Efficient hardware is crucially important in the long term if we are going to take full advantage of the capabilities of neural networks, and there is growing activity in this area. However, it is largely beyond the scope of this book; we return to hardware issues only briefly in Section 3.4.

Generalization

The reason for much of the excitement about neural networks is their ability to generalize to new situations. After being trained on a number of examples of a relationship, they can often induce a complete relationship that interpolates and extrapolates from the examples in a sensible way. But what is meant by sensible generalization is often not clear. In many problems there are almost infinitely many possible generalizations. How does a neural network, or a human for that matter, choose the "right" one? As an example one could train a neural network on three of the four XOR relations mentioned earlier, and it would be very unlikely that any of [...]

TWO The Hopfield Model

2.1 The Associative Memory Problem
Associative memory is the "fruit fly" of this field. It illustrates in about the simplest possible manner the way that collective computation can work. The basic problem is this:

Store a set of $p$ patterns $\xi_i^\mu$ in such a way that when presented with a new pattern $\zeta_i$, the network responds by producing whichever one of the stored patterns most closely resembles $\zeta_i$.

The patterns are labelled by $\mu = 1, 2, \ldots, p$, while the units in the network are labelled by $i = 1, 2, \ldots, N$. Both the stored patterns $\xi_i^\mu$ and the test patterns $\zeta_i$ can be taken to be either 0 or 1 on each site $i$, though we will adopt a different convention shortly.

We could of course do this serially in a conventional computer simply by storing a list of the patterns $\xi_i^\mu$, writing a program which computed the Hamming distance¹

$$ \sum_i \big[\xi_i^\mu (1 - \zeta_i) + (1 - \xi_i^\mu)\,\zeta_i\big] \tag{2.1} $$

between the test pattern $\zeta_i$ and each of the stored patterns, finding which of them was smallest, and printing the corresponding stored pattern out.

Here we want to see how to get a McCulloch-Pitts network to do it. That is, if we start in the configuration $n_i = \zeta_i$, we want to know what (if any) set of $w_{ij}$'s will make the network go to the state with $n_i = \xi_i^{\mu_0}$, where it is pattern number $\mu_0$ that is the smallest distance (2.1) from $\zeta_i$. Thus we want the memory to be content-addressable and insensitive to small errors in the input pattern.

¹ The Hamming distance between two binary numbers means the number of bits that are different in the two numbers.

FIGURE 2.1 Example of how an associative memory can reconstruct images. These are binary images with 130 x 180 pixels. The images on the right were recalled by the memory after presentation of the corrupted images shown on the left. The middle column shows some intermediate states. A sparsely connected Hopfield network with stored images was used.

A content-addressable memory can be quite powerful. Suppose, for example, we store coded information about many famous scientists in a network. Then the starting pattern "evolution" should be sufficient to recall everything about Darwin, and "E = mc^3" should recall Einstein, despite the error in the input pattern.
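The serial approach mentioned above (compare the test pattern with every stored pattern and return the closest one) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: patterns are 0/1 arrays, and the function names `hamming_distance` and `recall_serial` as well as the example patterns are hypothetical, not part of the text.

```python
import numpy as np

def hamming_distance(xi, zeta):
    """Number of sites where two 0/1 patterns differ.

    Each term xi*(1 - zeta) + (1 - xi)*zeta is 1 exactly when the
    two bits disagree and 0 when they agree.
    """
    xi = np.asarray(xi)
    zeta = np.asarray(zeta)
    return int(np.sum(xi * (1 - zeta) + (1 - xi) * zeta))

def recall_serial(patterns, zeta):
    """Return (index, pattern) of the stored pattern closest to zeta."""
    distances = [hamming_distance(xi, zeta) for xi in patterns]
    best = int(np.argmin(distances))   # ties resolve to the first minimum
    return best, patterns[best]

# Hypothetical stored patterns (p = 3, N = 6) and a corrupted probe.
stored = np.array([[1, 1, 1, 0, 0, 0],
                   [0, 0, 0, 1, 1, 1],
                   [1, 0, 1, 0, 1, 0]])
probe = np.array([1, 1, 0, 0, 0, 0])   # first pattern with one bit flipped

index, pattern = recall_serial(stored, probe)
print(index, pattern)
```

The cost of this lookup grows with both the number of patterns and their length, since every stored pattern is scanned for every query; the network formulation discussed in the chapter aims to obtain the same kind of recall from the collective dynamics instead.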
Note that some pattern will always be retrieved for any clue (unless we invent a "don't know" pattern); the network will never retrieve a linear combination of, say, Darwin and Wallace in response to "evolution" but will pick the best match according to what has been stored. This depends on the nonlinearity of the network, and obviously has advantages for many practical applications.

Other common examples of applications for an associative memory are recognition and reconstruction of images (see Fig. 2.1), and retrieval of bibliographic information from partial references (such as from an incomplete title of a paper).

Figure 2.2 shows schematically the function of the dynamic associative (or content-addressable) memories that we construct in this chapter. The space of all possible states of the network, the configuration space, is represented by the region drawn. Within that space the stored patterns $\xi_i^\mu$ are attractors. The dynamics of the system carries starting points into one of the attractors, as shown by the trajectories sketched. The whole configuration space is thus divided up into [...]

FIGURE 2.3 The function sgn(x).

[...] and the threshold $\theta_i$ is related to the $\mu_i$ in (1.1) by $\theta_i = 2\mu_i - \sum_j w_{ij}$. In the rest of this chapter we drop these threshold terms, taking $\theta_i = 0$, because they are not useful with the random patterns that we will consider. Thus we use

$$S_i := \operatorname{sgn}\Big( \sum_j w_{ij} S_j \Big) \tag{2.4}$$

There are at least two ways in which we might carry out the updating specified by (2.4). We could do it synchronously, updating all units simultaneously at each time step. Or we could do it asynchronously, updating them one at a time. Both kinds of models are interesting, but the asynchronous choice is more natural for both brains and artificial networks. The synchronous choice requires a central clock or pacemaker, and is potentially sensitive to timing errors.
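The difference between the two updating schemes can be made concrete with a minimal sketch. The assumptions here are not from the text: the helper names are invented, `sgn(0)` is taken to be +1, and the two-unit network with mutually inhibitory weights is a hypothetical toy chosen only because it makes the two schemes behave differently.

```python
import random


def sgn(x):
    """Sign function; the convention sgn(0) = +1 is an assumption here."""
    return 1 if x >= 0 else -1


def synchronous_step(w, s):
    """Update all units simultaneously, each from the same old state."""
    n = len(s)
    return [sgn(sum(w[i][j] * s[j] for j in range(n))) for i in range(n)]


def asynchronous_step(w, s, rng):
    """Update one randomly selected unit in place and return its index."""
    n = len(s)
    i = rng.randrange(n)
    s[i] = sgn(sum(w[i][j] * s[j] for j in range(n)))
    return i


# Hypothetical toy network: two units that inhibit each other.
w = [[0, -1],
     [-1, 0]]

# Synchronous updating flips both units together, over and over.
s = [1, 1]
s = synchronous_step(w, s)
print(s)  # [-1, -1]
s = synchronous_step(w, s)
print(s)  # [1, 1]

# Asynchronous updating changes one unit and then stays put.
s = [1, 1]
asynchronous_step(w, s, random.Random(0))
print(synchronous_step(w, s) == s)  # True
```

In this toy case the synchronous rule cycles between two states, while a single asynchronous update already lands in a configuration that no further update changes.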
In the asynchronous case, which we adopt henceforth, we can proceed in either of two ways:

• At each time step, select at random a unit $i$ to be updated, and apply the rule (2.4).

• Let each unit independently choose to update itself according to (2.4), with some constant probability per unit time.

These choices are equivalent (except for the distribution of update intervals) because the second gives a random sequence; there is vanishingly small probability of two units choosing to update at exactly the same moment. The first choice is appropriate for simulation, with central control, while the second is appropriate for autonomous hardware units.

We also have to specify for how long (for how many updatings) we will allow the network to evolve before demanding that its units' values give the desired stored pattern. One possibility in the case of synchronous updating is to require that the network go to the correct memorized pattern right away on the first iteration. In the present discussion (using asynchronous updating) we demand only that the network settle eventually into a stable configuration, one for which no $S_i$ changes any more.

Rather than study a specific problem such as memorizing a particular set of pictures, we examine the more generic problem of a random set of patterns drawn from a distribution. For convenience we will usually take the patterns to be made up of independent bits $\xi_i$ which can each take on the values +1 and −1 with equal probability. More general situations are discussed in Section 3.2.

Our procedure for testing whether a proposed form of $w_{ij}$ is acceptable is first to see whether the patterns to be memorized are themselves stable, and then to check whether small deviations from these patterns are corrected as the network evolves.

One Pattern

To motivate our choice for the connection weights, we consider first the simple case where there is just one pattern $\xi_i$ that we want to memorize.
The condition for this pattern to be stable is just

$$\operatorname{sgn}\Big( \sum_j w_{ij} \xi_j \Big) = \xi_i \qquad \text{(for all } i\text{)} \tag{2.5}$$

because then the rule (2.4) produces no changes. It is easy to see that this is true if we take

$$w_{ij} \propto \xi_i \xi_j \tag{2.6}$$

since $\xi_j^2 = 1$. For later convenience we take the constant of proportionality to be $1/N$, where $N$ is the number of units in the network, giving

$$w_{ij} = \frac{1}{N}\, \xi_i \xi_j . \tag{2.7}$$

Furthermore, it is also obvious that even if a number (fewer than half) of the bits of the starting pattern $S_i$ are wrong (i.e., not equal to $\xi_i$), they will be overwhelmed in the sum for the net input

$$h_i = \sum_j w_{ij} S_j \tag{2.8}$$

by the majority that are right, and $\operatorname{sgn}(h_i)$ will still give $\xi_i$. An initial configuration near (in Hamming distance) to $\xi_i$ will therefore quickly relax to $\xi_i$. This means that the network will correct errors as desired, and we can say that the pattern $\xi_i$ is an attractor.

Actually there are two attractors in this simple case; the other one is at $-\xi_i$. This is called a reversed state. All starting configurations with more than half the bits different from the original pattern will end up in the reversed state. The configuration space is symmetrically divided into two basins of attraction, as shown in Fig. 2.4.
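The one-pattern argument can be checked numerically with a short sketch. The weights are the outer product of the pattern with itself divided by N, and the state is relaxed by asynchronous sign updates until no unit changes. Several details are assumptions made for the illustration rather than anything prescribed by the text: the helper names, the choice N = 21 (odd, so the net input is never exactly zero), the convention `sgn(0) = +1`, and the use of random-order sweeps over all units instead of drawing one unit at a time.

```python
import random


def sgn(x):
    """Sign function; the convention sgn(0) = +1 is an assumption here."""
    return 1 if x >= 0 else -1


def store_one_pattern(xi):
    """Weights w_ij = xi_i * xi_j / N for a single +/-1 pattern."""
    n = len(xi)
    return [[xi[i] * xi[j] / n for j in range(n)] for i in range(n)]


def relax(w, s, rng, max_sweeps=100):
    """Asynchronous updates in random order until no unit changes."""
    s = list(s)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.sample(range(n), n):
            new = sgn(sum(w[i][j] * s[j] for j in range(n)))  # sgn(h_i)
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break  # stable configuration reached
    return s


def flip(pattern, k, rng):
    """Copy of pattern with k randomly chosen bits reversed."""
    out = list(pattern)
    for i in rng.sample(range(len(out)), k):
        out[i] = -out[i]
    return out


rng = random.Random(0)
N = 21
xi = [rng.choice([-1, 1]) for _ in range(N)]
w = store_one_pattern(xi)

few_errors = flip(xi, 8, rng)    # fewer than half the bits wrong
many_errors = flip(xi, 13, rng)  # more than half the bits wrong

print(relax(w, few_errors, rng) == xi)                  # True
print(relax(w, many_errors, rng) == [-x for x in xi])   # True
```

With k wrong bits the net input on every unit is $\xi_i (N - 2k)/N$, so its sign is that of $\xi_i$ when k < N/2 and that of $-\xi_i$ when k > N/2, which is why the two starting states fall into the pattern and its reversed state respectively.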