Computational Linguistics Lecture Notes 4 - C. Chesi, Linguistics notes

Computational Linguistics lecture notes (Professor Cristiano Chesi, 2018), University of Siena, Master's degree "Language and Mind - Linguistics and Cognitive Studies". Topics: Principles and Parameters, X-bar theory, PAPPI (Fong), Minimalist Grammars, the central nervous system, Artificial Neural Networks (ANN), plus lab instructions: T-Learn and simple recurrent networks.

Type: Notes

2018/2019

Uploaded on 05/08/2019

claudia-ruzza

14.12.18
ADVANCED PARSING: FROM RULES, TO P&P… AND MINIMALISM
Essential references
●Stabler E. (1997) Derivational minimalism. In Retoré, ed. Logical Aspects of Computational Linguistics. Springer
Extended references
●Chesi C. (2015) On directionality of phrase structure building. Journal of Psycholinguistic Research
●Fong S. (1991) Computational Properties of Principle-Based Grammatical Theories. Ph.D. thesis, MIT
Index
Principles and Parameters Parsers
Minimalist Grammars
Phase‐based Minimalist Grammar
Parsing with PMGs
From Rules to Principles and Parameters
Rules: language specific
Principles & Parameters (P&P): linguistic universals + parameter settings
P&P aims at better explanatory adequacy (beyond descriptive adequacy).
Goal: linguistic universals capture the limited syntactic variability across languages.
Principle-based parsers (Barton 1984, Berwick and Fong 90, Stabler 92) are inspired by this intuition:
●Grammatical principles are parser axioms
●The parser operates as a deductive system, inferring grammatical structures by applying the axioms to the input
Advanced parsing
Hierarchical properties allow us to predict certain problems, e.g. where the subject should be placed.
Using P&P in a parsing process is similar to running a deductive system: principles are applied one by one to derive a sentence. This is problematic from a logical point of view, because the principles are so general that they have to be compiled in order to derive a grammar.
Few principles + few parameters = thousands of rules


In a rule-based approach, a transformation is applied to a basic declarative sentence to create, for instance, a focalized sentence: a bunch of rules has to deal with every possible relation between a basic declarative sentence and each variation of it. With P&P there are only a few principles, and the rules for each variation follow from the combination and interaction of those principles.
➔ The previous approach: Slobbing, i.e. the derivational theory of complexity (DTC).

P&P T model
The surface structure (SS) is the sentence we pronounce. It can be seen from two points of view: its phonetic side (Phonetic Form, PF) and its meaning (Logical Form, LF). Looking deeper, we find the Deep Structure (DS), which is where the lexicon comes in.
→ X' theory
→ θ‐criterion: every argument must receive one and only one thematic (θ) role (and every thematic role is assigned to just one argument)
→ Case filter: every lexical NP must receive case (P and finite V are case assigners)

Generators = principles producing more structures than the ones received as input:
●Move α
●Free indexation
●...
Filters = principles selecting fewer structures than the ones received as input:
●X' theory
●θ‐criterion
●Case filter

X-bar theory is a theory of syntactic category formation. It embodies two independent claims: one, that phrases may contain intermediate constituents projected from a head X; and two, that this system of projected constituency may be common to more than one category (e.g., N, V, A, P, etc.). The letter X is used to signify an arbitrary lexical category (part of speech); when analyzing a specific utterance, specific categories are assigned. Thus, the X may become an N for noun, a V for verb, an A for adjective, or a P for preposition.
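The division of labour between generators and filters described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration (the function names, the two-word example, and the idea of treating role assignment as the generator); it is not how PAPPI or any real principle-based parser is implemented. The generator over-produces candidate role assignments, and a θ‐criterion filter keeps only those in which each argument receives exactly one role.

```python
from itertools import product


def generate_assignments(roles, arguments):
    """Toy generator: propose every way of handing each role to some argument."""
    for receivers in product(arguments, repeat=len(roles)):
        yield list(zip(roles, receivers))


def theta_criterion(assignment, arguments):
    """Toy filter: every argument must receive exactly one thematic role.

    Each role is assigned exactly once by construction, so only the
    argument side of the criterion needs checking.
    """
    receivers = [receiver for _, receiver in assignment]
    return all(receivers.count(argument) == 1 for argument in arguments)


# Hypothetical example data, not taken from the lecture.
roles = ["agent", "theme"]
arguments = ["Mario", "the cat"]

candidates = list(generate_assignments(roles, arguments))
survivors = [c for c in candidates if theta_criterion(c, arguments)]

print(len(candidates), "candidates,", len(survivors), "survive the filter")
for assignment in survivors:
    print(assignment)
```

The generator yields more structures than it was given, and the filter returns fewer than it received, which is the asymmetry the definitions above rely on.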

PAPPI Control Strategies

  • Analysis‐by‐synthesis: produce everything, then test against the input (parallel top‐down approach)
  • Generate and Test: one principle at a time, generate structures, then test against the input (serial top‐down, depth first)
  • Coroutining, freezing, clause selection, ordering: reorganize the problem space, trying to minimize the amount of exploration needed
  • Covering: off‐line grammar compilation; groups of states are collapsed whenever possible, reducing time complexity (number of steps)

Minimalist Grammars

PMGs
SBO: Merge Right (Phillips 1996)
Proposal: Phase‐based MGs
PMG: Sample derivation of wh‐question
Structure building operations:
●(default) Expand(Lex: CPwh = (+wh +T +S V))
●Insert(Lex: (+wh +D N what))
●Insert(Lex: (+T did))
●Insert(Lex: (+S +D N you))
●Insert(Lex: (V =DP =DP see))
●Expand((V=DP))
●Move((+D N you))
●Expand((V=DP))
●Move((+D N what))

Parsing States: CFG‐Earley vs. PMG‐pa
Getting asymmetries with PMG‐pa

The input and output are symbolic; everything else is sub‐symbolic.
●No explicit structure or structure building operations
●System complexity (and its apparent representations) is an emergent property of simple interactions among parts

Which problems?
Problems that can hardly be decomposed into sub‐problems:
●Problems too complex to be described
●Partial representation of the problem space. Tic-tac-toe is feasible, but more complex games make it impractical to represent every step explicitly; most people rely on machine learning for them.
●Complex algorithmic solutions that call for approximations
●Rules and/or heuristics that are hard to define
●High degree of interaction among levels (multiple constraints). It is very hard to handle all the constraints in a linear way.

Competence and Grammar: two definitions
●Symbolic (e.g. phrase structure grammar): a static set of rules and/or principles (explicit representation of the competence)
●Sub‐symbolic (e.g. neural networks): grammar is a processing device (implicit representation) reacting to contextual input; words are «operators» smoothly moving the system from state to state. The approach simply creates a processing device (a more procedural, context-driven notion of grammar).

From the symbolic perspective, conjunction is the logical operator AND. From a sub‐symbolic point of view, we can represent it in the Cartesian plane: the input pairs are points, and there are plenty of lines that separate the point where the conjunction is true from the others (the line can shift up and down around that point).
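The remark about conjunction can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the threshold unit, the helper names and the three weight settings are all chosen here, not taken from the lecture. It shows that several different lines in the plane (different weights and biases) reproduce the same symbolic truth table for AND.

```python
INPUT_PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def symbolic_and(a, b):
    """Symbolic view: conjunction as an explicit logical rule."""
    return int(bool(a) and bool(b))


def threshold_unit(x1, x2, w1, w2, bias):
    """Sub-symbolic view: fire when the point lies above the line w1*x1 + w2*x2 + bias = 0."""
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0


def truth_table(w1, w2, bias):
    return [threshold_unit(a, b, w1, w2, bias) for a, b in INPUT_PAIRS]


# Three hypothetical (w1, w2, bias) settings, i.e. three different separating lines.
candidate_lines = [(1.0, 1.0, -1.5), (0.6, 0.6, -1.0), (2.0, 1.0, -2.5)]

target = [symbolic_and(a, b) for a, b in INPUT_PAIRS]
for w1, w2, bias in candidate_lines:
    print((w1, w2, bias), truth_table(w1, w2, bias) == target)
```

Each of the three settings isolates the point (1, 1) from the other three, so the symbolic rule corresponds to a whole family of sub-symbolic solutions rather than to a single one.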

The central nervous system
The central nervous system (CNS) is the part of the nervous system consisting of the brain and spinal cord. A neuron, also known as a neurone (British spelling) or nerve cell, is an electrically excitable cell that receives, processes, and transmits information through electrical and chemical signals. These signals between neurons occur via specialized connections called synapses. Neurons can connect to each other to form neural pathways and neural circuits. Neurons are the primary components of the central nervous system, which includes the brain and spinal cord, and of the peripheral nervous system, which comprises the autonomic nervous system and the somatic nervous system.
A typical neuron consists of a cell body (soma), dendrites, and an axon. All neurons are electrically excitable, due to maintenance of voltage gradients across their membranes by means of metabolically driven ion pumps, which combine with ion channels embedded in the membrane to generate intracellular-versus-extracellular concentration differences of ions such as sodium, potassium, chloride, and calcium. Changes in the cross-membrane voltage can alter the function of voltage-dependent ion channels. If the voltage changes by a large enough amount, an all-or-none electrochemical pulse called an action potential is generated; this change in cross-membrane potential travels rapidly along the cell's axon and activates synaptic connections with other cells when it arrives.

Artificial Neural Network (ANN)

Architecture
An artificial neural network is an interconnected group of nodes, similar to the vast network of neurons in a brain. Here, each circular node represents an artificial neuron and an arrow represents a connection from the output of one artificial neuron to the input of another.

Classic ANN
Artificial neural networks (ANN) or connectionist systems are computing systems vaguely inspired by the biological neural networks that constitute animal brains. The neural network itself is not an algorithm, but rather a framework for many different machine learning algorithms to work together and process complex data inputs. Such systems "learn" to perform tasks by considering examples, generally without being programmed with any task-specific rules. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as "cat" or "no cat" and using the results to identify cats in other images. They do this without any prior knowledge about cats, for example, that they have fur, tails, whiskers and cat-like faces. Instead, they automatically generate identifying characteristics from the learning material that they process.
An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it.
The simplest network we can have is the Perceptron.
Pattern association: the network has the same number of input units and output units.
A self-organizing map (SOM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional) representation of the input space.
A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An MLP consists of at least three layers of nodes: an input layer, a hidden layer and an output layer.
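To illustrate why the hidden layer matters, here is a minimal sketch of a three-layer network computing XOR, a function that a single threshold unit cannot compute. The weights are set by hand rather than learned, and the step activation, the helper names and the specific weight values are assumptions chosen for this example only.

```python
def step(z):
    """Threshold activation: the unit fires only for positive net input."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """Compute one layer: each row of weights feeds one unit."""
    return [
        step(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-set (not learned) weights, chosen for illustration.
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]  # unit 0 acts like OR, unit 1 like AND
HIDDEN_BIASES = [-0.5, -1.5]
OUTPUT_WEIGHTS = [[1.0, -1.0]]             # "OR but not AND"
OUTPUT_BIASES = [-0.5]


def mlp_xor(x1, x2):
    hidden = layer([x1, x2], HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", mlp_xor(a, b))
```

In a trained MLP the weights would be found by a learning procedure such as backpropagation instead of being written down by hand.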

The differences between CNNs and RNNs are as follows:

Recurrent Neural Network

  • RNNs can handle arbitrary input/output lengths.
  • RNNs, unlike feedforward neural networks, can use their internal memory to process arbitrary sequences of inputs.
  • RNNs use time-series information, i.e. what I said last will affect what I say next.
  • RNNs are ideal for text and speech analysis.

Convolutional Neural Network (the kind used by Facebook to recognize faces)

  • CNNs take fixed-size inputs and generate fixed-size outputs.
  • A CNN is a type of feed-forward artificial neural network: a variation of the multilayer perceptron designed to use minimal amounts of preprocessing.
  • CNNs use a connectivity pattern between neurons that is inspired by the organization of the animal visual cortex, whose individual neurons are arranged in such a way that they respond to overlapping regions tiling the visual field.
  • CNNs are ideal for image and video processing.
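The idea of units responding to overlapping regions of the input can be sketched in one dimension. This is a simplified illustration, not a full CNN: the function name, the toy signal and the edge-detecting kernel are assumptions, and the operation shown is the sliding-window sum of products (strictly a cross-correlation) that convolutional layers typically compute.

```python
def convolve_1d(signal, kernel):
    """Slide the kernel over the signal; each output unit sees one local window."""
    width = len(kernel)
    return [
        sum(k * x for k, x in zip(kernel, signal[start:start + width]))
        for start in range(len(signal) - width + 1)
    ]


# Hypothetical input: a bright band in the middle of a dark strip.
signal = [0, 0, 1, 1, 1, 0, 0]
# The same three weights are reused at every position (weight sharing).
edge_kernel = [-1, 0, 1]

response = convolve_1d(signal, edge_kernel)
print(response)
```

Neighbouring output units share two of their three inputs, which is the one-dimensional analogue of overlapping receptive fields tiling the visual field.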

How to select the best architecture?
Heuristics are the best hint (Rumelhart & McClelland 1982). From a purely formal point of view, 3 layers can solve any problem (Hornik, Stinchcombe & White 1989), but how many neurons and which connection pattern?

Coding input (and output)

  • Distributed coding: the number of items that can be represented exceeds the number of input units, e.g. using a binary coding, 2 bits (that is, 2 input neurons) can represent 4 elements: a=00, b=01, c=10, d=11
  • Localist coding: no similarity among inputs, orthogonal input; 1 word = 1 node, e.g. 4 input units: a (0001), b (0010), c (0100), d (1000)

Learning linguistic properties
Past tense (Rumelhart & McClelland, 86)
Clear linguistic pattern:

  • phase 1: few high-frequency verbs (children learn a few crystallized forms)
  • phase 2: over-regularization (break > breaked)
  • phase 3: irregular verb inflection reconsidered (smooth coexistence of irregular and over-regularized forms… until only correct forms are used)

Human‐like performance (some errors still present)
Phonetic input coding (Wickel‐features, Wickelgren 69)
Network architecture: 460 input and output units using Wickel‐features
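How inputs are coded determines how many units a network needs and whether inputs resemble each other. The two schemes described under "Coding input" can be sketched as follows; the function names and the overlap measure (a dot product) are assumptions made for this illustration, while the bit patterns follow the ones given in the notes.

```python
SYMBOLS = ["a", "b", "c", "d"]


def distributed_code(symbol):
    """Binary coding on 2 units: a=00, b=01, c=10, d=11."""
    index = SYMBOLS.index(symbol)
    return [(index >> 1) & 1, index & 1]


def localist_code(symbol):
    """One unit per symbol on 4 units: a=0001, b=0010, c=0100, d=1000."""
    index = SYMBOLS.index(symbol)
    units = [0] * len(SYMBOLS)
    units[len(SYMBOLS) - 1 - index] = 1
    return units


def overlap(u, v):
    """Dot product: 0 means the two patterns share no active unit."""
    return sum(x * y for x, y in zip(u, v))


for symbol in SYMBOLS:
    print(symbol, distributed_code(symbol), localist_code(symbol))

print("localist b.d    =", overlap(localist_code("b"), localist_code("d")))
print("distributed b.d =", overlap(distributed_code("b"), distributed_code("d")))
```

Localist patterns for different symbols never overlap (they are orthogonal), whereas distributed patterns can share active units, so the coding itself introduces similarities between some inputs.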

Results of the past-tense simulation: (after 420 turns of 200 verb presentations = 84,000 examples)

Time in ANN

  • Epochs (sweeps): (discrete) temporal units corresponding to one input processing
  • Atemporal processing: activation only depends on input, connections and weights
  • Temporal flow simulation trick: the input is divided into groups; each group is a distinct temporal interval
  • Context layer (Elman 1990): the hidden layer activation is copied onto a context layer that will be added to the next input activation

Simple Recurrent Networks
Guessing-next-word paradigm, e.g. "the house is red": Input = the, Output = house («unsupervised» learning: self‐supervised)
Psycho/neurological plausibility (Cole & Robbins 92): a sort of priming
Input structure: localist (1 node = 1 concept)
An SRN or Simple Recurrent Network (or Elman Network) is a kind of recurrent network. SRNs are useful for discovering patterns in temporally extended data. They are essentially variants of a backprop network that is trained to associate inputs, together with a memory of the last hidden layer state, with output states. In this way, for example, the network can predict what item will occur next in a stream of patterns.
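The context-layer mechanism can be sketched as a forward pass. This is a minimal illustration under several assumptions: the class and helper names are invented here, the weights are random and untrained (no backpropagation is shown), the hidden layer has 3 units, and the vocabulary is the four words of the example sentence with localist coding. The point is only that the copied hidden state makes the response to a word depend on what came before it.

```python
import math
import random

VOCAB = ["the", "house", "is", "red"]


def one_hot(word):
    """Localist input: 1 node = 1 word."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SimpleRecurrentNetwork:
    """Forward pass only; weights are random, so outputs are not meaningful predictions."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w_in = random_matrix(n_hidden, n_in, rng)
        self.w_context = random_matrix(n_hidden, n_hidden, rng)
        self.w_out = random_matrix(n_out, n_hidden, rng)
        self.context = [0.0] * n_hidden  # empty memory at the start

    def step(self, x):
        from_input = matvec(self.w_in, x)
        from_context = matvec(self.w_context, self.context)
        hidden = [sigmoid(a + b) for a, b in zip(from_input, from_context)]
        output = [sigmoid(z) for z in matvec(self.w_out, hidden)]
        self.context = hidden[:]  # copy the hidden layer onto the context layer
        return output


net = SimpleRecurrentNetwork(n_in=4, n_hidden=3, n_out=4)
first = net.step(one_hot("the"))   # context is still all zeros
second = net.step(one_hot("the"))  # same input, but context now holds the previous hidden state
print(first)
print(second)
```

In the guessing-next-word paradigm, training would adjust the weights so that the output for "the" moves toward the localist pattern for "house"; that learning step is left out of this sketch.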

Less is less (Rohde & Plaut 01)
It shows that "starting small" is useless: SRNs tend to learn simpler (local) relations first.
Cleeremans & al. (89) show improved SRN learning with intervening semantically correlated material (e.g. subj‐verb agreement: The dog [that barks] runs away)
Test: 5 grammar classes:
A. No semantic dependency between main and embedded clause
B. 25% of sentences with a semantic dependency
C. 50% of sentences with a semantic dependency
D. 75% of sentences with a semantic dependency
E. 100% of sentences with a semantic dependency

ANN and recursive properties learning
⎯ Connectionist model of human performance (Christiansen & Chater 1999)
⎯ Different from the competence idea: e.g. Phrase Structure Grammars including recursive rules + (memory) limitations = performance
⎯ Connectionist models include these limitations in a single competence + performance processing device

Summary of recursive properties in language
  • Right‐branching (iteration rather than recursion): Mario vede il gatto che ha mangiato il topo che ha rubato il formaggio... ("Mario sees the cat that ate the mouse that stole the cheese...")
  • Counting recursion (a^n b^n): se è vero che se Mario vince il concorso... allora si sposa, allora anch'io ci faccio un pensierino ("if it is true that if Mario wins the competition... then he gets married, then I will think about it too")
  • Center embedding (ww^R): Il topo [che il gatto [che Mario ... vede] ha mangiato] ha rubato il formaggio ("The mouse [that the cat [that Mario ... sees] ate] stole the cheese")
  • Cross‐serial dependencies (ww): Mario, Giovanna, Giuseppe ... sono rispettivamente promosso, bocciata, promosso... ("Mario, Giovanna, Giuseppe ... respectively passed, failed, passed...")

ANN and innateness
Different kinds of «innateness» (Elman 99)
⎯ representational: different connection patterns ‐> different activation states
⎯ architectural
  • Unit‐based (neuron typology, activation function...)
  • Local architectural constraints (density, local circuits...)
  • Global architectural constraints (constraints on different areas...)
⎯ timing (innate chronotopy): structural modification due to external factors (modified neural plasticity, different input processed...)
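Going back to the summary of recursive properties, the three formal patterns (a^n b^n, ww^R, ww) can be written down as tiny string generators and recognizers. This is a sketch for illustration only: the function names and the use of plain character strings in place of words are assumptions, and nothing here models how a network would learn these patterns.

```python
def counting_recursion(n):
    """a^n b^n: n a's followed by n b's."""
    return "a" * n + "b" * n


def center_embedding(w):
    """w w^R: a string followed by its mirror image (nested dependencies)."""
    return w + w[::-1]


def cross_serial(w):
    """w w: a string followed by a copy of itself (crossing dependencies)."""
    return w + w


def is_counting_recursion(s):
    half, remainder = divmod(len(s), 2)
    return remainder == 0 and s == "a" * half + "b" * half


def is_center_embedding(s):
    half, remainder = divmod(len(s), 2)
    return remainder == 0 and s[:half] == s[half:][::-1]


def is_cross_serial(s):
    half, remainder = divmod(len(s), 2)
    return remainder == 0 and s[:half] == s[half:]


print(counting_recursion(3))    # aaabbb
print(center_embedding("abc"))  # abccba
print(cross_serial("abc"))      # abcabc
```

In the center-embedded string the first element is matched with the last one (as in the nested relative clauses), while in the cross-serial string the first element of the first half is matched with the first element of the second half (as in the "rispettivamente" example).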