













A research paper from the UCLA/Potsdam Working Papers in Linguistics series, published in May 2003 in a volume edited by Mahajan. The paper compares three perspectives on head movement in linguistic theory: minimalist grammars (MG), minimalist grammars with head movement (MGH), and mirror-theory-like grammars (MTTG). The author, Edward Stabler, discusses the properties of these grammars and their implications for understanding human language syntax.














UCLA/Potsdam Working Papers in Linguistics, May 2003. Head Movement and Syntactic Theory, Mahajan (ed.)
Edward Stabler
In the attempt to understand the fundamental properties of human language, in virtue of which it can be acquired and used as it is, the most common strategy is to propose a grammar G that provides a reasonable account of some particular structures of expressions in particular languages. For example, there are proposals in this volume about verbal complexes in German, about agreement on nouns in Maasai, about A-binding in English, and so on. On the basis of these hypotheses, we can use poverty-of-stimulus arguments, cross-linguistic comparisons, etc. to support universal claims of the following form:
(U) G ∈ 𝒢, where 𝒢 is a (restricted) class of grammars
More than five decades of very active research show that identifying a restrictive, illuminating, explanatory universal grammar 𝒢 is not a trivial matter! There is little consensus about the structure of verbal complexes in German, nouns in Maasai, or A-binding in English. One practical, uncontroversial observation we can make is this: the objects of study, the particular grammars G, are complex, and our representations of these objects are complex too. Consequently, when the investigation is informal, it can be very difficult to separate the substantial, supported empirical claims about G from the consequences of mere notational conventions, and from the consequences of assumptions that are merely programmatic.

Stabler – Comparing 3 Perspectives 179

Another strategy for getting to the universals (U) involves supporting weaker hypotheses than particular (parts of) grammars G. It still rests on hypotheses about properties of particular expressions, of course, but it does not begin with any complete account of them. Rather, it identifies more abstract properties that any reasonable grammar should require. In effect, the strategy is to aim for instances of (U) more directly: instead of attempting to specify any particular grammar G for some particular structures of a particular language, we can try to determine whether the actual grammar G is in some infinite class of grammars. Since the claims here are more abstract, they sometimes require more careful, formal formulations, which may be appropriate in any case given the complexity of most of the prominent proposals in the field.

There are many familiar hypotheses of this latter sort, which can be regarded as instances of (U). Some are well established, while others are still being actively studied. We can order our claims according to the complexity of the patterns, the regularities, that can be enforced, from simple finite state languages out to the sets of morpheme sequences that are not even recursively enumerable. (Remember that a simple counting argument shows that, by far, most sets of morpheme sequences are not recursively enumerable (RE): there are only countably many RE sets, but uncountably many sets of morpheme sequences. So even the hypothesis that human languages are RE is very strong.) It is not a surprise that the most interesting and controversial claims about the complexity of human languages (here marked with ?) are in the middle of this spectrum:
(1.1) finite state grammars (FSG) do not provide a good description of the nested dependencies found in human languages: G ∉ FSG (Chomsky 1956, et al.)
(1.2) context free grammars (CFG) do not provide a good description of the crossing dependencies found in human languages: G ∉ CFG (Chomsky 1956, et al.)
(?1.3) tree adjoining grammars (TAG), and the similar, weakly equivalent grammars studied by Joshi, Vijay-Shanker and Weir 1991, do not provide a good description of the dependencies in German scrambling: G ∉ TAG (Rambow 1994, Steedman 2000, Kulick 2000, et al.)
(?1.4) the sets of fully grammatical sequences of morphemes in human languages are semilinear (SemiLin): G ∈ SemiLin (Joshi et al., but cf. Michaelis and Kracht 1997)
(1.5) the question of whether a sequence of morphemes is a fully grammatical sentence of a human language is effectively decidable: G ∈ REC (Putnam 1961, Matthews 1979, et al.)
(1.6) the sequences of morphemes that are fully grammatical sentences of a human language form a recursively enumerable set: G ∈ RE
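The nested and crossing patterns behind (1.1) and (1.2) are often idealized as the formal languages {aⁿbⁿ} and {aⁿbᵐcⁿdᵐ}. The following Python sketch (function names hypothetical) only decides membership in these two idealizations; it is not a model of German, Dutch or any other natural language.

```python
import re

def nested(s: str) -> bool:
    """Is s in {a^n b^n : n >= 0}? The a's and b's pair off like nested brackets."""
    m = re.fullmatch(r"(a*)(b*)", s)
    return m is not None and len(m.group(1)) == len(m.group(2))

def crossing(s: str) -> bool:
    """Is s in {a^n b^m c^n d^m : n, m >= 0}? The a-c and b-d pairings cross."""
    m = re.fullmatch(r"(a*)(b*)(c*)(d*)", s)
    return (m is not None
            and len(m.group(1)) == len(m.group(3))
            and len(m.group(2)) == len(m.group(4)))

for w in ["aabb", "aab", "abbcdd", "abcdd"]:
    print(w, nested(w), crossing(w))
```

The regular expressions only check the order of the symbols, which a finite state device can do; the count equalities are what no finite state grammar can enforce for unbounded n, and the two interleaved equalities in `crossing` are what no context free grammar can enforce.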
The structure building rules will be feature driven, and so it will be convenient to adopt a “bare” representation of syntactic structure:
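[Figure: a conventional tree for the idea, and the corresponding bare tree [< the:D, idea], in which the order symbol < points to the head the:D.]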
We adopt the “bare” ordered trees shown above, where the leaves have features and internal nodes are marked with an order symbol, < or >, which “points” toward the head of the complex. An expression with just one node is its own head. So each leaf is a head with lexical features, and the maximal projection of a leaf is the largest subtree for which that leaf is the head. It is not difficult to provide fully rigorous definitions of the structure building rules, as in (Stabler, 1999), but the operations are quite simple and some examples will suffice to indicate what is intended.

The operation merge is a function that applies to two trees, where the head of the first begins with a selection feature =f and the head of the second begins with a category feature f. The features =f and f are checked and deleted, but the structure of the result depends on whether the first tree, the selector, is complex or not, that is, whether it has already filled its complement position. When the first tree is a simple lexical item with an unfilled complement position, the second tree is attached on the right, in “complement” position:
kisses::=D =D V + Pierre::D ⇒ [< kisses:=D V, Pierre]
When the first tree is not simple, the second tree is attached on the left, in “specifier” position:
[< kisses:=D V, Pierre] + Marie::D ⇒ [> Marie, [< kisses:V, Pierre]]
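The two cases of merge can be sketched in Python over a minimal encoding of bare trees. The encoding (a `Leaf` with a phonetic form and a tuple of remaining features, a `Node` with an order symbol and two daughters) and all function names are assumptions made for illustration, not the formal definitions in Stabler (1999).

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Leaf:
    phon: str     # phonetic form
    feats: tuple  # remaining features, e.g. ("=D", "=D", "V")

@dataclass(frozen=True)
class Node:
    order: str    # "<" if the head is in the left daughter, ">" if in the right
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def head(t: Tree) -> Leaf:
    """Follow the order symbols down to the head leaf."""
    while isinstance(t, Node):
        t = t.left if t.order == "<" else t.right
    return t

def check(t: Tree) -> Tree:
    """Copy of t with the first feature of its head deleted."""
    if isinstance(t, Leaf):
        return Leaf(t.phon, t.feats[1:])
    if t.order == "<":
        return Node("<", check(t.left), t.right)
    return Node(">", t.left, check(t.right))

def merge(t1: Tree, t2: Tree) -> Tree:
    """Check =f on the head of t1 against f on the head of t2."""
    sel, cat = head(t1).feats[0], head(t2).feats[0]
    if sel != "=" + cat:
        raise ValueError(f"{sel} does not select {cat}")
    if isinstance(t1, Leaf):                 # simple selector: complement on the right
        return Node("<", check(t1), check(t2))
    return Node(">", check(t2), check(t1))   # complex selector: specifier on the left

vp = merge(Leaf("kisses", ("=D", "=D", "V")), Leaf("Pierre", ("D",)))
vp = merge(vp, Leaf("Marie", ("D",)))
print(head(vp))
```

Because the choice between complement and specifier depends only on whether the selector is a single leaf, the same `merge` produces both examples above.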
The operation move is unary, applying to a single expression whose head begins with a licensor feature +f, and where exactly one other node in the tree begins with -f. In this case, move applies to move the maximal projection of the -f head to the specifier of the +f head, again checking and deleting both features:
[< will:+case T, [> Marie:-case, [< kiss, Pierre]]] ⇒ [> Marie, [< will:T, [> t, [< kiss, Pierre]]]]
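Move can be sketched in the same hypothetical encoding. In this sketch, the requirement that exactly one other node begins with -f is approximated by counting maximal projections (non-head daughters) whose head begins with -f; the helper names `extract` and `words` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Leaf:
    phon: str     # "" for empty positions
    feats: tuple  # remaining features

@dataclass(frozen=True)
class Node:
    order: str    # "<" or ">", pointing toward the head
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]
TRACE = Leaf("", ())  # what a moved phrase leaves behind

def head(t: Tree) -> Leaf:
    while isinstance(t, Node):
        t = t.left if t.order == "<" else t.right
    return t

def check(t: Tree) -> Tree:
    """Copy of t with the first feature of its head deleted."""
    if isinstance(t, Leaf):
        return Leaf(t.phon, t.feats[1:])
    if t.order == "<":
        return Node("<", check(t.left), t.right)
    return Node(">", t.left, check(t.right))

def extract(t: Tree, lic: str, found: list) -> Tree:
    """Replace each maximal projection whose head begins with `lic` by TRACE,
    collecting the removed subtrees in `found`."""
    if isinstance(t, Leaf):
        return t
    hd, other = (t.left, t.right) if t.order == "<" else (t.right, t.left)
    if head(other).feats[:1] == (lic,):   # non-head daughters are maximal projections
        found.append(other)
        other = TRACE
    else:
        other = extract(other, lic, found)
    hd = extract(hd, lic, found)
    return Node("<", hd, other) if t.order == "<" else Node(">", other, hd)

def move(t: Tree) -> Tree:
    f = head(t).feats[0]
    if not f.startswith("+"):
        raise ValueError("head does not begin with a licensor +f")
    found = []
    rest = extract(t, "-" + f[1:], found)
    if len(found) != 1:                    # exactly one -f phrase is required
        raise ValueError(f"found {len(found)} phrases with -{f[1:]}")
    return Node(">", check(found[0]), check(rest))  # mover becomes the specifier

def words(t: Tree) -> list:
    """Pronounced leaves, left to right."""
    if isinstance(t, Leaf):
        return [t.phon] if t.phon else []
    return words(t.left) + words(t.right)

kiss_pierre = Node("<", Leaf("kiss", ()), Leaf("Pierre", ()))
t = Node("<", Leaf("will", ("+case", "T")),
         Node(">", Leaf("Marie", ("-case",)), kiss_pierre))
result = move(t)
print(" ".join(words(result)))
```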
Since merge and move are invariant (the same for every grammar), each MG can be given simply by its finite lexicon. The language is the whole set of structures generated from the lexicon by applications of merge and move.
For example, consider the following grammar MG1 with 5 lexical items, inspired by some suggestions of Mahajan (2000).
love::=D V -v
-s::=v +v +case T
ε::=V +case =D v
Romeo::D -case
Juliet::D -case
With this grammar we have a 7 step derivation of Romeo love -s Juliet (which the reader should be able to calculate by hand), with remnant movement at the penultimate step, when the VP from which the object was extracted moves to the specifier of IP. Here is the derived bare structure, and a more conventional depiction of the same result:
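The 7-step derivation can be replayed mechanically. The Python sketch below reuses a hypothetical encoding of bare trees, writes the empty v head with an empty phonetic form, and applies merge and move in an order reconstructed from the prose; it is an illustration, not Stabler's formal definition.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Leaf:
    phon: str     # "" for empty items and traces
    feats: tuple  # remaining (unchecked) features

@dataclass(frozen=True)
class Node:
    order: str    # "<" or ">", pointing toward the head
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]
TRACE = Leaf("", ())

def head(t: Tree) -> Leaf:
    while isinstance(t, Node):
        t = t.left if t.order == "<" else t.right
    return t

def check(t: Tree) -> Tree:
    """Copy of t with the first feature of its head deleted."""
    if isinstance(t, Leaf):
        return Leaf(t.phon, t.feats[1:])
    if t.order == "<":
        return Node("<", check(t.left), t.right)
    return Node(">", t.left, check(t.right))

def merge(t1: Tree, t2: Tree) -> Tree:
    if head(t1).feats[0] != "=" + head(t2).feats[0]:
        raise ValueError("merge: features do not match")
    if isinstance(t1, Leaf):
        return Node("<", check(t1), check(t2))   # complement
    return Node(">", check(t2), check(t1))       # specifier

def extract(t: Tree, lic: str, found: list) -> Tree:
    """Replace maximal projections whose head begins with lic by TRACE."""
    if isinstance(t, Leaf):
        return t
    hd, other = (t.left, t.right) if t.order == "<" else (t.right, t.left)
    if head(other).feats[:1] == (lic,):
        found.append(other)
        other = TRACE
    else:
        other = extract(other, lic, found)
    hd = extract(hd, lic, found)
    return Node("<", hd, other) if t.order == "<" else Node(">", other, hd)

def move(t: Tree) -> Tree:
    f = head(t).feats[0]
    if not f.startswith("+"):
        raise ValueError("move: head lacks a licensor feature")
    found = []
    rest = extract(t, "-" + f[1:], found)
    if len(found) != 1:
        raise ValueError("move: need exactly one matching licensee")
    return Node(">", check(found[0]), check(rest))

def words(t: Tree) -> list:
    if isinstance(t, Leaf):
        return [t.phon] if t.phon else []
    return words(t.left) + words(t.right)

# MG1, with the empty v head written as phon ""
love = Leaf("love", ("=D", "V", "-v"))
s = Leaf("-s", ("=v", "+v", "+case", "T"))
v = Leaf("", ("=V", "+case", "=D", "v"))
romeo, juliet = Leaf("Romeo", ("D", "-case")), Leaf("Juliet", ("D", "-case"))

t = merge(love, juliet)   # 1. VP
t = merge(v, t)           # 2. v selects VP
t = move(t)               # 3. Juliet checks case in a specifier of v
t = merge(t, romeo)       # 4. Romeo is merged as specifier of v
t = merge(s, t)           # 5. -s selects vP
t = move(t)               # 6. remnant VP [love t] moves (+v)
t = move(t)               # 7. Romeo checks case (+case)
print(" ".join(words(t)), head(t).feats)
```

Step 6 is the remnant movement mentioned above: the phrase [love t] that moves still contains the trace left by Juliet at step 3.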
-s::=>V T + love::V ⇒
love -s:T
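One way to sketch this head-movement case of merge in Python: the selector's =>V is checked against V as in ordinary merge, the selected tree is attached as a complement, and the phonetic form of its head is pronounced to the left of the selector's. The encoding and the names `replace_head` and `merge_hm` are assumptions, and only a simple (lexical) selector is handled.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Leaf:
    phon: str
    feats: tuple

@dataclass(frozen=True)
class Node:
    order: str  # "<" or ">", pointing toward the head
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def head(t: Tree) -> Leaf:
    while isinstance(t, Node):
        t = t.left if t.order == "<" else t.right
    return t

def replace_head(t: Tree, new: Leaf) -> Tree:
    """Copy of t with its head leaf replaced by `new`."""
    if isinstance(t, Leaf):
        return new
    if t.order == "<":
        return Node("<", replace_head(t.left, new), t.right)
    return Node(">", t.left, replace_head(t.right, new))

def merge_hm(sel: Leaf, t2: Tree) -> Tree:
    """Merge with head movement: sel begins with =>f, the head of t2 with f."""
    h2 = head(t2)
    if sel.feats[0] != "=>" + h2.feats[0]:
        raise ValueError("features do not match")
    # the selected head's phonetic form is pronounced left of the selector's
    word = " ".join(p for p in (h2.phon, sel.phon) if p)
    return Node("<", Leaf(word, sel.feats[1:]), replace_head(t2, Leaf("", h2.feats[1:])))

tp = merge_hm(Leaf("-s", ("=>V", "T")), Leaf("love", ("V",)))
print(head(tp))
```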
So for example, consider the following grammar MGH1 with 5 lexical items:
love::=D V
-s::=>v +case T
ε::=>V +case =D v
Romeo::D -case
Juliet::D -case
This grammar has an empty affix v which attracts the verb (when it selects the VP and attaches it in complement position), and the complex v head that results is then attracted to the tense affix -s (when the vP is selected and attached in complement position). So it is easy to calculate that in 6 steps we derive the following bare structure:
[Figure: derived bare structure for Romeo love -s Juliet, headed by the complex head love -s:T.]
This bare phrase structure can also be represented in the more conventional form, as shown below:
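[Figure: the same result as a conventional tree, with the complex head formed from love, v and -s, the object Juliet (DP(0)) in a specifier of vP, and the traces t, t(0) and t(1).]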
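The 6-step MGH1 derivation can be replayed with a sketch that extends the hypothetical bare-tree encoding: merge with a =>f feature on a lexical selector moves the phonetic form of the selected head into the selector, and otherwise merge and move behave as in MG. The step order is a reconstruction from the description above.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Leaf:
    phon: str     # "" for empty heads and traces
    feats: tuple

@dataclass(frozen=True)
class Node:
    order: str    # "<" or ">", pointing toward the head
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]
TRACE = Leaf("", ())

def head(t: Tree) -> Leaf:
    while isinstance(t, Node):
        t = t.left if t.order == "<" else t.right
    return t

def replace_head(t: Tree, new: Leaf) -> Tree:
    if isinstance(t, Leaf):
        return new
    if t.order == "<":
        return Node("<", replace_head(t.left, new), t.right)
    return Node(">", t.left, replace_head(t.right, new))

def check(t: Tree) -> Tree:
    h = head(t)
    return replace_head(t, Leaf(h.phon, h.feats[1:]))

def merge(t1: Tree, t2: Tree) -> Tree:
    sel, h2 = head(t1).feats[0], head(t2)
    if isinstance(t1, Leaf) and sel == "=>" + h2.feats[0]:
        # head movement: the selected head is pronounced left of the selector
        word = " ".join(p for p in (h2.phon, t1.phon) if p)
        return Node("<", Leaf(word, t1.feats[1:]), replace_head(t2, Leaf("", h2.feats[1:])))
    if sel != "=" + h2.feats[0]:
        raise ValueError("merge: features do not match")
    if isinstance(t1, Leaf):
        return Node("<", check(t1), check(t2))
    return Node(">", check(t2), check(t1))

def extract(t: Tree, lic: str, found: list) -> Tree:
    if isinstance(t, Leaf):
        return t
    hd, other = (t.left, t.right) if t.order == "<" else (t.right, t.left)
    if head(other).feats[:1] == (lic,):
        found.append(other)
        other = TRACE
    else:
        other = extract(other, lic, found)
    hd = extract(hd, lic, found)
    return Node("<", hd, other) if t.order == "<" else Node(">", other, hd)

def move(t: Tree) -> Tree:
    f = head(t).feats[0]
    found = []
    rest = extract(t, "-" + f[1:], found)
    if not f.startswith("+") or len(found) != 1:
        raise ValueError("move: not applicable")
    return Node(">", check(found[0]), check(rest))

def words(t: Tree) -> list:
    if isinstance(t, Leaf):
        return [t.phon] if t.phon else []
    return words(t.left) + words(t.right)

love = Leaf("love", ("=D", "V"))
s = Leaf("-s", ("=>v", "+case", "T"))
v = Leaf("", ("=>V", "+case", "=D", "v"))
romeo, juliet = Leaf("Romeo", ("D", "-case")), Leaf("Juliet", ("D", "-case"))

t = merge(love, juliet)  # 1. VP
t = merge(v, t)          # 2. love moves into the empty v head
t = move(t)              # 3. Juliet to a specifier of v
t = merge(t, romeo)      # 4. Romeo merged as specifier of v
t = merge(s, t)          # 5. the complex v head moves into -s
t = move(t)              # 6. Romeo to the specifier of T
print(" ".join(words(t)))
```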
A rather different mechanism accomplishes the affixation in the “mirror theory” proposed by Brody (2000) and Brody (1999). A simple formal, derivational grammar inspired by mirror theory is studied by Kobele (2002). In this grammar, a category can be selected to attach as a right or left daughter. Inflectional elements like -s attach V as a right daughter. In the resulting structure, a sequence of elements related by the complement relation is pronounced at the highest strong category; otherwise, left branches are pronounced before the root, and the root before the right branch (if any). Lexical elements are specified as strong (:::) or weak (::). To implement these ideas, Kobele uses the following features:
category: N, V, A, P, C, D, …
select as complement: =N, =V, =A, =P, =C, =D, …
select as specifier: N=, V=, A=, P=, C=, D=, …
licensor: +wh, +case, …
licensee: -wh, -case, …
phonetic+semantic: Romeo, Juliet, love, …
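The feature inventory above can be read off mechanically from a lexical entry. This sketch (names hypothetical) splits an entry at `:::` (strong) or `::` (weak) and classifies each feature by the position of `=` or by its `+`/`-` prefix; the sample entries are taken from the MTTG1’ lexicon given later in the paper.

```python
def classify(feature: str) -> str:
    """Kind of an MTTG feature: selection, licensing, or category."""
    if feature.startswith("="):
        return "selects complement " + feature[1:]
    if feature.endswith("="):
        return "selects specifier " + feature[:-1]
    if feature.startswith("+"):
        return "licensor"
    if feature.startswith("-"):
        return "licensee"
    return "category"

def parse_entry(entry: str):
    """Split 'phon:::features' (strong) or 'phon::features' (weak)."""
    strong = ":::" in entry
    phon, feats = entry.split(":::" if strong else "::", 1)
    return phon.strip(), strong, [(f, classify(f)) for f in feats.split()]

for e in ["eat::AgrD= V", "-s:::=Have +case T", "ε:::=D AgrD -case"]:
    print(parse_entry(e))
```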
We have two structure building operations, only slightly different from the more familiar ones above. There are again two cases of merge. Complement merge again checks and deletes the relevant features:
closely comparable complexities). Writing CSG for the class of languages generated by context sensitive grammars, MCTAG for languages generated by (set-local) multiple component tree adjoining grammars, and LCFRS for languages generated by linear context free rewrite systems, we have the following relations among languages:
These results were established in a long tradition of work including, among others, (Harkema, 2001a; Michaelis, 2001b; Kobele, 2002; Michaelis, 1998; Weir, 1988; Seki et al., 1991; Vijay-Shanker and Weir, 1994).

Recognition: For the decidability of the languages defined by grammars near the mainstream of linguistic research, there has been a long line of discouraging complexity results, but the languages defined by MG, MGH and MTTG are all efficiently decidable:
− TG undecidable (Peters and Ritchie, 1973)
− LFG intractable (Berwick, 1981)
− UGs (hence HPSG, LFG) undecidable (Johnson, 1988; Torenvliet and Trautwein, 1995)
− GB intractable (Ristad, 1993)
The line of results involving MG, MGH and MTTG was established in the papers mentioned at the end of the expressiveness section just above.
Learnability: There are various formulations of the learning problem. In the tradition established by Gold’s work, we have the following results for “identification of languages in the limit from positive text”, where we say a grammar is k-valued iff it assigns at most k different syntactic categories to each lexical item:
− TG, UGs (hence HPSG, LFG), GB, MG, MGH, MTTG, TAG, rCCG, … are not learnable from strings, even with learners capable of evaluating non-computable functions (Gold, 1967)
− k-valued CGs ⊂ CFGs are learnable from strings (Kanazawa, 1996)
− k-reversible regular languages are learnable from strings (Angluin, 1982)
− finite sets are learnable from strings (Gold, 1967)
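The positive result for finite sets has a very short learner. This sketch assumes the usual Gold-style setting, where the learner sees one string at a time and outputs a conjecture after each; the names are hypothetical, and the list below is only a finite prefix of a text.

```python
from typing import FrozenSet, Iterable, Iterator

def finite_learner(text: Iterable[str]) -> Iterator[FrozenSet[str]]:
    """After each example, conjecture exactly the set of strings seen so far."""
    seen = set()
    for s in text:
        seen.add(s)
        yield frozenset(seen)

target = {"a", "ab", "abc"}
text = ["a", "ab", "a", "abc", "ab", "abc"]   # a finite prefix of a text for target
guesses = list(finite_learner(text))
print(guesses.index(frozenset(target)))      # first point at which the guess is correct
```

If the target is finite, every string in it eventually appears in the text, after which the conjecture never changes again. Gold also showed that no class containing all finite languages and at least one infinite language can be identified in this way, which is why the expressive classes in the first item are not learnable from strings.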
We conjecture that k-valued MGs, MGHs and MTTGs are also learnable from strings (Kobele et al., 2002), but the proof has not yet been presented (see Stabler, 2002). In any case, no differences in the learnability of the three kinds of grammars considered here, MGs, MGHs and MTTGs, have been discovered.
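Whether a lexicon is k-valued, in the sense defined above, is easy to compute. This sketch assumes entries written `phon::features` as in the MG lexicons above, treats every empty item as the single form ε, and would need a small change for the strong `:::` entries of MTTG; the function name is hypothetical.

```python
from collections import defaultdict

def valuedness(lexicon: list) -> int:
    """Largest number of distinct feature sequences assigned to one phonetic form."""
    cats = defaultdict(set)
    for entry in lexicon:
        phon, feats = entry.split("::", 1)
        cats[phon.strip()].add(tuple(feats.split()))
    return max(len(c) for c in cats.values())

MG1 = ["love::=D V -v", "-s::=v +v +case T", "ε::=V +case =D v",
       "Romeo::D -case", "Juliet::D -case"]
MG1_prime = ["-s::=v +v +case T", "ε::=V +case =D v", "the::=N D -case",
             "king::N", "pie::N", "ε::=v +aux v", "eat::=D V -v",
             "-ing::=v +v Prog -aux", "have::=PastPart v -v",
             "be::=Prog v -v", "-en::=v +v PastPart -aux"]
print(valuedness(MG1), valuedness(MG1_prime))
```

Under these assumptions MG1 is 1-valued, while MG1’ is 2-valued because it has two different empty v heads.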
The very brief summary of formal research above does not reveal any significant differences among MG, MGH, MTTG. What kind of differences should we look for?
a. by carefully exploring the details of particular constructions, we may find something that is appropriately handled only by (some elaboration of) one of these. This is clearly an important strategy, but a result of this kind would be surprising, because all three options are very expressive.
b. expressively equivalent formalisms can differ in their acquisition complexity, so maybe only one of these can provide a reasonable acquisition theory.
c. expressively equivalent formalisms can differ in the succinctness of their grammars for particular languages, and in the succinctness of their encodings of strings of those languages, so maybe one of these provides the “simplest” theory.
Since appeals to the relative simplicity of one grammar over another are common in linguistic argumentation, let’s briefly consider c.
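Simplicity comparisons need some measure of grammar size. Purely as a hypothetical illustration (this is not a measure proposed in the paper), one could count the features in a lexicon. On the toy lexicons MG1 and MGH1 it gives 15 and 13, but a comparison of two particular small grammars says nothing about whether the smallest grammars for a language differ across the frameworks.

```python
def feature_count(lexicon: list) -> int:
    """Hypothetical crude size measure: total number of features in the lexicon."""
    return sum(len(entry.split("::", 1)[1].split()) for entry in lexicon)

MG1 = ["love::=D V -v", "-s::=v +v +case T", "ε::=V +case =D v",
       "Romeo::D -case", "Juliet::D -case"]
MGH1 = ["love::=D V", "-s::=>v +case T", "ε::=>V +case =D v",
        "Romeo::D -case", "Juliet::D -case"]
print(feature_count(MG1), feature_count(MGH1))
```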
the selection of the names. Is there any language L such that the smallest grammars for L in each of these frameworks differ significantly? We conjecture that the answer is no, but the question remains open.
We have seen that, with regard to expressive power,
MG≡MGH≡MTTG.
Consequently, data of the form
S, or *S,
for any expressions S, can never by themselves be the basis for deciding among these approaches. The convergence of formalisms on the class of languages defined by these grammars provides some reason to believe that we are getting close to the natural class for natural languages, but does not provide any reason for preferring one of the equivalent formalisms over any other.

With regard to recognition complexity, to the level of detail understood to date, again MG≡MGH≡MTTG, and all are tractable. We do not know how to establish any relevant complexity differences. With regard to the succinctness of smallest grammars for any L, to the level of detail understood to date, again MG≡MGH≡MTTG. There is promising ongoing research on the learnability of k-valued MG, MGH, and MTTG from strings. It is reasonable to choose among equivalent formalisms those with the simplest representations of child language and acquisition, but we have not yet discovered any reason for thinking that this favors one of the three kinds of grammar considered here. Many open questions remain.
Angluin, Dana. 1982. Inference of reversible languages. Journal of the Association for Computing Machinery, 29:741–765.
Berwick, Robert C. 1981. Computational complexity of lexical functional grammar. In Proceedings of the 19th Annual Meeting of the Association for Computational Linguistics, ACL’81, pages 7–12.
Brody, Michael. 1998. Projection and phrase structure. Linguistic Inquiry, 29:367–398.
Brody, Michael. 1999. Mirror theory: a brief sketch. In A Celebration. MIT Press, Cambridge, Massachusetts. Currently available at http://mitpress.mit.edu/chomskydisc/brody.html.
Brody, Michael. 2000. Mirror theory: syntactic representation in perfect syntax. Linguistic Inquiry, 31:29–56.
Chomsky, Noam. 1956. Three models for the description of language. IRE Transactions on Information Theory, IT-2:113–
Chomsky, Noam. 1995. The Minimalist Program. MIT Press, Cambridge, Massachusetts.
Clark, Robin. 1994. Kolmogorov complexity and the information content of parameters. Technical report, 94-17, Institute for Research in Cognitive Science, University of Pennsylvania.
Cornell, Thomas L. 1997. Representational minimalism. SFB 340 Technical Report #83, University of Tübingen. Revised version forthcoming in U. Mönnich and H.-P. Kolb, eds.
Cornell, Thomas L. 1998. Island effects in type logical approaches to the minimalist program. In Proceedings of the Joint Conference on Formal Grammar, Head-Driven Phrase Structure Grammar, and Categorial Grammar, FHCG-98, pages 279–288, Saarbrücken.
Gold, E. Mark. 1967. Language identification in the limit. Information and Control, 10:447–474.
Harkema, Henk. 2001a. A characterization of minimalist languages. In Philippe de Groote, Glyn Morrill, and Christian Retoré, editors, Logical Aspects of Computational Linguistics,
Michaelis, Jens. 1998. Derivational minimalism is mildly context-sensitive. In Proceedings, Logical Aspects of Computational Linguistics, LACL’98, NY. Springer.
Michaelis, Jens. 2001a. On Formal Properties of Minimalist Grammars. Ph.D. thesis, Universität Potsdam. Linguistics in Potsdam 13, Universitätsbibliothek, Potsdam, Germany.
Michaelis, Jens. 2001b. Transforming linear context free rewriting systems into minimalist grammars. In Philippe de Groote, Glyn Morrill, and Christian Retoré, editors, Logical Aspects of Computational Linguistics, Lecture Notes in Artificial Intelligence, No. 2099, pages 228–244, NY. Springer.
Michaelis, Jens and Marcus Kracht. 1997. Semilinearity as a syntactic invariant. In Christian Retoré, editor, Logical Aspects of Computational Linguistics, pages 37–40, NY. Springer-Verlag (Lecture Notes in Computer Science 1328).
Peters, P. Stanley and R. W. Ritchie. 1973. On the generative power of transformational grammar. Information Sciences, 6:49–83.
Putnam, Hilary. 1961. Some issues in the theory of grammar. In Proceedings of Symposia in Applied Mathematics, 12:25–42. Reprinted in Mind, Language and Reality: Philosophical Papers, Volume 2, by Hilary Putnam. New York: Cambridge University Press.
Rambow, Owen. 1994. Formal and computational aspects of natural language syntax. Ph.D. thesis, University of Pennsylvania. Computer and Information Science Technical report MS-CIS-94-52 (LINC LAB 278).
Rissanen, Jorma and Eric Ristad. 1994. Language acquisition in the MDL framework. In Eric Ristad, editor, Language Computations. American Mathematical Society, Philadelphia.
Ristad, Eric. 1993. The Language Complexity Game. MIT Press, Cambridge, Massachusetts.
Seki, Hiroyuki, Takashi Matsumura, Mamoru Fujii, and Tadao Kasami. 1991. On multiple context-free grammars. Theoretical Computer Science, 88:191–229.
Stabler, Edward P. 1984. Berwick and Weinberg on linguistics and computational psychology. Cognition, 17:155–179.
Stabler, Edward P. 1997. Derivational minimalism. In Christian Retoré, editor, Logical Aspects of Computational Linguistics. Springer-Verlag (Lecture Notes in Computer Science 1328), NY, pages 68–95.
Stabler, Edward P. 1998. Remnant movement and complexity. In Joint Conference on Formal Grammar, Head-Driven Phrase Structure Grammar, and Categorial Grammar, FHCG-98, Saarbrücken, Germany. Universität des Saarlandes.
Stabler, Edward P. 1999. Remnant movement and complexity. In Gosse Bouma, Erhard Hinrichs, Geert-Jan Kruijff, and Dick Oehrle, editors, Constraints and Resources in Natural Language Syntax and Semantics. CSLI, Stanford, California, pages 299–326.
Stabler, Edward P. 2001. Recognizing head movement. In Philippe de Groote, Glyn Morrill, and Christian Retoré, editors, Logical Aspects of Computational Linguistics, Lecture Notes in Artificial Intelligence, No. 2099. Springer, NY, pages 254–260.
Stabler, Edward P. 2002. Identifying minimalist languages from dependency structures. http://www.clsp.jhu.edu/seminars/abstracts/S2002/stabler.shtml.
Steedman, Mark J. 2000. The Syntactic Process. MIT Press, Cambridge, Massachusetts.
Torenvliet, Leen and Marten Trautwein. 1995. A note on the complexity of restricted attribute-value grammars. In Proceedings of Computational Linguistics In the Netherlands, CLIN5, pages 145–164.
Vijay-Shanker, K. and David Weir. 1994. The equivalence of four extensions of context free grammar formalisms. Mathematical Systems Theory, 27:511–545.
Weir, David. 1988. Characterizing mildly context-sensitive grammar formalisms. Ph.D. thesis, University of Pennsylvania, Philadelphia.
Obviously there are many ways to extend the tiny grammars shown above. For estimates of succinctness, etc., it is useful to consider extensions like these. (These grammars are not presented as correct ones(!), but only as further examples to illustrate the different mechanisms of the respective frameworks.) Consider the slightly more elaborate MG1’:
-s::=v +v +case T
ε::=V +case =D v
the::=N D -case
king::N
pie::N
ε::=v +aux v
eat::=D V -v
-ing::=v +v Prog -aux
have::=PastPart v -v
be::=Prog v -v
-en::=v +v PastPart -aux
Then we can derive the king have -s be -en eat -ing the pie:
[Figure: derived tree for the king have -s be -en eat -ing the pie, built from nested vPs, ProgP and PastPartP, with the moved phrases DP(1), VP(2), ProgP(3), vP(4), PastPartP(5) and vP(6) and the traces t(1) to t(7).]
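The same kind of sketch can replay a derivation of the king have -s be -en eat -ing the pie from MG1’. The bare-tree encoding and the helper names (`lex`, `extract`, `words`) are hypothetical, and the step order is a reconstruction, not taken from the paper. Note that each empty v head and each auxiliary triggers both a +v and a +aux movement, which is where the remnant phrases in the tree above come from.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Leaf:
    phon: str     # "" for empty heads and traces
    feats: tuple

@dataclass(frozen=True)
class Node:
    order: str    # "<" or ">", pointing toward the head
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]
TRACE = Leaf("", ())

def head(t: Tree) -> Leaf:
    while isinstance(t, Node):
        t = t.left if t.order == "<" else t.right
    return t

def check(t: Tree) -> Tree:
    if isinstance(t, Leaf):
        return Leaf(t.phon, t.feats[1:])
    if t.order == "<":
        return Node("<", check(t.left), t.right)
    return Node(">", t.left, check(t.right))

def merge(t1: Tree, t2: Tree) -> Tree:
    if head(t1).feats[0] != "=" + head(t2).feats[0]:
        raise ValueError("merge: features do not match")
    if isinstance(t1, Leaf):
        return Node("<", check(t1), check(t2))   # complement
    return Node(">", check(t2), check(t1))       # specifier

def extract(t: Tree, lic: str, found: list) -> Tree:
    if isinstance(t, Leaf):
        return t
    hd, other = (t.left, t.right) if t.order == "<" else (t.right, t.left)
    if head(other).feats[:1] == (lic,):
        found.append(other)
        other = TRACE
    else:
        other = extract(other, lic, found)
    hd = extract(hd, lic, found)
    return Node("<", hd, other) if t.order == "<" else Node(">", other, hd)

def move(t: Tree) -> Tree:
    f = head(t).feats[0]
    found = []
    rest = extract(t, "-" + f[1:], found)
    if not f.startswith("+") or len(found) != 1:
        raise ValueError("move: not applicable")
    return Node(">", check(found[0]), check(rest))

def words(t: Tree) -> list:
    if isinstance(t, Leaf):
        return [t.phon] if t.phon else []
    return words(t.left) + words(t.right)

def lex(phon: str, feats: str) -> Leaf:
    return Leaf(phon, tuple(feats.split()))

obj = merge(lex("the", "=N D -case"), lex("pie", "N"))
subj = merge(lex("the", "=N D -case"), lex("king", "N"))
t = merge(lex("eat", "=D V -v"), obj)
t = move(merge(lex("", "=V +case =D v"), t))            # object checks case
t = merge(t, subj)                                      # subject in spec of v
t = move(merge(lex("-ing", "=v +v Prog -aux"), t))      # VP moves (+v)
t = merge(lex("be", "=Prog v -v"), t)
t = move(merge(lex("", "=v +aux v"), t))                # ProgP moves (+aux)
t = move(merge(lex("-en", "=v +v PastPart -aux"), t))   # vP headed by be moves (+v)
t = merge(lex("have", "=PastPart v -v"), t)
t = move(merge(lex("", "=v +aux v"), t))                # PastPartP moves (+aux)
t = move(merge(lex("-s", "=v +v +case T"), t))          # vP headed by have moves (+v)
t = move(t)                                             # subject checks case
print(" ".join(words(t)))
```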
Consider the slightly more elaborate MGH1’:
-s::=>Have +case T
have::=ven Have
-en::=>Be =D ven
eat::=D +case V
the::=N D -case
pie::N
be::=ving Be
-ing::=>V ving
king::N
Then we can derive:
[Figure: derived tree TP for the king have -s be -en eat -ing the pie, in which head movement forms the complexes have -s, be -en and eat -ing, and the subject DP(1) the king occupies the specifier of TP.]
Consider the slightly more elaborate MTTG1’:
the::AgrN= D
ε:::=D AgrD -case
king::N
ε:::=N AgrN
eat::AgrD= V
ε:::=V +case AgrO
ε:::=AgrO AgrD= v
be::ving= Be
-ing:::=v ving
have::ven= Have
-en:::=Be ven
-s:::=Have +case T
pie::N
Then we can derive: