




























Material Type: Paper; Class: LITERARY THEORY; Subject: German; University: University of California - Los Angeles; Term: Unknown 2006;





























It is well known that members of morphological paradigms exert an influence over one another, and forms are occasionally rebuilt to create more coherent and consistent paradigms. For example, in early New High German, the singular forms of verbs with eu ∼ ie alternations (strong class II) were rebuilt to contain ie throughout, as in (1) (Paul, Wiehl, and Grosse 1989, §242):^1
(1) Loss of /iu/∼/ie/ alternations^2 in early New High German: ‘to fly’
         Middle High German     Early New High German     New High German
    1sg  vliuge              >  fleuge                 ⇒  fliege
    2sg  vliugest            >  fleugst                ⇒  fliegst
    3sg  vliuget             >  fleugt                 ⇒  fliegt
    1pl  vliegen             >  fliegen                >  fliegen
    2pl  vlieget             >  fliegt                 >  fliegt
    3pl  vliegen             >  fliegen                >  fliegen
In other verbs, singular∼plural vowel alternations were not lost, but were simply rearranged. For example, in strong verbs like hëlfen ‘help’, nëmen ‘take’, and geben ‘give’ (classes IIIb, IV and V, respectively), the 1sg form was rebuilt to match the plural, as in (2a) (ibid., §242). The result was a new alternation within the present tense paradigm, parallel to a separate pattern of alternation known as umlaut, seen in the verb graben ‘dig’ (2b):
(^1) I use the following orthographic conventions: X∼Y represents synchronic alternations between X and Y within a paradigm; X→Y represents a synchronic morphological or phonological rule changing input X to surface Y; X>Y indicates regular sound change from X to its expected outcome Y, while X⇒Y indicates that form X has been replaced by an analogically rebuilt form Y. Analogically rebuilt forms are also underlined in tables, to highlight which parts of the paradigm underwent changes. In all of the cases discussed here, the term ‘paradigm’ refers to the set of inflected forms which share a single lexical stem (the set of case forms of a noun, the various person, tense and number inflections of a particular verb, etc.).
(^2) This alternation was produced by regular sound changes affecting the Proto-Germanic diphthong *eu. Specifically, when eu preceded a syllable containing a high vowel, it raised to iu, and otherwise it lowered to eo and subsequently dissimilated to io > i@. Since the present singular suffixes all had high vowels (-u, -is, -it) and the plural suffixes all had non-high vowels (-êm, -et, -ant), this resulted in singular∼plural alternations (Paul, Wiehl, and Grosse 1989, §35).
(2) Rearrangement of /i/∼/ë/ alternations in early New High German
    a. ‘to give’                              b. Following pattern of ‘to dig’
         MHG      Early NHG    NHG
    1sg  gibe     gibe         gebe              grabe
    2sg  gibest   gibst        gibst             gräbst
    3sg  gibet    gibt         gibt              gräbt
    1pl  gëben    geben        geben             graben
    2pl  gëbet    gebt         gebt              grabt
    3pl  gëben    geben        geben             graben
Although the changes in (1) and (2) yielded different patterns of alternation, what they have in common is that some members of the paradigm have been rebuilt to match other forms (paradigm leveling) or to differ systematically from another form (analogical extension, or polarization; Kiparsky 1968). The form that determines the shape of the rebuilt paradigm is traditionally referred to as the base, or pivot, of the change.
A long-standing issue in the study of analogy is the question of which forms act as bases, and which are rebuilt. Typically, this is cast as a typological question: are there certain forms that tend to serve as bases, and other forms that tend to be rebuilt? Careful inspection of many cases has revealed numerous tendencies: analogy tends to be based on frequent forms, shorter forms, “unmarked” forms, and so on (Kuryłowicz 1947; Mańczak 1958; Bybee 1985; Hock 1991). These tendencies are often taken to be primitives of historical change: change tends to eliminate alternations by replacing less frequent alternants with more frequent ones, marked forms with unmarked ones, and so on.
Much less attention has been devoted to explaining the language-particular aspects of analogy. Why does analogical change favor a particular base in a particular language? Why are alternations sometimes leveled, and sometimes extended? The typological approach makes only weak predictions about the particulars of individual cases: certain changes are universally more or less likely, and the fact that a particular language underwent a particular change is in some sense a statistical accident. To the extent that individual changes obey the typological tendencies, they can be seen as reasonable and “natural”, but analyses of analogical change seldom commit to the claim that an attested change was the only analogy that could possibly have occurred. In Albright (2002), a model is proposed that makes precisely this claim.
Specifically, it is hypothesized that learners select base forms as part of a strategy to develop grammars that can produce inflected forms as reliably or as confidently as possible. In order to do this, learners compare different members of the paradigm, using each to attempt to predict the remainder of the paradigm with a grammar of stochastic rules. The part of the paradigm that contains as much information as possible about how to inflect the remaining forms is then selected as the base form, and a grammar is constructed to derive the rest of the paradigm. In this model, analogical change occurs when the resulting grammar derives the incorrect output for certain derived (non-basic) forms, and these errors come to replace the older, exceptional forms. Thus, all analogical change is viewed as (over)regularization, echoing earlier proposals by Kiparsky (1978) and others. Since the procedures for base selection and grammar induction are both deterministic, this model makes strong predictions about possible analogical changes: they must be based on the most informative form in the paradigm, and the only possible “analogical errors” are those that can be produced by the grammar. I demonstrate that these are correct in several typologically unusual cases.
or acquisition bias that gradually eliminates alternations and restores regularity. Since these pressures do not place any restrictions on how regularity is achieved, analogical change may proceed in many different directions, and it is often difficult to classify one change as more or less natural than another competing possibility. Continuing with the example from above, in Middle High German, many verbs exhibited vowel alternations between the plural and some or all of the singular, as in (3a). In Modern German, these alternations have been retained in some cases (e.g., ‘know’, (3b.i) and most of the modal verbs), lost in others (e.g., ‘fly’, (3b.ii)); in yet other verbs such as ‘to give’, (3b.iii), the alternation was retained in just some forms (the 2,3sg), as shown in (2) above.
(3) Paradigmatic changes in early New High German
    a. Alternations in Middle High German present tense paradigms
         i. ‘know’    ii. ‘fly’    iii. ‘give’
    1sg  weiZ         vliuge       gibe
    2sg  weist        vliugest     gibest
    3sg  weiZ         vliuget      gibet
    1pl  wiZZen       vliegen      gëben
    2pl  wiZZet       vlieget      gëbet
    3pl  wiZZen       vliegen      gëben

    b. Modern German paradigms (analogically changed forms are underlined)
         i. ‘know’    ii. ‘fly’    iii. ‘give’
    1sg  weiß         fliege       gebe
    2sg  weißt        fliegst      gibst
    3sg  weiß         fliegt       gibt
    1pl  wissen       fliegen      geben
    2pl  wisst        fliegt       gebt
    3pl  wissen       fliegen      geben
The change from (3a) to (3b) represents a modest simplification or regularization: singular∼plural alternations have mostly been eliminated except in a few high frequency verbs (such as wissen), leaving just two general patterns (non-alternation, and raising in the 2,3sg). Logically, there are many other possibilities that seem just as natural, however. Could analogical change have gone further, eliminating alternations in all verbs? Or could it have gone in a different direction, yielding paradigms like fleuge, fleugst, fleugt, fleugen, fleugt, fleugen, or perhaps fliege, fleugst, fleugt, fliegen, fliegt, fliegen? Under the traditional view of analogy, the answer is affirmative: changes in any direction are possible.
Nevertheless, it is commonly accepted that some changes are more likely than others. Analogical changes are often based on the shortest, or “least suffixed”, member of the paradigm (Mańczak 1958; Hayes 1995; Bybee 1985, pp. 50-52), the “least marked” member of the paradigm (Jakobson 1939; Greenberg 1966; Bybee and Brewer 1980; Tiersma 1982; Bybee 1985), and the member of the paradigm with highest token frequency (Mańczak 1980, pp. 284-285). In many cases, all three of these factors converge, yielding a base form that is frequent, unmarked, and unsuffixed (such as a nominative singular, or a third person singular present form). At the same time, there are many cases in which these factors do not converge, and a subsequent change obeys one trend at the expense of others. Even more troubling, there are analogical changes that
apparently violate all of these tendencies, rebuilding paradigms on the basis of a less frequent, more marked, suffixed base form (Hock 1991, chap. 10).
A well known example of analogy based on a marked form involves the loss of final devoicing in Yiddish (Sapir 1915, p. 237; Kiparsky 1968, p. 177; Sadock 1973; Vennemann 1979, pp. 188-189; King 1980). In its earliest stages, Yiddish, like Middle High German, had final devoicing of obstruents (seen here in the alternation between sg. vek and pl. veg@ in (4a)). However, in many dialects of Yiddish, final devoicing was subsequently “lost”, and the voicing value of the plural was reintroduced to the singular, leaving paradigms with [g] throughout, as in (4c). (The change of the plural suffix from ∅ to -@n is irrelevant for the point at hand.)
(4) Loss of final devoicing in Yiddish: ‘way’
                a. MHG               b. Earlier Yiddish     c. Modern Yiddish
                sg.     pl.          sg.      pl.           sg.    pl.
    Nom., Acc.  vek     veg@     >   vek      veg(@)        veg    veg@n
    Gen.        veg@s   veg@     >   veg@s    veg(@)
    Dat.        veg@    veg@n    >   veg(@)   veg@n
We can confirm that the change from vek to veg is due to the voicing in the plural form, and not to a separate process of final voicing, by comparing voiceless-final stems and observing that they remain voiceless (5).
(5) Voiceless-final stems remain voiceless: ‘sack’
                a. MHG               b. Earlier Yiddish     c. Modern Yiddish
                sg.     pl.          sg.      pl.           sg.    pl.
    Nom., Acc.  zak     zek@     >   zak      zek(@)        zak    zek
    Gen.        zak@s   zek@     >   zak@s    zek(@)
    Dat.        zak@    zek@n    >   zak      zek@n
As Sapir first noted, this change is a paradigmatic one: words with [g] in the plural generally had [g] restored in the singular as well, while words with no plural form (such as the adverb vek ‘away’) did not change.^4 Thus, it appears that in this case, paradigms have been leveled to the form found in the plural, in spite of the fact that plurals are more marked and less frequent than singular forms.
Cases of leveling to “marked” forms are not uncommon in the literature, and the usual response has been to claim that the direction of analogy reflects general typological tendencies, but is not governed by any hard and fast rules (Kuryłowicz 1947; Mańczak 1958; Hock 1991). This position is summed up succinctly by Bybee and Brewer (1980, p. 215):
A hypothesis formulated in such a way makes predictions of statistical tendencies in diachronic change, language acquisition and psycholinguistic experimentation. It cannot, nor is it intended to, generate a unique grammar for a body of linguistic data.
Although this is a reasonable approach to finding and testing descriptive hypotheses, it is unsatisfying from an explanatory point of view. As an account of language change, it tells us what changes are likely in general, but it cannot tell us why a particular language changed in a particular way at a particular time. As an account of language acquisition or experimental
(^4) The details of the loss of final devoicing are considerably more complex than what is described here; see King (1980) and Albright (in prep.) for an overview.
189). Vennemann calls this the predictability principle, and he observes that the desire to maintain contrasts can override the tendency to level to unmarked base forms.
In Albright (2002), I proposed that the urge to maintain contrasts is more than a mere tendency that influences the direction of analogy; in fact, it forms the basis of how learners approach the problem of learning paradigms. The premise of this approach is that speakers (ideally) need to be able to correctly understand and produce inflected forms of their language. In order to do this, they cannot wait around to hear and memorize all forms of all words, since there are many forms that they will simply never encounter (particularly in a highly inflected language). Thus, learners need to make inferences about the phonological and morphological properties of words based on incomplete information. The proposal, then, is that learners adopt a strategy of focusing on the part of the paradigm that contains the most contrastive information, and allows them to project the remaining forms as accurately or as confidently as possible: that is, the most informative, or predictive, part of the paradigm. This form is chosen as the base of the paradigm, and a grammar is constructed to derive the remaining forms.^5 I will refer to this strategy as confidence maximization, since its goal is to allow the learner to infer properties of words as confidently as possible.
On the face of it, a confidence-based approach appears to suffer from as many counterexamples as any of the tendencies discussed above. For example, a famous analogy in the history of Latin eliminated a stem-final contrast between [r] and [s]: honōs ∼ honōris ‘honor-NOM./GEN.’ ⇒ honor ∼ honōris, on analogy with underlying /r/ in words like soror ∼ sorōris ‘sister-NOM./GEN.’ (for discussion, see Hock 1991, pp. 179-190; Barr 1994, pp. 509-544; Kiparsky 1997; Kenstowicz 1998; Albright 2005, and many others).
Such cases are not necessarily a problem if predictability is viewed as just one more factor that can compete or conspire to determine the direction of analogy, but they are a challenge to the idea that bases are always the most predictive form. Why does analogy sometimes wipe out distinctions, if bases are always chosen to be maximally informative?
I hypothesize that the reason why analogy sometimes eliminates contrasts is that learners are restricted in the way that bases are chosen, and cannot always select a form that maintains all of the contrasts that are displayed in their language. In particular, I propose that there is a single surface base restriction: learners must choose a surface form as the base, and the choice of base is global (that is, the same for all lexical items). When there is no single form in the paradigm that preserves all distinctions for all lexical items, the learner must choose the form that maintains distinctions for as many lexical items as possible.^6 In the case of Latin, the contrast between [r] (soror) vs. [s] (honōs) was neutralized to [r] in oblique forms (sorōris, honōris). This neutralization affected relatively few forms, however, compared to neutralizations in the nominative caused by cluster simplification and morphological syncretism; thus, the globally best choice of base form in Latin would have been an oblique form, even if it neutralized the rhotacism contrast. Thus, we see that the single surface base restriction can result in certain contrasts being lost in the base form; when this happens, they will be open to analogical leveling. For example, in Latin, the minority [r] ∼ [s] alternation was replaced with the more regular [r] ∼
(^5) This search for contrastive information is similar to the way in which generative phonologists usually assume that underlying forms are discovered; for discussion of the parallels and differences, see Albright (2002a).
(^6) All of the cases discussed here involve a single base form within a rather limited “local” paradigm (one tense of a verb, singular and plural forms of a noun, etc.). An important question not addressed here is whether larger paradigms, with multiple tenses, aspects, etc., might involve multiple “local” bases, perhaps along the lines of the traditional principal parts analysis of Latin or Greek verbs. The question of what considerations might compel learners to establish multiple base forms is a matter of on-going research; some examples are discussed in Albright (2002a), §6.
[r] pattern. Leveling, under this approach, is not a grammatical simplification, but rather a lexical simplification, eliminating exceptions and replacing them with grammatically preferable regular forms.
It should be emphasized that nothing in this system requires that overregularization/leveling must take place; as long as learners have sufficient access to input data, there is always the potential to learn and maintain irregularity in derived (=non-basic) forms. The model does not make specific predictions about when leveling will occur, except that we would obviously expect it when input data about exceptional forms is reduced, such as in low frequency words, reduced input because of bilingualism or language death, etc.; see also Kuryłowicz (1947), Bybee (1985), Barr (1994), and Garrett (this volume) on this point. In fact, I am largely in agreement with Garrett’s claim that morphological change is driven diachronically by inaccurate or incomplete transmission of the full set of inherited forms. The current model differs from his account, however, in positing a cognitive constraint on the form of possible grammars: namely, that all morphological rules refer to the same base form as their input. This restriction is what allows the model to make strong predictions about directionality: when change occurs, it should always involve replacing an exceptional non-basic form with an innovative regularized form. Although this restriction is not a logically necessary part of the formalism, it receives empirical justification from the fact that analogy is overwhelmingly unidirectional, to a far greater extent than token frequency, memory failings, and chance would predict. In Classical Latin, nominatives were rebuilt on the basis of an oblique form, while in English, preterite forms were always rebuilt on the basis of presents (Garrett, this volume, p. xxx) and in Ancient Greek, presents were rebuilt on aorists (ibid., p.
xxx), and so on.^7 I take such asymmetries to be the fundamental explicandum of analogical change.
Unlike the tendency-based approach, the confidence-based explanation of analogy aims to capture the directionality of all cases of paradigmatic change, a steep task given the typological diversity of attested changes. In Albright (2002a), I examined several typologically unusual analogies, showing that they were indeed based on the most predictive, or informative, member of the paradigm. In order to demonstrate the validity of this approach in general, however, two questions must be answered. The first is a question of coverage: are all analogical changes really based on the most predictive member of the paradigm, or are there cases in which analogy favors a less predictive member of the paradigm, contrary to the predictions of confidence maximization? The first goal of this paper is to examine one class of apparent counterexamples, using data from an analogical change currently underway in Korean. I will show that even though a form may appear to be radically uninformative when viewed schematically, it may actually be the most predictive form when we consider the language as a whole. Thus, some apparently exceptional changes need not be seen as counterexamples at all, once we have a suitable understanding of the factors that play a role in the calculation of confidence.
The second question that must be answered is a typological one: can a confidence-based approach explain why certain types of analogy are extremely common, while others are relatively rare? In general, such questions play a secondary role in structural analyses; as long as the predicted patterns are a good match to the attested patterns, the question of why some are chosen more frequently is often ignored (though see, e.g., Harris (this volume) for one possible line of explanation).
Nonetheless, the fact remains that analogy is very often based on unmarked or frequent members of the paradigm, and this fact demands some sort of explanation. The second goal of this paper, then, is to explore the behavior of the model when exposed to a
(^7) See Garrett’s treatment of apparent counterexamples in Greek (p. xxx), and also Tiersma (1982), Bybee (1985), for discussion of factors that might reverse the ordinary directionality of a change.
Learning to produce inflected forms of words would be a relatively easy task if all lexical items took the same sets of endings, there were no exceptional irregular forms, and phonology never acted to neutralize surface contrasts. In such a language, the learner would simply need to compare related forms of a few words in order to ascertain the suffixes. For example, faced with the paradigms in (7) (based on a simplified version of Middle High German), the learner could infer that the nominative singular suffix is null, the genitive singular suffix is -es , and the nominative plural suffix is -e.^10
(7) Paradigms with no alternations
    Nom. sg.   Gen. sg.   Nom. pl.   Gloss
    jar        jares      jare       ‘year’
    kil        kiles      kile       ‘quill’
    sin        sines      sine       ‘sense, mind’
    arm        armes      arme       ‘arm’
    SreI       SreIes     SreIe      ‘cry, shout’
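The inference described above can be illustrated with a short sketch. It assumes, as a simplification, that each character is one segment and that whatever material is shared at the left edge of all forms of a word is the stem; the names `PARADIGMS`, `SLOTS`, `split_paradigm`, and `infer_suffixes` are hypothetical and are not part of the model described in this paper.

```python
from os.path import commonprefix

# A subset of the paradigms in (7): nom. sg., gen. sg., nom. pl.
PARADIGMS = {
    "year": ("jar", "jares", "jare"),
    "quill": ("kil", "kiles", "kile"),
    "sense": ("sin", "sines", "sine"),
    "arm": ("arm", "armes", "arme"),
}
SLOTS = ("nom.sg", "gen.sg", "nom.pl")


def split_paradigm(forms):
    """Treat material shared by all forms as the stem; the rest is suffix."""
    stem = commonprefix(list(forms))  # character-wise longest common prefix
    return stem, tuple(form[len(stem):] for form in forms)


def infer_suffixes(paradigms):
    """Collect the set of suffixes observed in each paradigm slot."""
    observed = {slot: set() for slot in SLOTS}
    for forms in paradigms.values():
        _, suffixes = split_paradigm(forms)
        for slot, suffix in zip(SLOTS, suffixes):
            observed[slot].add(suffix)
    return observed


suffixes = infer_suffixes(PARADIGMS)
for slot in SLOTS:
    print(slot, sorted(suffixes[slot]))
```

When every slot ends up with exactly one suffix, as it does for this subset, the learner's task is trivial; the complications discussed next are precisely the cases in which a slot collects more than one outcome.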
Actual languages can, of course, be more complicated: phonology may act to neutralize surface contrasts in some parts of the paradigm, words may fall into different inflectional classes, and there may be irregular exceptions that fail to follow any of the major patterns. Consider the following sets of forms (again, based loosely on Middle High German), in which some forms show voicing alternations in the stem-final consonant (8a), while others do not (8b).
(8) Phonological neutralization
    a. Stems with voicing alternations
       Nom.sg.   Gen.sg.    Nom.pl.   Gloss
       tot       todes      tode      ‘death’
       nit       nides      nide      ‘enmity’
       helt      heldes     helde     ‘hero’
       sant      sandes     sande     ‘sand’
       tak       tages      tage      ‘day’
       tswik     tswiges    tswige    ‘branch’
       diNk      diNges     diNge     ‘thing’
       vek       veges      vege      ‘way’
       bri@f     bri@ves    bri@ve    ‘letter’
       kreIs     kreIzes    kreIze    ‘circle’
       lop       lobes      lobe      ‘praise’

    b. Non-alternating stems
       Nom.sg.   Gen.sg.    Nom.pl.   Gloss
       mut       mutes      mute      ‘courage’
       Srit      Srites     Srite     ‘step’
       knext     knextes    knexte    ‘servant’
       geIst     geIstes    geIste    ‘spirit’
       nak       nakes      nake      ‘nape’
       blik      blikes     blike     ‘glance’
       druk      drukes     druke     ‘pressure’
       lok       lokes      loke      ‘lock (hair)’
       Sif       Sifes      Sife      ‘ship’
       slos      sloses     slose     ‘lock’
       Simpf     Simpfes    Simpfe    ‘taunt’
The data in (8) suggest that the language has a phonological process (such as final devoicing) which neutralizes the contrast between voiced and voiceless obstruents word-finally. In order to discover this, the learner must make two types of comparisons. First, by comparing nominative tot with genitive tod-es , it can be seen that some sort of process is operating to create voicing
(^10) I leave aside the possibility that all nouns end in -e and that the nominative singular shows truncation, with -s and ∅ as the gen. sg. and nom. pl. suffixes, respectively. Such parsing problems can be non-trivial, however. For example, given just the first set of forms ( jar , jares , jare ), the learner might be uncertain about whether the r is part of the stem or the suffix. The procedure described below operates under the assumption that if material is shared by all morphologically related forms, it is part of the stem—that is, jar - ∅, jar-es , jar-e.
alternations. Second, by comparing tot ∼ todes with mut ∼ mutes, one can infer that it is a devoicing process, and not a voicing one (or else we would expect genitive singular *mudes). This is the basic logic behind many phonology problems: noticing that rules must operate in one direction rather than vice versa, because there is an unpredictable opposition which is neutralized in some forms but not in others. Furthermore, having discovered this, the learner also now knows that the nominative singular form is not a reliable source of information regarding the voicing of stem-final obstruents; for this, one must look to a suffixed form.
In addition to phonological neutralizations, the learner must also contend with the possibility of inflectional classes, which are not always distinct in all forms. For example, alongside the forms in (7) and (8), the language may have words like those in (9), which differ by changing their vowel in the plural form, or by taking a different suffix, or both.
(9) Stems in different inflectional classes
    Nom.sg.   Gen.sg.   Nom.pl.   Gloss
    sak       sakes     seke      ‘sack’
    korp      korbes    kœrbe     ‘basket’
    rok       rokes     rœke      ‘coat’
    lip       libes     liber     ‘body’
    vort      vortes    vort      ‘word’
    li@t      li@des    li@der    ‘song’
    lant      landes    lender    ‘land’
The problem of discovering inflectional classes is usually treated as a separate problem from that of finding phonological contrasts, but the considerations are the same: an unpredictable difference seen in one part of the paradigm (here, the plural) may be neutralized in another part of the paradigm (i.e., the singular), forcing learners to look to a particular part of the paradigm for the crucial distinguishing information.
The task, then, is to discover that a contrast seen in form A in the paradigm is neutralized in form B. One possible approach would be to establish correspondences between every segment in form A and form B, checking to make sure that the relations were always one-to-one (x always mapped to y). If a one-to-many relation is discovered (x maps to y in one word and to z in another), then we could infer that a contrast between y and z is neutralized to x in some environment. This is shown schematically in (10).
(10) Establishing correspondence between segments in related forms
     a. One-to-one relation
        a₁ b₂ x₃ ∼ a₁ b₂ y₃ -suffix    (tak ∼ tag-es)
        c₁ d₂ x₃ ∼ c₁ d₂ y₃ -suffix    (tswik ∼ tswig-es)
     b. One-to-many relation
        a₁ b₂ x₃ ∼ a₁ b₂ y₃ -suffix    (tak ∼ tag-es)
        c₁ d₂ x₃ ∼ c₁ d₂ z₃ -suffix    (druk ∼ druk-es)
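The correspondence check sketched in (10) can be written out as a small program. This is only an illustration under simplifying assumptions: one character per segment, a known suffix -es, and a comparison restricted to the stem-final segment (where the alternations in (8) occur) rather than every segment. The function names are hypothetical.

```python
from collections import defaultdict

# (nom. sg., gen. sg.) pairs taken from (8)
PAIRS = [
    ("tak", "tages"),
    ("tswik", "tswiges"),
    ("druk", "drukes"),
    ("mut", "mutes"),
    ("tot", "todes"),
]
SUFFIX = "es"


def final_correspondences(pairs):
    """Map each final segment of the first form to the set of final
    segments it corresponds to in the second form."""
    table = defaultdict(set)
    for first, second in pairs:
        table[first[-1]].add(second[-1])
    return dict(table)


def one_to_many(table):
    """Keep only the segments with more than one correspondent."""
    return {seg: outs for seg, outs in table.items() if len(outs) > 1}


# Strip the suffix so that bare stems are compared with nominatives.
stems = [(sg, gen[: -len(SUFFIX)]) for sg, gen in PAIRS]

forward = one_to_many(final_correspondences(stems))          # sg -> suffixed
backward = one_to_many(
    final_correspondences([(stem, sg) for sg, stem in stems])  # suffixed -> sg
)
print("sg -> gen:", forward)
print("gen -> sg:", backward)
```

In this toy data set the one-to-many relations all run from the nominative singular to the suffixed form, and none run in the opposite direction, which is the asymmetry that identifies the unsuffixed form as the site of neutralization.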
The input to the model is a set of paradigmatically related forms in phonetic transcription, such as the ones in (7)-(9) above. In order to permit generalizations about phonological environments, the model is also provided with a matrix of phonological feature values for the sounds of the language (that is, knowledge of phonological features is assumed to be “innate”). In addition, the model is provided with knowledge about sequences that are surface illegal in the language, in the form of a list of non-occurring sequences. In the case of a language with no word-final voiced obstruents, this list would include sequences like [b#], [d#], [g#], etc. As discussed above, a key observation in discovering neutralizations is the simple fact that neutralizations lead to ambiguity, and thus, potential uncertainty. For instance, given a nominative singular form [mut], the learner is not certain whether the plural should be [mute], [mude], or even [myte], [myde], or some other form. Thus, if one were to construct a grammar that used the nominative singular as its input and tried to generate nominative plurals by rule, there would be some indeterminacy concerning both voicing and also the correct suffix to use. In such a case, the grammar might pick one of these outcomes as the regular outcome (for example, simply adding -e with no voicing or vowel change), but this would leave unaccounted for many “irregular” forms that took other patterns. Going from the plural to the singular, on the other hand, there is no ambiguity concerning final obstruent voicing: when the suffix is removed and the obstruent is put into final position, it must be devoiced.^11 It is important to recognize that frequently, the seriousness of an ambiguity can be mitigated by means of clever and detailed rules that capture sub-generalizations about the patterns involved. In the sample data in (9), for example, we see that final obstruents are never voiced when they follow a fricative (pl. 
[knexte], [geIste], but no forms like *[bexde], *[meIsde]). At the same time, it happens that in this set of forms, [t] always voices after [n] ([sande], [lender] but no hypothetical *[bente], *[menter]). Stem-final [p] always voices ([lobe], but no *[rope]), while stem-final [pf] never does ([Simpfe], but no *[dimbve]). These small-scale generalizations (dubbed “islands of reliability” in Albright (2002b)) have the potential to recover a good deal of information about a contrast that has been neutralized. Furthermore, there is a growing body of experimental evidence showing that speakers are actually sensitive to such patterns (Zuraw 2000; Albright, Andrade, and Hayes 2001; Albright 2002b; Albright and Hayes 2003; Ernestus and Baayen 2003). Therefore, any attempt to estimate the seriousness of a neutralization must explore the possibility of “predicting one’s way out of it” by means of such small-scale generalizations.
The Minimal Generalization model of Albright and Hayes (2002) is a model of grammar induction that is designed to do precisely this. It takes pairs of morphologically related forms and compares them, attempting to find the most reliable generalizations it can about the mapping from one form to the other. It starts by taking each data pair and comparing the input and output, to determine what has changed, and what is constant. The result is expressed as a word-specific rule, describing the mapping involved for just this one datum. For example, given the nominative singular and plural forms in (7)-(9), the Minimal Generalization algorithm would start by factoring each pair into a changing and non-changing portion, thereby determining that several changes seem to be involved. This is shown, for a subset of the data, in (11). At a first pass, the changing portion corresponds roughly to the affixes, and the constant portion can be considered the stem, though in cases where the voicing of the final obstruent is altered, we see
(^11) Note that whichever direction is chosen, there may still be ambiguities concerning root vowel alternations—for example, singular [a] could correspond to plural [a] or [e], while plural [e] could correspond to singular [e] or [a]— though even here, the plural→singular mapping is less ambiguous (singular [u] may correspond to plural [u] or [y], but plural [y] almost always corresponds to singular [u]).
that this is also included as part of the change in this initial parse.
(11) Factoring the input data into change and context
     Input    Output    Restated as a word-specific rule
     mut      mute      ∅ → e / mut __ #
     Srit     Srite     ∅ → e / Srit __ #
     knext    knexte    ∅ → e / knext __ #
     geIst    geIste    ∅ → e / geIst __ #
     jar      jare      ∅ → e / jar __ #
     SreI     SreIe     ∅ → e / SreI __ #
     tot      tode      t → de / to __ #
     nit      nide      t → de / ni __ #
     helt     helde     t → de / hel __ #
     tak      tage      k → ge / ta __ #
     vort     vort      ∅ → ∅ / vort __ #
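The factoring step in (11) can be approximated by stripping the longest shared left edge and then the longest shared right edge from each input/output pair. The sketch below is an illustration only, assuming one character per segment; `factor` and `word_specific_rule` are hypothetical names, not the implementation used by the Minimal Generalization model.

```python
def factor(inp, out):
    """Split a pair into (A, B, left context, right context), so that the
    mapping can be read as A -> B / left __ right."""
    limit = min(len(inp), len(out))
    # Longest common prefix: material to the left of the change.
    p = 0
    while p < limit and inp[p] == out[p]:
        p += 1
    # Longest common suffix of what remains: material to the right.
    s = 0
    while s < limit - p and inp[len(inp) - 1 - s] == out[len(out) - 1 - s]:
        s += 1
    return inp[p:len(inp) - s], out[p:len(out) - s], inp[:p], inp[len(inp) - s:]


def word_specific_rule(inp, out):
    """Format the factored pair in the notation of (11); '#' is the word edge."""
    a, b, left, right = factor(inp, out)
    return f"{a or '∅'} → {b or '∅'} / {left} __ {right}#"


for sg, pl in [("mut", "mute"), ("tot", "tode"), ("tak", "tage"), ("vort", "vort")]:
    print(f"{sg:6}{pl:7}{word_specific_rule(sg, pl)}")
```

As in (11), a change in stem-final voicing is folded into the changing portion (t → de), because the shared left edge ends before the altered obstruent.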
The next step is to generalize, by comparing word-specific rules that involve the same morphological change. For example, comparing the word-specific rules for mut ∼ mute and Srit ∼ Srite, the model posits a new rule adding -e after any stem that ends in a [t] preceded by a high vowel:
(12) Generalization over pairs of related rules
Change    Residue    Shared features       Shared segments    Change location    Shared segments
∅ → e     m          u                     t                  __                 #
∅ → e     Sr         i                     t                  __                 #
∅ → e     X          [+syllabic, +high]    t                  __                 #
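The comparison in (12) can be sketched in Python for the left-hand context of two rules that share the same change. The sketch below is a hedged illustration rather than the model's actual code: the function name `generalize_left` is hypothetical, the feature table is a toy assumption covering only a few segments, and each character is again assumed to be one segment.

```python
from typing import Optional

# Toy feature table (an illustrative assumption, not the model's feature system).
FEATURES = {
    "u": {"+syllabic", "+high", "+back", "+round"},
    "i": {"+syllabic", "+high", "-back", "-round"},
    "a": {"+syllabic", "-high", "+back", "-round"},
    "t": {"-syllabic", "-voice", "+coronal"},
    "k": {"-syllabic", "-voice", "-coronal"},
}


def generalize_left(ctx1: str, ctx2: str) -> tuple[bool, Optional[set], str]:
    """Compare two left contexts, moving leftward from the change location.

    Returns (free_variable, shared_features, shared_segments).
    """
    i, j = len(ctx1), len(ctx2)
    shared_segments = []
    # 1. Retain strictly identical segments adjacent to the change.
    while i > 0 and j > 0 and ctx1[i - 1] == ctx2[j - 1]:
        shared_segments.append(ctx1[i - 1])
        i -= 1
        j -= 1
    shared_segments.reverse()
    # 2. At the first mismatch, keep the feature values the two segments share.
    shared_features = None
    if i > 0 and j > 0:
        shared_features = FEATURES[ctx1[i - 1]] & FEATURES[ctx2[j - 1]]
        i -= 1
        j -= 1
    # 3. Leftover material in either context becomes the free variable X.
    free_variable = i > 0 or j > 0
    return free_variable, shared_features, "".join(shared_segments)


if __name__ == "__main__":
    x, feats, segs = generalize_left("mut", "Srit")
    print("X" if x else "", sorted(feats or []), segs)
```

For the contexts mut and Srit, the sketch retains the shared [t], reduces u and i to the features they have in common under the toy table ([+syllabic, +high]), and converts the leftover m and Sr to a free variable, matching the last row of (12).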
The precise generalization scheme is as follows: moving outward from the change location, any strictly identical segments are retained in the generalized rule, in the “shared segments” term. Upon encountering a pair of mismatched segments, the model compares them to determine what feature values they have in common; these are retained as the “shared features.” Finally, if either of the rules under comparison has additional material left over, this is converted to a free variable (here, ‘X’). The search for shared material is carried out symmetrically on both the left and right sides. Here, the fact that the change is word-final is indicated by means of a shared word-edge symbol (‘#’), but it could also be indicated simply by the lack of a free variable on the right side (meaning no additional material can be matched on this side).
The comparison in (12) happens to yield a rule that is scarcely more general than the word-specific rules that spawned it. When the process is iterated over the entire input set, however, much broader generalizations can emerge through comparison of heterogeneous input forms, including even context-free generalizations. One pathway to context-free -e suffixation is shown in Figure 1.
The goal of generalization is not merely to discover which contexts a change applies in, but also to discover where it applies reliably. This is assessed by keeping track of a few simple statistics. For each rule, the model determines how many forms in the input data meet the structural description of the generalization (data it tries to take on = its SCOPE), along with how many of those forms actually take the change required by the generalization (data it actually works for = its HITS). For example, consider the rule affixing [-e] after the sequence of a high
Figure 2: Relationship between amount of data and confidence limit adjustment (confidence plotted against number of observations, 0 to 50, with one curve per reliability value)
reliability, the model is able to favor broader generalizations (i.e., ones with more observations in their scope), even if they involve a few exceptions. As we will see in section 4.1, this adjustment also plays a crucial role when different amounts of data are available from different parts of the paradigm.
The process described thus far yields a rather uninsightful analysis of voicing alternations, namely that words with voicing alternations constitute a separate inflectional class which takes a different set of suffixes (sg. -t, pl. -de). The learner arrives at this analysis because the initial parse of the morphology occurs prior to any learning of phonological alternations, so there is no way of knowing that the [t] ∼ [d] alternation could be explained on phonological grounds. During the course of assessing the reliability of generalized rules, however, there is an opportunity to improve on this analysis, in the following way: when the model discovers that a form meets the structural description of a rule but does not obey it, an error is generated, which can be inspected for phonotactic violations. If the incorrectly predicted form contains an illegal sequence, then there may be a phonological rule involved, and the model attempts to posit a rule that fixes the incorrect form, transforming it into the correct, observed one. To take an example, when evaluating the morphological rule ∅ → e / [+syll, +high] t __ # discussed above, the model observes that the rule correctly generates the forms [mute] and [Srite] (two hits), but it incorrectly predicts the forms *[nite] and *[li@te] (for [nide] and [li@der], respectively). There are two possible reasons why the rule generates the wrong outcome: either it doesn’t apply to these words, or it does apply, but an additional phonological rule is needed to yield the correct surface output.
Put more concretely, although the output *[nite] is incorrect, if the language had a process of intervocalic voicing, this would explain why the observed output is actually [nide]. The viability of an intervocalic voicing rule is tested by consulting the list of illegal sequences to see whether intervocalic [t] is known not to occur. In this case, we find that the hypothetical rule is not viable, since intervocalic [t] is fine in this language (in fact, it occurs in forms like [Srite]). Thus, we correctly discover that if we take the nominative singular as our starting point, there is no more insightful analysis to be had; all we can say is that there is an irregular competing process that sometimes changes [t] to [d]. Let us now contrast this with an analysis using the nominative plural as an input. Here, the changes that we observe include removing a suffix (e.g., [e] → ∅), and removing a suffix with a concomitant voicing readjustment (e.g., [de] → [t], as in [tode] → [tot]). As above, iterative generalization discovers a range of possible contexts characterizing the two changes, and the model evaluates the reliability of all of these generalizations. Now the context-free rule for [e] → ∅ makes the correct prediction for forms like [mute] and [Srite], but for plural [tode], [nide], and [tage], it incorrectly predicts singular *[tod], *[nid], and *[tag]. This time, the incorrect predictions could be fixed by a rule of final devoicing. When the erroneous predictions are
compared against the list of illegal sequences, we find that final voiced obstruents are in fact illegal, and a final devoicing rule is viable. With a phonological devoicing rule in place, forms like [tot] and [nit] can be derived by the simpler [e] → ∅ rule, and the reliability of this rule improves. Thus, by taking the plural as a starting point, the learner is able to come up with a unified analysis of the voiced and voiceless-final stems in (8).
We see from this example that when a form suffering from neutralizations is used as the input to the grammar, the resulting rules are less accurate and less reliable, since they have to make guesses about essentially unpredictable properties. This suggests a straightforward strategy for discovering which form in the paradigm exhibits the most contrasts: simply take each form in the paradigm and try learning grammars that derive the remaining forms from it. The slot in the paradigm that yields the most accurate, most reliable grammars is then chosen as the base of the paradigm, and the remaining forms are derived from it by means of morphological and phonological rules. Thus, the base form is chosen in order to maximize confidence in the remainder of the paradigm.
As noted in section 2.2, the confidence maximization approach has the potential to explain why analogical change sometimes takes the typologically unusual step of rebuilding more frequent, less marked forms. In the current example, based on Middle High German, we see that final devoicing made the nominative singular a relatively unpredictive form, while the number of different plural suffixes would have encouraged selecting a plural form as a base. This prediction seems to be borne out for real Middle High German: both Modern German and Yiddish show leveling of vowel length from the plural form (Paul, Wiehl, and Grosse 1989, §23), while Yiddish and some Bavarian dialects show the additional leveling of final obstruent voicing, discussed in section 2 above.
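The contrast between the two candidate bases can be made concrete with a small Python sketch over the toy pairs in (11). It is a simplified illustration under several assumptions, not the model itself: only the context-free rules are evaluated, reliability is reported as raw hits and scope (without the confidence adjustment), the helper names are hypothetical, and the list of illegal sequences is approximated by checking whether a pattern is attested anywhere in the toy data.

```python
# Toy (singular, plural) pairs from (11); one character per segment is assumed.
PAIRS = [("mut", "mute"), ("Srit", "Srite"), ("knext", "knexte"),
         ("geIst", "geIste"), ("jar", "jare"), ("SreI", "SreIe"),
         ("tot", "tode"), ("nit", "nide"), ("helt", "helde"),
         ("tak", "tage"), ("vort", "vort")]

VOWELS = set("aeiouI")
DEVOICE = {"b": "p", "d": "t", "g": "k"}  # assumed obstruent inventory


def add_e(form):
    """Context-free suffixation: ∅ → e / __ #."""
    return form + "e"


def drop_e(form):
    """Context-free suffix removal: e → ∅ / __ #; None if out of scope."""
    return form[:-1] if form.endswith("e") else None


def final_devoicing(form):
    """Devoice a word-final voiced obstruent, leaving other forms unchanged."""
    if form and form[-1] in DEVOICE:
        return form[:-1] + DEVOICE[form[-1]]
    return form


def reliability(rule, data, repair=lambda form: form):
    """Return (hits, scope) for a rule, optionally followed by a phonological repair."""
    hits = scope = 0
    for source, target in data:
        predicted = rule(source)
        if predicted is None:        # structural description not met
            continue
        scope += 1
        hits += repair(predicted) == target
    return hits, scope


def intervocalic_t(form):
    return any(form[k] == "t" and form[k - 1] in VOWELS and form[k + 1] in VOWELS
               for k in range(1, len(form) - 1))


def final_voiced_obstruent(form):
    return form[-1] in DEVOICE


surface = [form for pair in PAIRS for form in pair]
plural_first = [(pl, sg) for sg, pl in PAIRS]

sg_to_pl = reliability(add_e, PAIRS)
pl_to_sg = reliability(drop_e, plural_first)
pl_to_sg_devoiced = reliability(drop_e, plural_first, repair=final_devoicing)

# A repair is viable only if the sequence it removes never surfaces.
voicing_viable = not any(intervocalic_t(f) for f in surface)
devoicing_viable = not any(final_voiced_obstruent(f) for f in surface)

if __name__ == "__main__":
    print("sg -> pl:", sg_to_pl)
    print("pl -> sg:", pl_to_sg, "with devoicing:", pl_to_sg_devoiced)
    print("voicing viable:", voicing_viable, "devoicing viable:", devoicing_viable)
```

On this toy data, suffixation from the singular succeeds for 6 of 11 forms and cannot be repaired, because intervocalic [t] is attested (as in [mute]). Suffix removal from the plural succeeds for 6 of the 10 forms in its scope before the devoicing rule is posited and for all 10 afterwards, which is the asymmetry that favors the plural as base.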
In the example in the previous section, all phonological and morphological contrasts were aligned so they were most clearly visible in the same part of the paradigm (the plural). Often this is not the case, however. In fact, different parts of the paradigm frequently maintain different information, since phonological and morphological neutralizations can theoretically target any slot within the paradigm. For this reason, it is generally assumed that learners are able to compare multiple forms of inflected words to arrive at their lexical representation (Kenstowicz and Kisseberth 1977). In Albright (2002a), a more restrictive model of acquisition is proposed: when no single part of the paradigm maintains all contrasts, the learner is forced to choose the single form that is generally most predictive, even if this means losing information about certain contrasts. This constraint can be called the single surface base hypothesis, since it requires that all paradigms of all words be organized around the same base form.
To see how the single surface base constraint works, let us consider some additional data from the history of German. At some point in Old High German or early Middle High German, the phoneme [h] (from older [X] or [x]) was lost intervocalically (Braune and Mitzka 1963, §152b; Paul, Wiehl, and Grosse 1989, §111, §142). This created paradigmatic alternations, still seen in Modern German hoch [ho:x], höher [hœ:5], am höchsten [hœ:çst@n] ‘high/higher/highest’. It also created alternations in noun paradigms, since [h] deleted in forms with vowel-initial suffixes, such as the plural:
words remained distinct in the plural ([flœe] vs. [kœxe]). And in fact, this prediction is borne out: words like [Sux], [rex] and [flox] lost their [x] by analogy (Paul, Wiehl, and Grosse 1989, §25c, p. 44; Molz 1906, p. 294), and are pronounced [Su:], [re:], and [flo:] in Modern German, while historically vowel-final and k-final words remained unchanged.
This example shows how the current approach can make very specific predictions about particular instances of analogical change. It predicts not only which form in the paradigm will be affected (non-basic forms, which are open to rebuilding if they cannot be generated correctly by the grammar), but also which direction the change will go in (regularization to the lexically dominant pattern). By using a synchronic model of paradigm acquisition to predict asymmetries in possible errors, we are able to achieve a more constrained and explanatory theory of the direction of analogy.
This model has been shown to work in several other unusual cases of analogy, as well. In Albright (2002a), I showed that it made the right predictions for three typologically unusual paradigmatic changes. The first was a case of across-the-board leveling to the 1sg in Yiddish verbs, in violation of the tendency to level to the 3sg. In this case, the advantage of the 1sg appears to be due to the devoicing or even total loss of stem-final obstruents which occurs in the 2sg/3sg/2pl, together with the fact that the 1sg maintains a contrast in stem-final schwas which is sometimes difficult to recover from the 1pl/3pl/infinitive. Since the 1sg is the only form that maintains all of these contrasts, it is the most predictive, and is (correctly) chosen to serve as the base. The second unusual change involved the elimination of [s] ∼ [r] alternations in Latin (the famous honor analogy), in which the nominative singular form of noun paradigms was rebuilt on the basis of an oblique form.
As with the MHG case discussed here, the preference for a suffixed form in Latin seems to be due to phonological processes that affected word-final obstruents. The details of the change, which affected only polysyllabic non-neuter nouns, are also correctly predicted by a model that uses probabilistic rules to capture lexical tendencies in different contexts. The final case involved an analogical change in Lakhota verbs which appears to have been based on the 2sg. In this case, the neutralizations involved were both more complex and more symmetrical, but the advantage of the 2sg seems to have come from the fact that it maintained the contrast between two large classes of words which were neutralized elsewhere in the paradigm. For details on all of these changes, the reader is referred to Albright (2002a). The upshot of this section is that the proposed model makes advances in explaining the “language particulars” of analogical change. What remains to be shown, however, is whether it has anything to say about universal tendencies.
4 Typological tendencies: exploring the parameter space of the model
The model laid out in the previous section makes a strong claim about base forms in paradigms. It posits that bases play an integral role in the synchronic organization of grammar, and that they are chosen in order to facilitate, or optimize, the resulting grammar. A base form is considered optimal in this system if it contains enough information to reliably predict the remaining forms in the paradigm, by preserving contrasts and lacking neutralizations. This procedure for selecting base forms seems rational as a theory of how synchronic grammars are organized, and also makes the correct predictions for individual cases which are typologically unusual. In this section, I show that this procedure also makes the correct typological predictions.
There are two distinct issues that must be addressed in assessing the typological predictions of the model. The first is the issue of empirical coverage: are there attested analogies which run counter to the hypothesis that base forms are always the most informative form? The second issue is one of relative frequency: in principle, contrasts could be maintained anywhere in the paradigm (1sg, 2sg, 3sg, etc.), and in many cases, contrasts are maintained equally well by multiple forms. Why is there a strong tendency for analogy to be based on the most frequent or least marked forms? I will consider each of these questions in turn.
A common criticism of proposed principles of analogy is that there always seem to be exceptions: even if analogy usually extends the most frequent, the least marked, or the unsuffixed form, occasionally it goes in the opposite direction, extending a less frequent, more marked, suffixed form. Under a tendency-based approach, such exceptions are not necessarily a problem, since the goal is to explain only what is likely, not what is possible. The current model makes a stronger claim, however, that the direction of analogy should be predictable in all cases. The question immediately arises, therefore, whether there are exceptions to the informativeness-based account of analogy, just as there are exceptions to every other proposed tendency.
There are in fact a number of well-known examples of analogies based on pivot forms that appear to involve massive neutralizations. One case that is often cited comes from Maori (Hohepa 1967; Hale 1973; Kiparsky 1978; Hock 1991, pp. 200-202; Barr 1994, pp. 468-477; Kibre 1998). In Maori, passives were historically formed by adding a vowel-initial suffix (generally -ia or -a) to the verb stem: awhit → awhit-ia ‘embrace’, hopuk → hopukia ‘catch’, and so on. Subsequently, word-final stops were deleted (awhit > awhi), creating alternations within verb paradigms:
(16) Unpredictable consonants in Maori passive
Verb    Passive     Gloss
awhi    awhitia     ‘embrace’
hopu    hopukia     ‘catch’
aru     arumia      ‘follow’
waha    wahaNia     ‘carry on back’
mau     mauria      ‘carry’
wero    werohia     ‘stab’
hoka    hokaia      ‘run out’
patu    patua       ‘strike, kill’
What makes the Maori case notable is the fact that the passive suffix has apparently been reanalyzed as a set of competing consonant-initial suffixes (-tia, -kia, -mia, -Nia, -ria, -hia, etc.), and the -tia and -a suffixes have gradually been replacing the remaining allomorphs: for example, newer wahatia, wahaa alongside older wahaNia. In other words, passive forms are being analogically rebuilt on the basis of the unsuffixed stem, even though it lacks information about unpredictable final consonants.
A similar change is underway in present-day Korean, in which noun paradigms are being rebuilt on the basis of unsuffixed (isolation) forms, even though these forms suffer from drastic