Massachusetts Institute of Technology
Department of Electrical Engineering & Computer Science
6.345 Automatic Speech Recognition
Spring, 2003

Issued: 02/14/03
Due: 02/26/03

Assignment 2

Syllable Structure

A language is not only limited by the inventory of basic sound units, but also by the allowable combinations of these sounds. This assignment is intended to give you some feeling about such constraints.

To do this, we will use an interactive software facility named Crystal, which runs on the Linux workstations. Crystal is an interactive system which provides many functions for studying and displaying the distributional constraints of a lexicon. For the purpose of this lab, we will use the Merriam Pocket dictionary, which contains about 20,000 entries, as the working lexicon. To start the lab simply enter the command:

% start_lab2.cmd

Distributional Properties

We will begin our investigation by examining some of the distributional properties of this lexicon of English words.

T1: In this exercise, we will study the properties of the most common words in the English language. Click on Sort by Brown Corpus Frequency (BCF)^1 in the Search Results sub-window, which will sort the words in the dictionary according to their number of occurrences in the Brown Corpus. Study the counts and properties of the top 15 words in the list.

Q1: What are the common characteristics of the 15 most frequent words (e.g., number of syllables, part of speech, etc.)?

(^1) The Brown Corpus is a corpus of over one million words gathered at Brown University. These words were taken from various sources such as books, papers and magazines, and their frequencies of occurrence were recorded.

T2: In this exercise, we will study the properties of the most frequent two and three syllable words in the English language. Set the Search Type to stress and type in . . ( . ? ) in Search String. Note that all characters in the search string are separated by spaces. The first two dots match two syllables, while the third dot followed by a question mark, in parentheses, matches an optional third syllable.

Q2: What are the most frequent two and three syllable words, and how highly are they ranked in the lexicon? When looking at only two syllable words by using . . as the search string, which syllable is more likely to be stressed? For the second part, use S to match a stressed syllable.
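For readers more used to conventional regular expressions, the two stress searches above can be approximated in Python. This is only a rough sketch and not Crystal's implementation: it assumes each word's stress pattern is stored as space-separated syllable tokens, and the token U for an unstressed syllable is an invented placeholder (this handout only names S for a stressed syllable).

```python
import re

# ". . ( . ? )": two syllable tokens plus an optional third.
TWO_OR_THREE_SYLLABLES = re.compile(r"\S+ \S+(?: \S+)?")

# "S .": a stressed first syllable followed by any second syllable.
STRESSED_FIRST = re.compile(r"S \S+")


def matches(pattern: re.Pattern, stress: str) -> bool:
    """Return True if the whole stress string matches the pattern."""
    return pattern.fullmatch(stress) is not None


# Hypothetical stress strings; "U" (unstressed) is an assumed symbol.
examples = ["S", "S U", "U S", "S U U", "U S U U"]
two_or_three = [s for s in examples if matches(TWO_OR_THREE_SYLLABLES, s)]
stressed_first = [s for s in examples if matches(STRESSED_FIRST, s)]
```

Using fullmatch rather than search mirrors the way the search string in T2 describes the entire word, not a substring of it.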

T3: In this exercise, we will study the distribution properties of syllable patterns for English. Restore the original lexicon by clicking on it in the history sub-window. Click on Syllables per Word in the Statistics sub-window. The distribution of the syllable patterns in the Brown Corpus is different from that in the dictionary, because some words in the dictionary occur more often than others. To weight words by their Brown Corpus frequencies click on Weight by BCF in the Statistics sub-window. The Syllables per Word graph should now be weighted by Brown Corpus frequencies.

Q3: It turns out that all of the words in the lexicon contain eight or fewer syllables. What is the most frequent number of syllables per word? Describe the probability distribution for Number of Syllables per Word. How would your answer differ when the words are weighted by their Brown Corpus frequencies?
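The difference between counting each dictionary entry once and weighting it by corpus frequency can be illustrated with a small Python sketch. The function name and the (syllable count, frequency) pairs below are made up for illustration; they are not taken from the Merriam Pocket dictionary or the Brown Corpus, and this is not how Crystal computes its graphs.

```python
from collections import defaultdict


def syllable_distribution(entries, weighted=False):
    """Return P(number of syllables) over (n_syllables, frequency) pairs.

    Unweighted: every entry counts once (dictionary view).
    Weighted: every entry counts `frequency` times (corpus view).
    """
    totals = defaultdict(float)
    for n_syllables, frequency in entries:
        totals[n_syllables] += frequency if weighted else 1
    normalizer = sum(totals.values())
    return {n: total / normalizer for n, total in sorted(totals.items())}


# Invented toy entries: (syllables in the word, corpus frequency).
toy_entries = [(1, 100), (1, 50), (2, 10), (2, 5), (3, 5)]

by_entry = syllable_distribution(toy_entries)
by_frequency = syllable_distribution(toy_entries, weighted=True)
```

In this toy data the one-syllable words make up 2 of 5 entries but 150 of 170 corpus occurrences, which is the kind of shift the Weight by BCF option makes visible.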

T4: In this exercise, we will study the distribution of stress patterns for English. Click on Stress Pattern Occurrences in the Statistics sub-window. Also, view the distribution as weighted by Brown Corpus frequencies.

Q4: What is the most frequent polysyllabic stress pattern? How would your answer differ when the words are weighted by their Brown Corpus frequencies?

T5: In this exercise, we will study the distribution properties of phonemes for English. Click on Phoneme Occurrences in the Statistics sub-window. Also, view the distribution as weighted by Brown Corpus frequencies.

Q5: Of the ten most frequently occurring phonemes in the lexicon, what are the most common manner of production and place of articulation? How would your answer differ when the words are weighted by their Brown Corpus frequencies?

Phonotactical Rules

The study of the allowable sound sequences of a language is called phonotactics. This part of the assignment exposes you to some of the common phonotactical rules of English.

Consonant Clusters

There are only a limited number of distinct word-initial and word-final consonant clusters in the English language. We will study their properties in this part of the lab.

T6: First, restore Search Type to phonemic. Search for word-initial consonant clusters in the original lexicon containing at least two consonants by typing C C ( C * ) V . * in Search String. The C C ( C * ) portion matches two or more consonants, while the V portion matches exactly one vowel. Finally, the . * portion matches the remaining zero or more phonemes of an arbitrary word. Pay special attention to the existence of /tk/ and /kt/ clusters.

Next, restore the original lexicon by clicking on it in the history sub-window. Search for all possible word-final consonant clusters in the lexicon by typing . * V C * in Search String. Pay special attention to the existence of /tk/ and /kt/ clusters.

Q6: We know that no word in the dictionary contains consonant cluster /tkt/ or /ktk/ (you can verify this by searching the lexicon with . * t k t . * or . * k t k . *). Are the following two phonemic transcriptions possible?

(a) /   t k t   / (b) /   k t k   /

What is the maximum length of a word-initial consonant cluster? At this length, how many consonant clusters are there and what are they?
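The idea behind the word-initial and word-final cluster searches in T6 can be sketched in Python as follows. This is an illustration under stated assumptions, not Crystal's method: the vowel inventory is an assumed ARPAbet-style set (this handout does not list the members of V), syllabic consonants are not treated specially, and the toy lexicon is a handful of hand-written transcriptions rather than the lab dictionary.

```python
from collections import Counter

# Assumed vowel inventory (ARPAbet-style); the handout does not list it.
VOWELS = {
    "iy", "ih", "ey", "eh", "ae", "aa", "ao", "ow", "uh", "uw",
    "ah", "ax", "ix", "er", "axr", "ay", "aw", "oy",
}


def initial_cluster(phonemes):
    """Return the consonants that precede the first vowel."""
    cluster = []
    for phoneme in phonemes:
        if phoneme in VOWELS:
            break
        cluster.append(phoneme)
    return tuple(cluster)


def final_cluster(phonemes):
    """Return the consonants that follow the last vowel."""
    return tuple(reversed(initial_cluster(list(reversed(phonemes)))))


# Hypothetical toy lexicon: word -> space-separated phonemes.
toy_lexicon = {
    "street": "s t r iy t",
    "play": "p l ey",
    "split": "s p l ih t",
    "act": "ae k t",
    "eat": "iy t",
}

# Count word-initial clusters of two or more consonants, as in C C ( C * ) V . *
initial_counts = Counter(
    cluster
    for cluster in (initial_cluster(t.split()) for t in toy_lexicon.values())
    if len(cluster) >= 2
)
```

Grouping the matches by cluster, as the Counter does here, is one way to see how few distinct clusters a lexicon actually uses.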

Vowel Clusters

T7: Search for words with two adjacent vowels by typing . * V V . * in Search String. Be sure to restore the original lexicon and ignore syllable boundaries by enabling Ignore Syllable Boundaries.

Q7: How many words have two vowels in a row? How many of them have a schwa as the second vowel? How many have a schwa as the first vowel? Use ( ax | ix ) to match both plain and front schwas. What do two adjacent vowels imply about the syllable structure of the two syllables to which they belong?

Homorganic Rules

T8: The homorganic nasal-stop rule states that nasal-stop clusters must agree on the place of articulation. Verify this by examining all the occurrences of nasal-stop clusters in the lexicon. You can search for all words containing nasal-stop sequences by typing . * NASAL STOP . * in Search String. You can also search for more specific examples in the resulting sub-lexicon. For example, to search for words containing /nd/, type . * n d . * in Search String; to search for words containing either /nd/ or /nt/, type . * n ( d | t ) . * in Search String. You will want to experiment with how ignoring or accounting for syllable boundaries affects the results.

Q8: How often is the nasal-stop homorganic rule violated? Can you formulate a general rule that summarizes when it is broken?
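The check described in T8 amounts to comparing the place of articulation of each nasal with that of the stop that follows it. The Python sketch below shows one way to express that comparison; the function names are illustrative, the input is assumed to be a flat phoneme list with no syllable-boundary markers, and the second test sequence is an invented phoneme string rather than a dictionary entry.

```python
# Place of articulation for the English nasals and stops.
PLACE = {
    "m": "labial", "p": "labial", "b": "labial",
    "n": "alveolar", "t": "alveolar", "d": "alveolar",
    "ng": "velar", "k": "velar", "g": "velar",
}
NASALS = {"m", "n", "ng"}
STOPS = {"b", "d", "g", "p", "t", "k"}


def nasal_stop_pairs(phonemes):
    """Return every adjacent (nasal, stop) pair in the phoneme list."""
    return [
        (first, second)
        for first, second in zip(phonemes, phonemes[1:])
        if first in NASALS and second in STOPS
    ]


def homorganic_violations(phonemes):
    """Return the nasal-stop pairs whose places of articulation differ."""
    return [
        (nasal, stop)
        for nasal, stop in nasal_stop_pairs(phonemes)
        if PLACE[nasal] != PLACE[stop]
    ]


agreeing = homorganic_violations("hh ae n d".split())      # /nd/: both alveolar
disagreeing = homorganic_violations("ae n p ax".split())   # invented sequence
```

Because the sketch ignores syllable boundaries, it corresponds to running the search with Ignore Syllable Boundaries enabled; accounting for boundaries would need a transcription format that marks them.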

Lexical Constraints

In this part of the lab you will investigate the extent to which a given word can be disambiguated from competitors based on partial phonetic information.

T9: You have done some spectrogram reading practice in class. In this exercise, we will show that the use of lexical access can greatly assist the task. In Figures 3, 4, and 5, you will find three spectrograms of isolated words. Start with a very coarse transcription of the spectrogram by hand. If you cannot determine the phones, try to come up with phone classes such as vowel, nasal, strong fricative, voiced stop, etc. Perform a search on the lexicon based on your partial hypothesis. If you cannot determine the words, try to refine your hypothesis and search again. The search pattern should be expressed as a regular expression, many examples of which have already been given in the previous tasks. The following classes have been defined along with abbreviations, or you can use the OR operator, |, to create custom classes. Enable Ignore Syllable Boundaries so that you will not have to explicitly specify syllable boundaries.

CLASS                ABBREVIATION   MEMBERS
VOWEL                V              all vowels
RETROFLEXED          R              r axr er
FRICATIVE            F              s sh z zh f th v dh
STRONG-FRICATIVE     SF             s sh z zh
WEAK-FRICATIVE       WF             f th v dh
NASAL                N              m n ng
GLIDE                G              w y
LIQUID               L              l r
SEMIVOWEL            SV             l r w y
ASPIRANT                            hh
STOP                 S              b d g p t k
VOICED-STOP          VS             b d g
UNVOICED-STOP        US             p t k
AFFRICATE            A              ch jh
SYLLABIC-CONSONANT   SC             el em en

Q9: What are the words in each spectrogram? What is the partial phonetic hypothesis you have that leads you to the answer with the help of lexical search?
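To make the lexical-access idea in T9 concrete, the sketch below translates a space-separated search string built from the class abbreviations into a Python regular expression and filters a lexicon with it. It is an approximation written for illustration only: Crystal's actual matching semantics are not documented here, the member list for V is an assumed ARPAbet-style inventory, and compile_search, search_lexicon, and the four-word toy lexicon are hypothetical.

```python
import re

# Classes from the table above; the members of "V" are an assumption.
CLASSES = {
    "V": ["iy", "ih", "ey", "eh", "ae", "aa", "ao", "ow", "uh", "uw",
          "ah", "ax", "ix", "er", "axr", "ay", "aw", "oy"],
    "R": ["r", "axr", "er"],
    "F": ["s", "sh", "z", "zh", "f", "th", "v", "dh"],
    "SF": ["s", "sh", "z", "zh"],
    "WF": ["f", "th", "v", "dh"],
    "N": ["m", "n", "ng"],
    "G": ["w", "y"],
    "L": ["l", "r"],
    "SV": ["l", "r", "w", "y"],
    "S": ["b", "d", "g", "p", "t", "k"],
    "VS": ["b", "d", "g"],
    "US": ["p", "t", "k"],
    "A": ["ch", "jh"],
    "SC": ["el", "em", "en"],
}
OPERATORS = {"(": "(?:", ")": ")", "|": "|", "*": "*", "?": "?"}


def compile_search(search: str) -> re.Pattern:
    """Translate a spaced search string into a regex over 'phoneme ' units."""
    parts = []
    for token in search.split():
        if token in OPERATORS:
            parts.append(OPERATORS[token])
        elif token == ".":
            parts.append(r"(?:\S+ )")  # any single phoneme
        else:
            # A class abbreviation expands to its members; anything else
            # is treated as a literal phoneme.
            members = CLASSES.get(token, [token])
            alternatives = "|".join(re.escape(m) for m in members)
            parts.append("(?:(?:" + alternatives + ") )")
    return re.compile("".join(parts))


def search_lexicon(search: str, lexicon: dict) -> list:
    """Return the words whose full transcription matches the search string."""
    pattern = compile_search(search)
    return [w for w, phones in lexicon.items() if pattern.fullmatch(phones + " ")]


# Hypothetical toy lexicon: word -> space-separated phonemes.
toy_lexicon = {
    "sand": "s ae n d",
    "sent": "s eh n t",
    "mat": "m ae t",
    "spin": "s p ih n",
}

coarse = search_lexicon("SF V N S", toy_lexicon)
refined = search_lexicon("SF V N VS", toy_lexicon)
```

Here the coarse hypothesis "strong fricative, vowel, nasal, stop" leaves two candidates, and refining the last class to a voiced stop narrows the toy lexicon to one word, which is the search-and-refine loop the task describes.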

[Figure 4 panels: Zero Crossing Rate (kHz), Total Energy (dB), Energy -- 125 Hz to 750 Hz (dB), Wide Band Spectrogram (0 to 8 kHz), and Waveform, all plotted against Time (seconds).]

Figure 4: Mystery word #2.

[Figure 5 panels: Zero Crossing Rate (kHz), Total Energy (dB), Energy -- 125 Hz to 750 Hz (dB), Wide Band Spectrogram (0 to 8 kHz), and Waveform, all plotted against Time (seconds).]

Figure 5: Mystery word #3.