Part of Speech Tagging & Sense Disambiguation in Speech Synthesis: ECE 598 Deep Dive

An in-depth exploration of part-of-speech tagging and sense disambiguation in speech synthesis, as covered in the ECE 598: Speech Synthesis course at the University of Illinois at Urbana-Champaign. It covers the challenges of part-of-speech tagging, particularly in languages with minimal morphology, and the importance of tagging for speech synthesis. It also delves into sense disambiguation, using French as an example, and the decision-list approach to disambiguation. The document also touches on other topics related to speech synthesis, such as word pronunciation, abbreviation expansion, and language modeling.

Uploaded on 03/16/2009

ECE 598: Speech Synthesis
Linguistic Analysis
Richard Sproat
http://www.linguistics.uiuc.edu/rws/
URL for this course:
http://catarina.ai.uiuc.edu/ECE598/

Synopsis

  • Problems

      - Part of speech tagging
      - Word-sense disambiguation
      - Word pronunciation
      - Preprocessing: abbreviation expansion, etc.

  • Multilingual issues

      - Word segmentation in Asian languages
      - Architectures for multilingual linguistic analysis

ECE 598: Linguistic Analysis

Part of Speech Tags

  • Part of speech (POS) tagging is simply the problem of placing words into equivalence classes.
  • The notion of part-of-speech tags can be attributed to Dionysius Thrax, a Greek grammarian of the 1st century BC, who classified Greek words into eight classes: noun, verb, pronoun, preposition, adverb, conjunction, participle and article.
  • Tagging is arguably easiest in languages with rich (inflectional) morphology (e.g. Spanish) for two reasons:

      - It’s more obvious what the basic set of tags should be since words fall into
      - The morphology gives important cues to what the part of speech is: cantaremos is highly likely to be a verb given the ending -ar-emos.

  • It’s arguably hardest in languages with minimal (inflectional) morphology:

      - There are fewer cues in English than there are in Spanish.
      - For some languages like Chinese, cues are almost completely absent, and linguists can’t even agree on whether (e.g.) Chinese distinguishes verbs from adjectives.

Part of Speech Tags

  • Linguists typically distinguish a relatively small set of basic categories (like Dionysius Thrax)—sometimes just 4 in the case of Chomsky’s [±N,±V] proposal.

But usually these analyses assume an additional set of morphosyntactic features.

  • Computational models of tagging usually involve a larger set, which in many cases can be thought of as the linguists’ small set, plus the features squished into one term:

eat/VB, eat/VBP, eats/VBZ, ate/VBD, eaten/VBN

  • Tagset size has a clear effect on the performance of taggers.

“the Penn Treebank project collapsed many tags compared to the original Brown tagset, and got better results.” (http://www.ilc.cnr.it/EAGLES96/morphsyn/node18.html)

But choosing the right size tagset depends upon the intended application.

As far as I know, there is no demonstration of what is the “optimal” tagset.

  • http://www.scs.leeds.ac.uk/ccalas/tagsets/brown.html
  • Motivations for the Penn tagset modifications

      - “the Penn Treebank tagset is based on that of the Brown Corpus. However the stochastic orientation of the Penn Treebank and the resulting concern with sparse data led us to modify the Brown tagset by paring it down considerably” (Marcus, Santorini and Marcinkiewicz, 1993).
      - Eliminated distinctions that were lexically recoverable: thus no separate tags for be, do, have.
      - Also eliminated distinctions that were syntactically recoverable (e.g. the distinction between subject and object pronouns).
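The effect of collapsing lexically and syntactically recoverable distinctions can be sketched as a tag mapping. The table below is an illustrative subset written for this example, not the official Brown-to-Penn conversion; the names `BROWN_TO_PENN` and `collapse` are hypothetical, and mapping the Brown uninflected forms (BE, DO, HV) to VB ignores the VB/VBP distinction that Penn makes in context.

```python
# Illustrative (NOT official) subset of a Brown-to-Penn tag mapping.
BROWN_TO_PENN = {
    # Forms of "be": lexically recoverable, so they share the general verb tags.
    "BE": "VB", "BED": "VBD", "BEDZ": "VBD", "BEG": "VBG",
    "BEM": "VBP", "BEN": "VBN", "BER": "VBP", "BEZ": "VBZ",
    # Forms of "do" and "have" (uninflected forms simplified to VB here).
    "DO": "VB", "DOD": "VBD", "DOZ": "VBZ",
    "HV": "VB", "HVD": "VBD", "HVG": "VBG", "HVN": "VBN", "HVZ": "VBZ",
    # Subject vs. object pronouns: syntactically recoverable.
    "PPS": "PRP", "PPSS": "PRP", "PPO": "PRP",
}


def collapse(tagged_words):
    """Map Brown-style tags to Penn-style tags; unknown tags pass through."""
    return [(word, BROWN_TO_PENN.get(tag, tag)) for word, tag in tagged_words]


brown_sentence = [("she", "PPS"), ("is", "BEZ"), ("here", "RB")]
penn_sentence = collapse(brown_sentence)
print(penn_sentence)

# The mapping shrinks the tag inventory: many source tags, few target tags.
print(len(BROWN_TO_PENN), "Brown tags ->", len(set(BROWN_TO_PENN.values())), "Penn tags")
```

Fewer tags means more training examples per tag, which is the sparse-data motivation quoted above.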

Problematic Cases

Even with a well-designed tagset, there are cases that even experts find difficult to agree on.

  • adjective or participle?

a seen event, a rarely seen event, an unseen event

  • a child seat, *a very child seat, *this seat is child

but: that’s a very MIT paper, she’s sooooooo California

  • preposition or particle?

he threw out the garbage
he threw the garbage out
he threw the garbage out the door
*he threw the garbage the door out

How Hard is Tagging?

  • Many words are unambiguous. From the Brown corpus:

Number of tags    Number of word types
1                 35,
2                 3,
3                 264
4                 61
5                 12
6                 2
7                 1 (“still”)

  • The baseline for English (Penn tagset), assigning each word its most likely tag, is something like 91%.
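A minimal sketch of the two ideas on this slide: counting how many word types have 1, 2, 3, ... tags, and the most-frequent-tag baseline. The toy corpus, the function names, and the choice of NN as the default tag for unseen words are all assumptions made for illustration; real figures would come from a corpus such as Brown.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; each sentence is a list of (word, tag) pairs.
TOY_CORPUS = [
    [("the", "DT"), ("race", "NN"), ("ended", "VBD")],
    [("they", "PRP"), ("race", "VBP"), ("daily", "RB")],
    [("the", "DT"), ("race", "NN"), ("started", "VBD")],
]


def train_baseline(tagged_sentences):
    """Count how often each (lower-cased) word occurs with each tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return counts


def ambiguity_histogram(counts):
    """Map 'number of distinct tags' to 'number of word types with that many'."""
    return Counter(len(tags) for tags in counts.values())


def tag_baseline(words, counts, default="NN"):
    """Give every word its most frequent training tag (default if unseen)."""
    tagged = []
    for word in words:
        seen = counts.get(word.lower())
        tag = seen.most_common(1)[0][0] if seen else default
        tagged.append((word, tag))
    return tagged


counts = train_baseline(TOY_CORPUS)
print(dict(ambiguity_histogram(counts)))
print(tag_baseline(["They", "race"], counts))
```

In the toy data "race" is the only ambiguous type, and the baseline tags it NN even in "they race", which is exactly the kind of error that context-sensitive taggers are meant to fix.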

Approaches to Automatic Tagging: Hand-Written Rules

ENGTWOL (Voutilainen, 1995) is an FST-based rule-based system for English tagging.

Example rule:

Adverbial-that rule:

Given input: “that”
if
    (+1 A/ADV/QUANT);   /* next word is adj, adv. or quant */
    (+2 SENT-LIM);      /* following is sentence boundary */
    (NOT -1 SVOC/A);    /* prev word not adj comp verb */
then eliminate non-ADV tags
else eliminate ADV tag
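The rule above can be read as a constraint that prunes candidate readings of "that". The following is a rough sketch in that spirit, not ENGTWOL's actual formalism or implementation: each token is a hypothetical `(word, set_of_readings)` pair, the reading labels are simplified, and the guard that never deletes the last remaining reading is an assumption added for safety.

```python
SENTENCE_END = {".", "!", "?"}


def is_sentence_limit(tokens, k):
    """True if position k is past the end or holds sentence-final punctuation."""
    return k >= len(tokens) or tokens[k][0] in SENTENCE_END


def apply_adverbial_that(tokens, i):
    """Return the pruned set of readings for tokens[i] under the rule."""
    word, readings = tokens[i]
    if word.lower() != "that":
        return readings
    # (+1 A/ADV/QUANT): next word has an adjective, adverb or quantifier reading.
    next_ok = i + 1 < len(tokens) and bool(tokens[i + 1][1] & {"A", "ADV", "QUANT"})
    # (+2 SENT-LIM): the position after that is a sentence boundary.
    limit_ok = is_sentence_limit(tokens, i + 2)
    # (NOT -1 SVOC/A): previous word is not a verb taking adjective complements.
    prev_ok = i == 0 or "SVOC/A" not in tokens[i - 1][1]
    if next_ok and limit_ok and prev_ok:
        kept = readings & {"ADV"}      # eliminate non-ADV tags
    else:
        kept = readings - {"ADV"}      # eliminate ADV tag
    return kept or readings            # never remove the last reading


THAT = {"ADV", "PRON", "DET", "CS"}
s1 = [("it", {"PRON"}), ("isn't", {"V"}), ("that", set(THAT)), ("odd", {"A"}), (".", {"PUNCT"})]
s2 = [("I", {"PRON"}), ("consider", {"V", "SVOC/A"}), ("that", set(THAT)), ("odd", {"A"}), (".", {"PUNCT"})]
print(apply_adverbial_that(s1, 2))
print(apply_adverbial_that(s2, 2))
```

In the first sentence only the adverbial reading survives; in the second, "consider" blocks it and the adverbial reading is the one removed.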

Approaches to Automatic Tagging: Source-Channel Model

  • Basic problem: uncover the underlying signal of POS tags as modified by the noisy channel that produces observable words from tags.
  • For a bigram tagger this would give you the formula for the ith tag:

\[ t_i = \arg\max_j P(t_j \mid t_{i-1})\, P(w_i \mid t_j) \]

  • For the whole sentence then we want to maximize:

\[ \prod_j P(t_j \mid t_{j-1})\, P(w_j \mid t_j) \]

  • Note that this can also be derived via Bayes’ formula for a tag sequence T and word sequence W. Thus we want to maximize:

\[ P(T \mid W) \]

which is given by

\[ P(T \mid W) = \frac{P(T)\, P(W \mid T)}{P(W)} \]

But since the word sequence W is given, P(W) is the same for every candidate tag sequence, so we can eliminate it and just maximize

\[ P(T)\, P(W \mid T) \]

  • What do you do if you don’t have tagged data?

You can assume an initial distribution of tags over the corpus (given a dictionary and perhaps some linguistically based guesses) and then use an algorithm such as expectation maximization (EM).
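For the supervised case, maximizing the product of P(t_j | t_{j-1}) P(w_j | t_j) over the whole sentence is usually done with the Viterbi algorithm. The sketch below is one way to do it under stated assumptions: the toy corpus, the `<s>` start symbol, the add-one smoothing, and all function names are choices made for this example and do not come from the lecture. It expects a non-empty word list.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # assumed pseudo-tag marking the start of a sentence


def train(tagged_sentences):
    """Collect tag-bigram and tag-to-word counts from tagged sentences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab


def log_p(counter, key, n_outcomes):
    """Add-one smoothed log probability of key under the counts in counter."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit, vocab):
    """Return the tag sequence maximizing prod_j P(t_j|t_{j-1}) P(w_j|t_j)."""
    tags = sorted(emit)
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 for unknown words
    # best[i][t] = (log score of best path ending in tag t at i, previous tag)
    best = [{t: (log_p(trans[START], t, n_tags)
                 + log_p(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + log_p(trans[p], t, n_tags), p) for p in tags)
            column[t] = (score + log_p(emit[t], words[i], n_words), prev)
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


TOY_CORPUS = [
    [("they", "PRP"), ("expected", "VBD"), ("to", "TO"), ("win", "VB")],
    [("the", "DT"), ("race", "NN"), ("ended", "VBD")],
    [("they", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "VB")],
    [("the", "DT"), ("race", "NN"), ("started", "VBD")],
]
trans, emit, vocab = train(TOY_CORPUS)
print(viterbi("expected to race".split(), trans, emit, vocab))
print(viterbi("the race".split(), trans, emit, vocab))
```

On this toy data "race" comes out as VB after "to" and as NN after "the", because the tag-bigram term outweighs the word's overall preference for NN.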

Approaches to Automatic Tagging: Transformation-Based Learning

TBL was proposed by Eric Brill in his 1993 U Penn dissertation. It is a “weakly statistical” method.

  • The system starts with a set of nominal assignments based on each word’s most likely tag. (Recall that this will be right 9 times out of 10.)
  • Then the system proceeds to learn rules of the form: “change X into Y if preceded/followed by Z”.

Thus: Change NN to VB when the previous tag is TO

So: expected/VBD to/TO race/NN → expected/VBD to/TO race/VB

The rulespace is searched for the rule that gives the most improvement given the corpus.
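The greedy search described above can be sketched as follows. This is a simplified illustration, not Brill's implementation: it uses a single rule template ("change X to Y when the previous tag is Z"), proposes candidates only from current errors, and applies a rule to all positions at once using the tags as they were before the rule fired. The names and the toy data are hypothetical.

```python
def apply_rule(tags, rule):
    """Apply 'change X to Y when the previous tag is Z' to one sentence."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def count_errors(current, gold):
    """Number of positions where the current tags disagree with the gold tags."""
    return sum(c != g for cs, gs in zip(current, gold) for c, g in zip(cs, gs))


def learn_rules(current, gold, max_rules=10):
    """Greedily pick the rule with the largest error reduction, then repeat."""
    rules = []
    for _ in range(max_rules):
        # Propose one candidate rule per error, from the template above.
        candidates = {(cs[i], gs[i], cs[i - 1])
                      for cs, gs in zip(current, gold)
                      for i in range(1, len(cs)) if cs[i] != gs[i]}
        base = count_errors(current, gold)
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):
            retagged = [apply_rule(cs, rule) for cs in current]
            gain = base - count_errors(retagged, gold)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no rule improves the corpus any further
            break
        rules.append(best_rule)
        current = [apply_rule(cs, best_rule) for cs in current]
    return rules, current


# Initial tags from a most-likely-tag baseline ("race" is always NN) vs. gold.
initial = [["VBD", "TO", "NN"], ["DT", "NN", "VBD"], ["PRP", "VBP", "TO", "NN"]]
gold = [["VBD", "TO", "VB"], ["DT", "NN", "VBD"], ["PRP", "VBP", "TO", "VB"]]
rules, final = learn_rules(initial, gold)
print(rules)
```

On the toy data the only rule learned is the one from the slide, change NN to VB when the previous tag is TO, and it leaves "the race" (DT NN) untouched.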