Assignment 4 for Ling 420: Entropy Calculation and Word Sense Disambiguation, Assignments of Linguistics

This assignment for Ling 420 involves calculating the entropy of a non-fair 8-sided die distribution and of text character distributions, as well as casting word sense disambiguation as a noisy channel model. Students are required to write a Python script to compute entropy and to convert text to follow Shannon's assumptions.

Typology: Assignments

Uploaded on 11/08/2009

koofers-user-1dm


Assignment 4
Ling 420
Due Thursday, February 13

1. In the slides, we calculated the entropy of a fair 8-sided die (slide 9). Let's assume that this 8-sided die is not actually fair, but instead has this distribution:

   Outcome:      1     2     3     4     5     6     7     8
   Probability: 1/8   1/16  1/4   1/8   1/16  1/16  1/4   1/16

   What is the entropy of this distribution?

2. (Exercise 2.11, page 79) Cast the problem of word sense disambiguation as a noisy channel model, in analogy to the examples in table 2.2 [p. 71]. Word sense disambiguation is the problem of determining which sense of an ambiguous word is used (e.g., 'industrial plant' vs. 'living plant' for "plant") and will be covered in chapter 7 [week 12].

3. (Exercise 2.9, page 78) Take a (short) piece of text and compute the relative frequencies of the letters in the text. Assume these are the true probabilities. What is the entropy of this distribution?

   (a) Using Unix tools, convert your text so that it follows Shannon's assumptions (p. 77), namely that English consists of 27 symbols: the 26 alphabetic characters plus the space. Record your conversions in a readme file in your homework directory.

   (b) Using a text of approx. 5000 characters (use Unix's wc to count), write a Python script to compute the entropy of the distribution.
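As a starting point, the entropy computation behind questions 1 and 3 can be sketched in Python. The function names here are illustrative, not prescribed by the assignment, and the die distribution is the one given above; the text passed to `text_entropy` is assumed to already be normalized to Shannon's 27-symbol alphabet (a-z plus space).

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Question 1: the unfair 8-sided die from the handout.
die = [1/8, 1/16, 1/4, 1/8, 1/16, 1/16, 1/4, 1/16]
print(entropy(die))  # 2.75 bits

# Question 3: relative letter frequencies of a text, treated as the
# true probabilities (text assumed pre-normalized to 27 symbols).
def text_entropy(text):
    counts = Counter(text)
    total = len(text)
    return entropy(c / total for c in counts.values())
```

Since every probability in the die distribution is a power of two, the entropy comes out exact; for real text the result will be a non-integer number of bits per character.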