



Main points of this past exam are: Decision Trees, Information Gain, Linear SVM, Naïve Bayes, Markov Models, Probability, Boolean Random Variables, Negative Rate




December 20, 2010
Problem Score Max Score
Total ___________ 100
1. [6] Entropy Running from You-Know-Who, Harry enters the CS building on the 1st floor. He flips a fair coin; if it is heads, he hides in room 1325; otherwise he climbs to the 2nd floor. In that case he flips the coin again; if it is heads, he hides in the CSL; otherwise he climbs to the 3rd floor. In that case he flips the coin yet again; if it is heads, he hides in 3331; otherwise he hides in the Men’s room. What is the entropy of Harry’s location?
L = (1/2, 1/4, 1/8, 1/8)
H(L) = -[(1/2) log2(1/2) + (1/4) log2(1/4) + (1/8) log2(1/8) + (1/8) log2(1/8)]
     = -[(1/2)(-1) + (1/4)(-2) + (1/8)(-3) + (1/8)(-3)]
     = 1.75 bits
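The arithmetic above can be checked numerically. The following is a minimal sketch; the helper name `entropy` is an illustrative choice, not something from the exam.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Probabilities of ending in room 1325, the CSL, room 3331, and the Men's room.
location = [1/2, 1/4, 1/8, 1/8]
h = entropy(location)
print(h)  # 1.75
```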
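2. [8] Decision Trees There are 100 parrots. They have either a red beak or a black beak. They can either talk or not. Complete the two cells in the following table so that the mutual information (i.e., information gain) between “Beak” and “Talk” is 0. Show your work that justifies your answer.
Number of parrots    Beak     Talk
10                   Red      Yes
30 (or 15)           Red      No
15 (or 30)           Black    Yes
45                   Black    No
Zero mutual information requires Beak and Talk to be independent. Writing x for Red/No and y for Black/Yes, the counts must satisfy x + y = 100 - 10 - 45 = 45 and 10 * 45 = x * y, so x * y = 450. The two solutions are (x, y) = (30, 15) and (x, y) = (15, 30).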
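As a sanity check on the parrot table, the mutual information can be computed directly from the four counts. This is a sketch under the assumption that the table is read as joint counts of (Beak, Talk); the function names `mutual_information` and `table` are illustrative.

```python
from math import log2

def mutual_information(counts):
    """Mutual information in bits of two variables, given {(x, y): count}."""
    total = sum(counts.values())
    px, py = {}, {}
    for (x, y), n in counts.items():
        px[x] = px.get(x, 0.0) + n / total
        py[y] = py.get(y, 0.0) + n / total
    mi = 0.0
    for (x, y), n in counts.items():
        if n > 0:
            pxy = n / total
            mi += pxy * log2(pxy / (px[x] * py[y]))
    return mi

def table(red_no, black_yes):
    """Parrot counts with the two blank cells filled in."""
    return {("Red", "Yes"): 10, ("Red", "No"): red_no,
            ("Black", "Yes"): black_yes, ("Black", "No"): 45}

mi_a = mutual_information(table(30, 15))    # first accepted answer
mi_b = mutual_information(table(15, 30))    # second accepted answer
mi_bad = mutual_information(table(20, 25))  # sums to 100 but is not independent
print(round(mi_a, 9), round(mi_b, 9), mi_bad > 0)
```

Both accepted fillings give a mutual information of zero up to floating-point error, while another filling that also totals 100 parrots does not.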
5. [15] Bayesian Networks Consider the Bayesian network A → B ← C with three Boolean random variables and their CPTs defined by: P(A) = 0. P(B | A, C) = 0. P(B | A, ¬C) = 0. P(B | ¬A, C) = 0. P(B | ¬A, ¬C) = 0. P(C) = 0.
a. Compute P(¬A, B, C)
b. Compute P(¬A | ¬C)
P(¬A | ¬C) = P(¬A) = 0.8 since A and C are independent
c. Compute P(A | B, ¬C)
So, P(A | B, ¬C) = .045 / .333 ≈ 0.135
6. [6] Naïve Bayes Which one or more of the following are true statements about the conditional independence properties that are guaranteed true in a Bayesian network that is used to represent a Naïve Bayes classifier in which there are three evidence variables, W, X, and Y, and one classification variable, C?
a. P(C | W, X, Y) = P(C | W) * P(C | X) * P(C | Y)
b. P(C | W, X, Y) = (P(W, X, Y | C) * P(C)) / P(W, X, Y)
c. P(W, X, Y | C) = P(W | C) * P(X | C) * P(Y | C)
d. P(W, X, Y) = P(W) * P(X) * P(Y)
e. P(C | W, X, Y) = P(C | W) * P(W | X) * P(X | Y)
f. P(W, X, Y | C) = P(W | C) * P(X | W) * P(Y | X)
(c) is the only one that relates to conditional independence in a Bayesian network for Naïve Bayes.
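To make the distinction between options (c) and (d) concrete, the sketch below builds a small Naïve Bayes joint distribution and tests both factorizations by brute-force marginalization. All CPT numbers are hypothetical values chosen for illustration; none come from the exam.

```python
from itertools import product

# Hypothetical CPTs (illustration only): P(C) and P(feature = True | C).
P_C = {True: 0.3, False: 0.7}
P_TRUE_GIVEN_C = {
    "w": {True: 0.9, False: 0.2},
    "x": {True: 0.6, False: 0.1},
    "y": {True: 0.8, False: 0.5},
}

def bern(p_true, value):
    """Probability of a Boolean value given P(value = True)."""
    return p_true if value else 1.0 - p_true

def joint(c, w, x, y):
    """Naive Bayes joint: P(C) * P(W | C) * P(X | C) * P(Y | C)."""
    return (P_C[c] * bern(P_TRUE_GIVEN_C["w"][c], w)
            * bern(P_TRUE_GIVEN_C["x"][c], x)
            * bern(P_TRUE_GIVEN_C["y"][c], y))

def prob(**fixed):
    """Marginal probability of an assignment to any subset of c, w, x, y."""
    total = 0.0
    for c, w, x, y in product([True, False], repeat=4):
        values = {"c": c, "w": w, "x": x, "y": y}
        if all(values[k] == v for k, v in fixed.items()):
            total += joint(c, w, x, y)
    return total

pc = prob(c=True)
# Option (c): P(W, X, Y | C) versus P(W | C) P(X | C) P(Y | C).
lhs_c = prob(c=True, w=True, x=True, y=True) / pc
rhs_c = (prob(c=True, w=True) / pc) * (prob(c=True, x=True) / pc) * (prob(c=True, y=True) / pc)
# Option (d): P(W, X, Y) versus P(W) P(X) P(Y), which is not guaranteed.
lhs_d = prob(w=True, x=True, y=True)
rhs_d = prob(w=True) * prob(x=True) * prob(y=True)
print(abs(lhs_c - rhs_c) < 1e-9, abs(lhs_d - rhs_d) < 1e-9)  # True False
```

With these numbers the evidence variables are independent given C but clearly dependent marginally, which is why (d) is not a guaranteed property.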
7. [9] Speech Recognition Traditional speech recognition can be posed as a probabilistic inference problem: given acoustic signal A, the task is to find a sentence (i.e., sequence of words) W such that
W* = argmax_W P(W | A) = argmax_W P(A | W) P(W)    (1)
where P ( A | W ) is the acoustic model and P ( W ) is the language model. In light of the McGurk effect, video signal V of the speaker’s face is also helpful in speech recognition. In one line, write down how you would modify equation (1) to incorporate both the acoustic and video signals for speech recognition. In another line, briefly explain the components in English.
W* = argmax_W P(W | A, V) = argmax_W P(A, V | W) P(W) = argmax_W P(A | W) P(V | W) P(W), assuming A and V are conditionally independent given W. This means we have the same acoustic model and language model as before, but now we add P(V | W) as a “video model”.
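The factored decision rule can be sketched as a sum of log-probabilities over candidate sentences. The candidates and every score below are made-up values for illustration only; a real recognizer would obtain them from trained acoustic, video, and language models.

```python
from math import log

# Hypothetical candidates with made-up scores (illustration only):
# p_a = P(A | W), p_v = P(V | W), p_w = P(W).
candidates = {
    "ba": {"p_a": 0.6, "p_v": 0.1, "p_w": 0.5},
    "ga": {"p_a": 0.3, "p_v": 0.7, "p_w": 0.5},
}

def audio_only_score(m):
    """log P(A | W) + log P(W), as in equation (1)."""
    return log(m["p_a"]) + log(m["p_w"])

def audio_video_score(m):
    """log P(A | W) + log P(V | W) + log P(W), assuming A and V are independent given W."""
    return log(m["p_a"]) + log(m["p_v"]) + log(m["p_w"])

best_audio = max(candidates, key=lambda w: audio_only_score(candidates[w]))
best_fused = max(candidates, key=lambda w: audio_video_score(candidates[w]))
print(best_audio, best_fused)  # ba ga
```

In this toy setup the video term overturns the audio-only decision, which is the kind of interaction the McGurk effect suggests.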
8. [8] Neural Networks Fill in the two missing weights below so that the following 2-layer neural network computes A XOR B. Both A and B take values 0 or 1, and the units are Linear Threshold Units (LTUs).
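[Figure: 2-layer LTU network for A XOR B. Weight labels surviving from the figure: w = -10, w = 1, w = 1, w = 1, and three labels marked with a 1: w = -0., w = -5, w = -0. (values truncated in the source; their positions in the network are not recoverable).]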
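One way to check a candidate set of weights is to simulate the network on all four inputs. The sketch below uses a textbook OR/AND construction, not the weights from the exam's figure, and assumes an LTU outputs 1 when its weighted sum plus bias is strictly positive.

```python
def ltu(weights, bias, inputs):
    """Linear threshold unit: 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

def xor_net(a, b):
    """Two hidden LTUs (OR and AND) feeding one output LTU."""
    h_or = ltu([1, 1], -0.5, [a, b])    # fires if at least one input is 1
    h_and = ltu([1, 1], -1.5, [a, b])   # fires only if both inputs are 1
    return ltu([1, -2], -0.5, [h_or, h_and])  # OR and not AND

truth_table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
print(truth_table)  # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
```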
10. [15] Hidden Markov Models You sometimes get colds (C), which make you sneeze (S). You also get allergies (A), which make you sneeze. Sometimes you are well (W), which doesn’t make you sneeze (Q). You decide to model this as an HMM with hidden states C, A, W, and observable states S, Q as follows:
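[Figure: HMM diagram showing the Start state, the transition probabilities among hidden states C, A, and W, and the emission probabilities for observations S and Q.]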
a. What is the probability of the sequence W, C, C, W on days 1 to 4?
P(q1=W, q2=C, q3=C, q4=W)
= P(q4=W | q3=C) P(q3=C | q2=C) P(q2=C | q1=W) P(q1=W | Start)
= (.2)(.6)(.1)(1) = 0.012
b. What is the probability that on day 1 you observe Q and on day 2 you observe S?
P(o1=Q, o2=S) = P(o1=Q, o2=S | q1=W, q2=W) P(q1=W, q2=W)
              + P(o1=Q, o2=S | q1=W, q2=A) P(q1=W, q2=A)
              + P(o1=Q, o2=S | q1=W, q2=C) P(q1=W, q2=C)
= P(o1=Q | q1=W) P(o2=S | q2=W) P(q2=W | q1=W) P(q1=W) + ...
= (.9)(.1)(.7)(1) + (.9)(.8)(.2)(1) + (.9)(.7)(.1)(1)
= 0.27
c. What is the probability that on day 2 you are Well?
P(q2=W) = P(q2=W | q1=W) P(q1=W) = (.7)(1) = 0.7
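The three HMM answers can be reproduced with a few lines of code. This sketch uses only the probabilities quoted in the worked solutions above; entries of the model that the solutions never use are left out because the figure did not survive, and the dictionary and function names are illustrative.

```python
# Probabilities as quoted in the worked solutions; unused entries are omitted.
initial = {"W": 1.0}                      # Start always goes to Well
transition = {("W", "W"): 0.7, ("W", "A"): 0.2, ("W", "C"): 0.1,
              ("C", "C"): 0.6, ("C", "W"): 0.2}
emission = {("W", "Q"): 0.9, ("W", "S"): 0.1,
            ("A", "S"): 0.8, ("C", "S"): 0.7}

def path_probability(states):
    """Probability of a hidden-state sequence starting on day 1."""
    p = initial.get(states[0], 0.0)
    for prev, cur in zip(states, states[1:]):
        p *= transition[(prev, cur)]
    return p

# a. P(q1=W, q2=C, q3=C, q4=W)
part_a = path_probability(["W", "C", "C", "W"])

# b. P(o1=Q, o2=S): day 1 is W with probability 1, so sum over day-2 states.
part_b = sum(path_probability(["W", q2]) * emission[("W", "Q")] * emission[(q2, "S")]
             for q2 in ["W", "A", "C"])

# c. P(q2=W): the only day-1 state with nonzero probability is W.
part_c = path_probability(["W", "W"])

print(round(part_a, 3), round(part_b, 3), round(part_c, 3))  # 0.012 0.27 0.7
```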