








This is the final exam for the CS446 course on pattern recognition and machine learning. It covers boosting, Gaussian naive Bayes, support vector machines, and the expectation-maximization (EM) algorithm. Students solve five problems, each worth 20 points, using the provided data sets and formulas. The exam is closed-book and lasts three hours.









CS446: Pattern Recognition and Machine Learning Fall 2008
December 10, 2007
This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of the exam.
The exam ends at 4:30 pm. It contains 5 problems.
You have 3 hours to earn a total of 100 points. Answer each question in the space provided.
If you need more room, write on the back side of the paper and indicate that you have done so.
Clarity of writing is important — not just having the right answer. For full credit, you must show your work and explain your answers.
Good luck!
Name:
Problem 1 (20 points):
Problem 2 (20 points):
Problem 3 (20 points):
Problem 4 (20 points):
Problem 5 (20 points):
Total (100):
Problem 1 [Boosting - 20 points]
In this problem you will use Boosting to learn a hidden Boolean function from this set of examples.
i    x1   x2   label   D0   D1
1    1    1    +
2    0    1    -
3    0    0    -
4    1    0    -
5    1    1    +
6    0    0    -
7    1    1    +
8    0    1    -
9    0    1    -
10   1    0    -
Table 1: Table for Boosting
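As a minimal sketch of how the D0 and D1 columns could be filled, the Python below starts from the uniform distribution $D_0(i) = 1/10$ and applies one round of the standard AdaBoost update, assuming the $e^{\pm\alpha_t}$ form with $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$. The weak hypothesis `stump_x1` (predict + iff $x_1 = 1$) is a hypothetical choice for illustration only; the exam's own weak learners are not shown in this copy.

```python
import math

# Table 1 (i = 1..10) as (x1, x2, label), with +1 for "+" and -1 for "-".
DATA = [
    (1, 1, +1), (0, 1, -1), (0, 0, -1), (1, 0, -1), (1, 1, +1),
    (0, 0, -1), (1, 1, +1), (0, 1, -1), (0, 1, -1), (1, 0, -1),
]


def stump_x1(x1: int, x2: int) -> int:
    """Hypothetical weak learner: predict +1 iff x1 == 1 (ignores x2)."""
    return +1 if x1 == 1 else -1


def adaboost_round(weights, hypothesis):
    """One AdaBoost round; returns (epsilon, alpha, next distribution)."""
    # Weighted error of the weak hypothesis under the current distribution.
    eps = sum(w for w, (x1, x2, y) in zip(weights, DATA) if hypothesis(x1, x2) != y)
    alpha = 0.5 * math.log((1 - eps) / eps)
    # Correct examples are scaled by e^{-alpha}, mistakes by e^{+alpha}.
    unnorm = [
        w * math.exp(-alpha if hypothesis(x1, x2) == y else alpha)
        for w, (x1, x2, y) in zip(weights, DATA)
    ]
    z = sum(unnorm)  # normalization factor Z_t
    return eps, alpha, [u / z for u in unnorm]


d0 = [1 / len(DATA)] * len(DATA)  # D0: uniform over the 10 examples
eps, alpha, d1 = adaboost_round(d0, stump_x1)
print(f"eps = {eps:.2f}, alpha = {alpha:.4f}")
for i, (w0, w1) in enumerate(zip(d0, d1), start=1):
    print(f"i={i:2d}  D0={w0:.4f}  D1={w1:.4f}")
```

With this hypothetical stump, $\epsilon_0 = 0.2$ (examples 4 and 10 are misclassified), so $\alpha_0 = \frac{1}{2}\ln 4 = \ln 2$; the two mistakes rise to weight 0.25 each and the eight correct examples fall to 0.0625, putting half of the total weight on the mistakes.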
Problem 2 [Gaussian Naive Bayes - 20 points]
X1   X2   Y
0    5    A
2    9    A
1    3    B
2    4    B
3    5    B
4    6    B
5    7    B
Table 2: Dataset
We assume that the data is generated by a Gaussian naive Bayes model, and we will use the data to develop a naive Bayes predictor. First, compute the following probabilities, means and standard deviations. Write your answers in the table below:
$P(Y = A) =$            $P(Y = B) =$
$\mu_{1A} =$            $\mu_{1B} =$
$\sigma^2_{1A} =$       $\sigma^2_{1B} =$
$\mu_{2A} =$            $\mu_{2B} =$
$\sigma^2_{2A} =$       $\sigma^2_{2B} =$
Table 3: Parameters
Answer (formulas): Gaussian naive Bayes means that, for each possible value of $Y$, there is a separate Gaussian distribution for each $X_i$. Therefore:
$\mu_{iy} = E[X_i \mid Y = y]$, $\quad \sigma_{iy}^2 = E[(X_i - \mu_{iy})^2 \mid Y = y]$
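A short sketch that applies these formulas to Table 2. It assumes the $1/n$ (maximum-likelihood) variance, which is the form the $\sigma^2$ formula on the exam's formula sheet appears to use; a $1/(n-1)$ estimator would give different variances. `fit_gaussian_nb` is a hypothetical helper name, and exact fractions keep the arithmetic readable.

```python
from fractions import Fraction

# Table 2 rows: (X1, X2, Y).
DATA = [(0, 5, "A"), (2, 9, "A"), (1, 3, "B"), (2, 4, "B"),
        (3, 5, "B"), (4, 6, "B"), (5, 7, "B")]


def fit_gaussian_nb(rows):
    """Return {y: (P(Y=y), [(mu_1y, var_1y), (mu_2y, var_2y)])}."""
    params = {}
    for y in sorted({r[2] for r in rows}):
        subset = [r for r in rows if r[2] == y]
        prior = Fraction(len(subset), len(rows))
        per_feature = []
        for i in (0, 1):  # 0 -> X1, 1 -> X2
            xs = [Fraction(r[i]) for r in subset]
            mu = sum(xs) / len(xs)                          # mu_iy = E[X_i | Y=y]
            var = sum((x - mu) ** 2 for x in xs) / len(xs)  # 1/n variance (assumed)
            per_feature.append((mu, var))
        params[y] = (prior, per_feature)
    return params


for y, (prior, per_feature) in fit_gaussian_nb(DATA).items():
    print(f"P(Y={y}) = {prior}")
    for i, (mu, var) in enumerate(per_feature, start=1):
        print(f"  mu_{i}{y} = {mu}, sigma^2_{i}{y} = {var}")
```

Under that assumption the estimates are $P(Y = A) = 2/7$, $P(Y = B) = 5/7$, $\mu_{1A} = 1$, $\sigma^2_{1A} = 1$, $\mu_{2A} = 7$, $\sigma^2_{2A} = 4$, $\mu_{1B} = 3$, $\sigma^2_{1B} = 2$, $\mu_{2B} = 5$, $\sigma^2_{2B} = 2$.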
Answer: Naive Bayes: $P(X_1 = 1, X_2 = 3 \mid Y = A) = P(X_1 = 1 \mid Y = A) \cdot P(X_2 = 3 \mid Y = A)$, and
Gaussian: $P(X_i = x \mid Y = A) = \frac{1}{\sigma_{iA}\sqrt{2\pi}} \exp\left(-\frac{(x - \mu_{iA})^2}{2\sigma_{iA}^2}\right)$
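Plugging the Table 2 parameter values (with $1/n$ variances assumed) into the naive Bayes product gives the class-conditional values at $(X_1, X_2) = (1, 3)$. Strictly, these are density values rather than probabilities, since the features are continuous. The helper names are illustrative.

```python
import math

# (mean, variance) per feature and class, from Table 2 with 1/n variances (assumed).
PARAMS = {
    "A": [(1.0, 1.0), (7.0, 4.0)],  # (mu_1A, sigma^2_1A), (mu_2A, sigma^2_2A)
    "B": [(3.0, 2.0), (5.0, 2.0)],  # (mu_1B, sigma^2_1B), (mu_2B, sigma^2_2B)
}


def gaussian_density(x, mu, var):
    """Normal density 1/(sigma sqrt(2 pi)) * exp(-(x - mu)^2 / (2 sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def class_likelihood(x, y):
    """Naive Bayes: product of the per-feature densities given Y = y."""
    return math.prod(gaussian_density(xi, mu, var) for xi, (mu, var) in zip(x, PARAMS[y]))


for y in PARAMS:
    print(f"p(X1=1, X2=3 | Y={y}) = {class_likelihood((1, 3), y):.6f}")
```

Both classes give the same value, $e^{-2}/(4\pi) \approx 0.0108$: for $A$ the two factors are $1/\sqrt{2\pi}$ and $e^{-2}/(2\sqrt{2\pi})$, and for $B$ each factor is $e^{-1}/(2\sqrt{\pi})$.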
Answer: Marginalize over $Y$ to get $P(X_1 = 1, X_2 = 3) = P(Y = A)\,P(X_1 = 1, X_2 = 3 \mid Y = A) + P(Y = B)\,P(X_1 = 1, X_2 = 3 \mid Y = B)$, and evaluate each term as above.
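Continuing with the same assumed parameters (Table 2, $1/n$ variances), this sketch marginalizes over $Y$ and then normalizes to get the posterior. Names are illustrative.

```python
import math

PRIORS = {"A": 2 / 7, "B": 5 / 7}  # P(Y = y) from Table 2
# (mean, variance) per feature, assuming the 1/n variance estimates.
PARAMS = {"A": [(1.0, 1.0), (7.0, 4.0)], "B": [(3.0, 2.0), (5.0, 2.0)]}


def gaussian_density(x, mu, var):
    """Normal density with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def class_likelihood(x, y):
    """Naive Bayes class-conditional density p(x | Y = y)."""
    return math.prod(gaussian_density(xi, m, v) for xi, (m, v) in zip(x, PARAMS[y]))


x = (1, 3)
joint = {y: PRIORS[y] * class_likelihood(x, y) for y in PRIORS}  # P(Y=y) p(x | Y=y)
marginal = sum(joint.values())                                   # marginalize over Y
posterior = {y: joint[y] / marginal for y in joint}              # Bayes' rule
print(f"p(X1=1, X2=3) = {marginal:.6f}")
print({y: round(p, 4) for y, p in posterior.items()})
```

Because the two class-conditional densities coincide at this point, the marginal equals $e^{-2}/(4\pi)$ and the posterior $P(Y = A \mid X_1 = 1, X_2 = 3)$ equals the prior, $2/7$.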
Problem 3 [Support Vector Machine - 20 points]
Consider the following dataset. In this problem we will use this data to learn a linear SVM of the form $f(x) = \mathrm{sign}(w_1 x_1 + w_2 x_2 + w_3)$, with $\|w\| = 1$.
[Figure: plot of the dataset below, axes spanning −4 to 4]
x1   x2   label
3    -1   +
2    0    +
1    1    +
0    2    +
0    0    -
0    -4   -
-4   0    -
Answer: $w = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}}\right)$
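A quick numerical check of this answer; it verifies the solution rather than solving the SVM quadratic program. All four positive points lie on the line $x_1 + x_2 = 2$ and the closest negative, $(0, 0)$, lies on $x_1 + x_2 = 0$, so $w$ corresponds to the line $x_1 + x_2 = 1$ halfway between them.

```python
import math

# Problem 3 dataset: ((x1, x2), label).
DATA = [((3, -1), +1), ((2, 0), +1), ((1, 1), +1), ((0, 2), +1),
        ((0, 0), -1), ((0, -4), -1), ((-4, 0), -1)]

s = 1 / math.sqrt(3)
W = (s, s, -s)  # (w1, w2, w3); w1^2 + w2^2 + w3^2 = 1


def score(x):
    """Raw score w1*x1 + w2*x2 + w3; the classifier is sign(score)."""
    return W[0] * x[0] + W[1] * x[1] + W[2]


def distance(x):
    """Geometric distance from x to the decision line score(x) = 0."""
    return abs(score(x)) / math.hypot(W[0], W[1])


for x, y in DATA:
    pred = 1 if score(x) > 0 else -1
    print(x, y, "correct" if pred == y else "WRONG", round(distance(x), 4))

margin = min(distance(x) for x, _ in DATA)
print("margin =", round(margin, 4))  # 1/sqrt(2), printed as 0.7071
```

Every point is on the correct side; the four positives and $(0, 0)$ all sit at distance $1/\sqrt{2}$ from the line (they are the points on the margin), while $(0, -4)$ and $(-4, 0)$ are at $5/\sqrt{2}$.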
Answer: 0%, all correctly classified.
Answer: 6/7 correct. The only support vector whose removal changes the location of the hyperplane is $(0, 0)$.
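The 6/7 figure reads like a leave-one-out estimate; assuming that is what was asked, the sketch below retrains on six points and tests on the seventh. Instead of solving the SVM dual, the hypothetical helper `max_margin_2d` approximates the hard-margin solution by scanning unit normal directions and placing the bias halfway between the closest positive and negative projections. That is adequate for this tiny separable 2-D set but is not a general SVM solver.

```python
import math

DATA = [((3, -1), +1), ((2, 0), +1), ((1, 1), +1), ((0, 2), +1),
        ((0, 0), -1), ((0, -4), -1), ((-4, 0), -1)]


def max_margin_2d(points, n_angles=2880):
    """Approximate a hard-margin linear SVM in 2-D by scanning unit normals u.

    For each direction u, the best bias puts the line halfway between the
    lowest positive projection and the highest negative projection.
    Returns (u, b, margin) for the direction with the largest margin.
    """
    best = None
    for k in range(n_angles):
        theta = 2 * math.pi * k / n_angles
        u = (math.cos(theta), math.sin(theta))
        proj = [(u[0] * x[0] + u[1] * x[1], y) for x, y in points]
        lo_pos = min(p for p, y in proj if y > 0)
        hi_neg = max(p for p, y in proj if y < 0)
        margin = (lo_pos - hi_neg) / 2
        if best is None or margin > best[2]:
            best = (u, -(lo_pos + hi_neg) / 2, margin)
    return best


def predict(model, x):
    u, b, _ = model
    return 1 if u[0] * x[0] + u[1] * x[1] + b > 0 else -1


# Leave-one-out: train without example j, then classify example j.
errors = []
for j, (x, y) in enumerate(DATA):
    model = max_margin_2d(DATA[:j] + DATA[j + 1:])
    if predict(model, x) != y:
        errors.append(x)

print(f"{len(DATA) - len(errors)}/{len(DATA)} correct; misclassified: {errors}")
```

Only the run that holds out $(0, 0)$ moves the boundary: without it, the nearest negatives are $(0, -4)$ and $(-4, 0)$, the line shifts to $x_1 + x_2 = -1$, and $(0, 0)$ lands on the positive side.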
Answer:
$P(x, z) = P(Y = 1)\,P(x, z \mid Y = 1) + P(Y = 0)\,P(x, z \mid Y = 0) = P(Y = 1)\,P(z \mid x, Y = 1)\,P(x) + P(Y = 0)\,P(z \mid x, Y = 0)\,P(x)$
...
Answer:
$$P(Y = i \mid x^{(j)}, z^{(j)}) = \frac{P(x^{(j)}, z^{(j)} \mid Y = i)\,P(Y = i)}{P(x^{(j)}, z^{(j)})}$$
$$= \frac{P(z^{(j)} \mid x^{(j)}, Y = i)\,P(x^{(j)})\,P(Y = i)}{P(z^{(j)} \mid x^{(j)}, Y = 1)\,P(x^{(j)})\,P(Y = 1) + P(z^{(j)} \mid x^{(j)}, Y = 0)\,P(x^{(j)})\,P(Y = 0)}$$
$$= \frac{P(z^{(j)} \mid x^{(j)}, Y = i)\,P(Y = i)}{P(z^{(j)} \mid x^{(j)}, Y = 1)\,P(Y = 1) + P(z^{(j)} \mid x^{(j)}, Y = 0)\,P(Y = 0)}$$
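A minimal sketch of this posterior (the E-step quantity) with hypothetical numbers. `posterior_y1` is an illustrative name; it carries $P(x^{(j)})$ explicitly only to show that the factor cancels.

```python
def posterior_y1(pz_given_x_y1, pz_given_x_y0, prior_y1, px=1.0):
    """P(Y=1 | x, z) by Bayes' rule, with P(x, z | Y=i) = P(z | x, Y=i) * P(x).

    px = P(x) multiplies both terms of the denominator and the numerator,
    so it cancels; it is kept only to mirror the derivation.
    """
    num1 = pz_given_x_y1 * px * prior_y1          # P(z|x,Y=1) P(x) P(Y=1)
    num0 = pz_given_x_y0 * px * (1.0 - prior_y1)  # P(z|x,Y=0) P(x) P(Y=0)
    return num1 / (num1 + num0)


# Hypothetical values: P(z|x,Y=1) = 0.8, P(z|x,Y=0) = 0.2, P(Y=1) = 0.5.
print(posterior_y1(0.8, 0.2, 0.5))
print(posterior_y1(0.8, 0.2, 0.5, px=0.3))  # same value: P(x) cancels
```

The posterior for $Y = 0$ is one minus this value.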
Answer:
$$\sum_{j=1}^{m} \ln P(x^{(j)}, z^{(j)})$$
$$\sum_{j=1}^{m} \left[ y_0^{j} \ln\big(P(Y = 0)\,P(x^{(j)}, z^{(j)} \mid Y = 0)\big) + y_1^{j} \ln\big(P(Y = 1)\,P(x^{(j)}, z^{(j)} \mid Y = 1)\big) \right]$$
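To relate the two sums numerically, the sketch below uses hypothetical values $a_i^{(j)} = P(Y = i)\,P(x^{(j)}, z^{(j)} \mid Y = i)$ for three examples and assumes the $y_i^j$ are the E-step posteriors (soft labels); with 0/1 labels the second sum is the complete-data log-likelihood. When the soft labels are the exact posteriors, the log-likelihood equals the second sum plus the entropy of the soft labels, so the second sum never exceeds the first.

```python
import math

# Hypothetical per-example values (a_0, a_1), a_i = P(Y=i) * P(x^(j), z^(j) | Y=i).
JOINT = [(0.02, 0.06), (0.10, 0.01), (0.03, 0.03)]


def log_likelihood(joint):
    """sum_j ln P(x^(j), z^(j)), where P(x, z) = a_0 + a_1 (marginalize over Y)."""
    return sum(math.log(a0 + a1) for a0, a1 in joint)


def expected_complete_ll(joint, soft):
    """sum_j [y_0^j ln a_0 + y_1^j ln a_1] for soft labels soft[j] = (y_0^j, y_1^j)."""
    return sum(y0 * math.log(a0) + y1 * math.log(a1)
               for (a0, a1), (y0, y1) in zip(joint, soft))


# E-step: soft labels are the posteriors P(Y=i | x^(j), z^(j)) = a_i / (a_0 + a_1).
soft = [(a0 / (a0 + a1), a1 / (a0 + a1)) for a0, a1 in JOINT]

ll = log_likelihood(JOINT)
q = expected_complete_ll(JOINT, soft)
# Entropy of the soft labels; with exact posteriors, ll == q + entropy.
entropy = -sum(p * math.log(p) for pair in soft for p in pair if p > 0)
print(f"log-likelihood = {ll:.4f}, expected complete = {q:.4f}, q + H = {q + entropy:.4f}")
```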
(b) We define a class $C_{r,k,n}$ of $r$-of-$k$ functions as follows. Let $X = \{0, 1\}^n$. For a chosen set of $k$ relevant variables and a given number $r$, an $r$-of-$k$ function $f(x_1, \ldots, x_n)$ is 1 if and only if at least $r$ of the $k$ relevant variables are 1. We assume that $1 \le r \le k \le n$.
(1) Phrase this problem as a problem of learning a Boolean disjunction over some feature space. Define the feature space and the learning problem. (2) Assume you are learning this function using Winnow. What mistake bound do you obtain?
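Answer: (1) Introduce one new feature for each conjunction of $r$ of the $n$ variables, giving $n' = \binom{n}{r}$ features. The target is then the disjunction of the $\binom{k}{r}$ features representing $r$-conjunctions of the $k$ relevant variables, since $f(x) = 1$ exactly when all variables in at least one such $r$-subset are 1. (2) Winnow learns a disjunction of $k'$ out of $n'$ variables with $O(k' \log n')$ mistakes, where $k' = \binom{k}{r}$, $n' = \binom{n}{r}$.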
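A small sketch of the construction on a hypothetical instance with $n = 5$, $k = 3$ relevant variables (indices 0, 2, 3) and $r = 2$. It maps each input to the $\binom{n}{r}$ conjunction features and runs one common Winnow variant (threshold $\theta = n'$, active weights doubled on false negatives and halved on false positives), cycling through all $2^n$ inputs until a pass has no mistakes. The instance and the function names are illustrative assumptions.

```python
import math
from itertools import combinations, product

# Hypothetical instance: n = 5 variables, relevant indices {0, 2, 3} (k = 3), r = 2.
N, R = 5, 2
RELEVANT = (0, 2, 3)
K = len(RELEVANT)

# One new feature per r-subset of the n variables: n' = C(n, r) features.
SUBSETS = list(combinations(range(N), R))


def r_of_k(x):
    """Target r-of-k function: 1 iff at least R relevant variables are 1."""
    return int(sum(x[i] for i in RELEVANT) >= R)


def features(x):
    """The feature for subset S is the conjunction of x_i over i in S."""
    return [int(all(x[i] for i in s)) for s in SUBSETS]


def winnow(examples, theta):
    """Winnow: double active weights on false negatives, halve them on false
    positives; cycle through the examples until a full pass has no mistakes."""
    w = [1.0] * len(examples[0][0])
    mistakes = 0
    while True:
        clean = True
        for phi, y in examples:
            pred = int(sum(wi * fi for wi, fi in zip(w, phi)) >= theta)
            if pred != y:
                mistakes += 1
                clean = False
                factor = 2.0 if y == 1 else 0.5
                w = [wi * factor if fi else wi for wi, fi in zip(w, phi)]
        if clean:
            return w, mistakes


inputs = list(product((0, 1), repeat=N))
examples = [(features(x), r_of_k(x)) for x in inputs]
n_prime, k_prime = math.comb(N, R), math.comb(K, R)
w, mistakes = winnow(examples, theta=n_prime)
print(f"n' = {n_prime}, k' = {k_prime}, Winnow mistakes = {mistakes}")
```

Here $n' = 10$ and $k' = 3$. The exact mistake count depends on the example order, but for this variant a standard analysis keeps it within $3k'\lceil \log_2 n' \rceil + 2$.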
Some formulas you may need:
$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$
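The information-gain formula can be illustrated on Table 1 by splitting on $x_1$ or $x_2$. This sketch assumes base-2 logarithms for the entropy (the base is not shown on the formula sheet) and is an illustration, not one of the exam's questions.

```python
import math
from collections import Counter

# Table 1 rows as (x1, x2, label).
DATA = [(1, 1, "+"), (0, 1, "-"), (0, 0, "-"), (1, 0, "-"), (1, 1, "+"),
        (0, 0, "-"), (1, 1, "+"), (0, 1, "-"), (0, 1, "-"), (1, 0, "-")]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def info_gain(rows, attr):
    """Entropy(S) minus the size-weighted entropy of each subset S_v."""
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[-1] for r in rows if r[attr] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


for attr, name in ((0, "x1"), (1, "x2")):
    print(f"Gain(S, {name}) = {info_gain(DATA, attr):.4f}")
```

Splitting on $x_1$ yields a pure $x_1 = 0$ branch (all five examples negative), giving a gain of about 0.396 bits, versus about 0.281 bits for $x_2$.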
$P(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$
AdaBoost:
$D_{t+1}(i) = \frac{D_t(i)}{Z_t} \times \begin{cases} e^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ e^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases}$, where $Z_t$ is a normalization factor.