CS446 Final Exam: Pattern Recognition and Machine Learning - Prof. Dan Roth, Exams of Computer Science

This is the final exam for the CS446 course on pattern recognition and machine learning. It covers topics such as boosting, Gaussian naive Bayes, support vector machines, and the expectation-maximization algorithm. Students are required to solve five problems, each worth 20 points, using the provided data sets and formulas. The exam is closed-book and lasts three hours.

Uploaded on 03/16/2009

CS446: Pattern Recognition and Machine Learning Fall 2008
Final Exam
December 10, 2007
This is a closed book exam. Everything you need in order to solve the problems is supplied
in the body of the exam.
The exam ends at 4:30 pm. It contains 5 problems.
You have 3 hours to earn a total of 100 points. Answer each question in the space provided.
If you need more room, write on the back side of the paper and indicate that you have done
so.
Clarity of writing is important, not just having the right answer. For full credit, you must
show your work and explain your answers.
Good luck!
Name:
Problem 1 (20 points):
Problem 2 (20 points):
Problem 3 (20 points):
Problem 4 (20 points):
Problem 5 (20 points):
Total (100):

Problem 1 [Boosting - 20 points]

In this problem you will use Boosting to learn a hidden Boolean function from this set of examples.

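| $i$ | $x_1$ | $x_2$ | label | $D_0$ | $D_1$ |
|-----|-------|-------|-------|-------|-------|
| 1   | 1     | 1     | +     |       |       |
| 2   | 0     | 1     | -     |       |       |
| 3   | 0     | 0     | -     |       |       |
| 4   | 1     | 0     | -     |       |       |
| 5   | 1     | 1     | +     |       |       |
| 6   | 0     | 0     | -     |       |       |
| 7   | 1     | 1     | +     |       |       |
| 8   | 0     | 1     | -     |       |       |
| 9   | 0     | 1     | -     |       |       |
| 10  | 1     | 0     | -     |       |       |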

Table 1: Table for Boosting

  (a) Use two rounds of AdaBoost over $x_1$ and $x_2$ as the weak learners (rules of thumb) to learn a hypothesis for this Boolean data set. In each round, choose the weak learner that minimizes the error $\epsilon$. Start the first round with a uniform distribution $D_0$. When running the algorithm, you need to compute the error $\epsilon$ for the first and second weak learners, their weights $\alpha$, and the new distribution $D_1$ using the AdaBoost algorithm. Place the values for $D_0$ and $D_1$ in the appropriate columns of Table 1. [Please consult the formula sheet for the definitions of $\alpha$ and $D$.]
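The two AdaBoost rounds can be sketched in Python as follows. This is a minimal sketch, not an official solution: it assumes each weak learner predicts + exactly when its feature equals 1 (the exam does not spell this out), and it uses the base-2 update rules from the formula sheet. The names `DATA`, `predict`, `weighted_error`, and `adaboost_round` are illustrative.

```python
import math

# Table 1 as (x1, x2, label) with label +1 / -1.
DATA = [
    (1, 1, +1), (0, 1, -1), (0, 0, -1), (1, 0, -1), (1, 1, +1),
    (0, 0, -1), (1, 1, +1), (0, 1, -1), (0, 1, -1), (1, 0, -1),
]

def predict(feature, example):
    """Assumed weak learner: predict + when the chosen feature is 1, else -."""
    return 1 if example[feature] == 1 else -1

def weighted_error(feature, dist):
    """Total weight of the examples that the weak learner misclassifies."""
    return sum(d for d, ex in zip(dist, DATA) if predict(feature, ex) != ex[2])

def adaboost_round(dist):
    """Pick the lower-error feature, then apply the formula-sheet updates."""
    feature = min((0, 1), key=lambda f: weighted_error(f, dist))
    eps = weighted_error(feature, dist)
    alpha = 0.5 * math.log2((1 - eps) / eps)
    # Down-weight correct examples by 2^-alpha, up-weight mistakes by 2^alpha.
    unnormalized = [
        d * 2 ** (-alpha if predict(feature, ex) == ex[2] else alpha)
        for d, ex in zip(dist, DATA)
    ]
    z = sum(unnormalized)  # normalization factor Z_t
    return feature, eps, alpha, [u / z for u in unnormalized]

d0 = [1 / len(DATA)] * len(DATA)          # uniform starting distribution
h1, eps1, alpha1, d1 = adaboost_round(d0)
h2, eps2, alpha2, d2 = adaboost_round(d1)

print(f"round 1: x{h1 + 1}, eps={eps1:.4f}, alpha={alpha1:.4f}")
print("D1:", [round(d, 4) for d in d1])
print(f"round 2: x{h2 + 1}, eps={eps2:.4f}, alpha={alpha2:.4f}")
```

Under these assumptions the first round selects $x_1$ with $\epsilon = 0.2$ and $\alpha = 1$, so $D_1$ is 0.25 on the two misclassified examples (4 and 10) and 0.0625 on the rest. The second round then selects $x_2$ with $\epsilon = 0.1875$.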

Problem 2 [Gaussian Naive Bayes - 20 points]

  (a) Consider the following training set with two real-valued inputs $X_1$ and $X_2$ and a class label $Y$ that takes two values, A and B.

| $X_1$ | $X_2$ | $Y$ |
|-------|-------|-----|
| 0     | 5     | A   |
| 2     | 9     | A   |
| 1     | 3     | B   |
| 2     | 4     | B   |
| 3     | 5     | B   |
| 4     | 6     | B   |
| 5     | 7     | B   |

Table 2: Dataset

We assume that the data is generated by a Gaussian naive Bayes model, and we will use the data to develop a naive Bayes predictor. First, compute the following probabilities, means and standard deviations. Write your answers in the table below:

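| Class A              | Class B              |
|----------------------|----------------------|
| $P(Y = A) =$         | $P(Y = B) =$         |
| $\mu_{1A} =$         | $\mu_{1B} =$         |
| $\sigma^2_{1A} =$    | $\sigma^2_{1B} =$    |
| $\mu_{2A} =$         | $\mu_{2B} =$         |
| $\sigma^2_{2A} =$    | $\sigma^2_{2B} =$    |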

Table 3: Parameters

Answer (formulas): Gaussian naive Bayes means that for each possible value of $Y$, there is a separate Gaussian distribution for each $X_i$. Therefore:

$\mu_{iy} = E[x_i \mid Y = y]$, $\quad \sigma_{iy}^2 = E[(x_i - \mu_{iy})^2 \mid Y = y]$
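As a rough illustration, the parameters for Table 3 can be estimated from the Table 2 data as sketched below. The sketch assumes the $1/n$ variance from the formula sheet (not the $1/(n-1)$ sample variance); `fit_gnb` and the dictionary layout are illustrative names, not part of the exam.

```python
# Table 2 as (X1, X2, Y).
DATA = [
    (0, 5, "A"), (2, 9, "A"),
    (1, 3, "B"), (2, 4, "B"), (3, 5, "B"), (4, 6, "B"), (5, 7, "B"),
]

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Variance with a 1/n factor, as on the formula sheet (an assumption here)."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

def fit_gnb(rows):
    """Estimate the class prior and per-feature mean/variance for each class."""
    params = {}
    for label in sorted({y for _, _, y in rows}):
        members = [(x1, x2) for x1, x2, y in rows if y == label]
        columns = list(zip(*members))  # one tuple of values per feature
        params[label] = {
            "prior": len(members) / len(rows),
            "mean": [mean(col) for col in columns],
            "var": [variance(col) for col in columns],
        }
    return params

params = fit_gnb(DATA)
for label, p in params.items():
    print(label, p)
```

With these assumptions the estimates are $P(Y=A) = 2/7$, $\mu_{1A} = 1$, $\sigma^2_{1A} = 1$, $\mu_{2A} = 7$, $\sigma^2_{2A} = 4$ for class A, and $P(Y=B) = 5/7$, $\mu_{1B} = 3$, $\sigma^2_{1B} = 2$, $\mu_{2B} = 5$, $\sigma^2_{2B} = 2$ for class B.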

  (b) Recall that the model is a Gaussian naive Bayes. Compute: $P(X_1 = 1, X_2 = 3 \mid Y = A) =$

Answer: Naive Bayes: $P(X_1 = 1, X_2 = 3 \mid Y = A) = P(X_1 = 1 \mid Y = A) \cdot P(X_2 = 3 \mid Y = A)$, and

Gaussian: $P(X_i = x \mid Y = A) = \frac{1}{\sigma_{iA}\sqrt{2\pi}}\, e^{-\frac{(x - \mu_{iA})^2}{2\sigma_{iA}^2}}$

  (c) Recall that the model is a Gaussian naive Bayes. Compute: $P(X_1 = 1, X_2 = 3) =$

Answer: Marginalize over $Y$ to get: $P(X_1 = 1, X_2 = 3) = P(Y = A) \cdot P(X_1 = 1, X_2 = 3 \mid Y = A) + P(Y = B) \cdot P(X_1 = 1, X_2 = 3 \mid Y = B)$, and solve as above.
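The class-conditional and marginal values can be checked numerically with a short sketch. The numbers in `PARAMS` are the estimates obtained from Table 2 with the formula sheet's $1/n$ variance; the function names are illustrative. Strictly speaking the Gaussian terms are density values, which the exam writes as $P$.

```python
import math

# Parameters estimated from Table 2 (1/n variance, per the formula sheet).
PARAMS = {
    "A": {"prior": 2 / 7, "mean": (1.0, 7.0), "var": (1.0, 4.0)},
    "B": {"prior": 5 / 7, "mean": (3.0, 5.0), "var": (2.0, 2.0)},
}

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def class_conditional(x, label):
    """Naive Bayes: product of per-feature Gaussian densities given the class."""
    p = PARAMS[label]
    return math.prod(
        gaussian_pdf(xi, m, v) for xi, m, v in zip(x, p["mean"], p["var"])
    )

def marginal(x):
    """Sum over classes of prior times class-conditional density."""
    return sum(p["prior"] * class_conditional(x, label) for label, p in PARAMS.items())

point = (1, 3)
print("P(x | A) =", class_conditional(point, "A"))
print("P(x | B) =", class_conditional(point, "B"))
print("P(x)     =", marginal(point))
```

For this particular point both class-conditional values work out to $e^{-2}/(4\pi) \approx 0.0108$, so the marginal has the same value.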

Problem 3 [Support Vector Machine - 20 points]

Consider the following dataset. In this problem we will use this data to learn a linear SVM of the form $f(x) = \mathrm{sign}(w_1 x_1 + w_2 x_2 + w_3)$, with $\|w\| = 1$.


| $x_1$ | $x_2$ | label |
|-------|-------|-------|
| 3     | -1    | +     |
| 2     | 0     | +     |
| 1     | 1     | +     |
| 0     | 2     | +     |
| 0     | 0     | -     |
| 0     | -4    | -     |
| -4    | 0     | -     |

  (a) What is the value of $w$ that the SVM algorithm will output on the given data? (Make sure that you return $w$ such that $\|w\| = 1$.)

Answer: $w = \left( \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}} \right)$

  (b) What is the training set error of the above example (expressed as the percentage of training points misclassified)?

Answer: 0%, all correctly classified.
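The stated $w$ can be sanity-checked against the data with a few lines of Python. This sketch only verifies that the given hyperplane has unit norm, separates the training points, and is equidistant from its nearest points; it does not prove that it is the maximum-margin solution. All names are illustrative.

```python
import math

# Problem 3 data set as ((x1, x2), label).
POINTS = [
    ((3, -1), +1), ((2, 0), +1), ((1, 1), +1), ((0, 2), +1),
    ((0, 0), -1), ((0, -4), -1), ((-4, 0), -1),
]
W = (1 / math.sqrt(3), 1 / math.sqrt(3), -1 / math.sqrt(3))

def score(x):
    return W[0] * x[0] + W[1] * x[1] + W[2]

def predict(x):
    return 1 if score(x) > 0 else -1

def distance(x):
    """Geometric distance from x to the line w1*x1 + w2*x2 + w3 = 0."""
    return abs(score(x)) / math.hypot(W[0], W[1])

norm = math.sqrt(sum(c * c for c in W))
errors = sum(predict(x) != y for x, y in POINTS)
min_dist = min(distance(x) for x, _ in POINTS)
closest = [x for x, _ in POINTS if math.isclose(distance(x), min_dist)]

print("||w|| =", norm)
print("training error =", 100 * errors / len(POINTS), "%")
print("closest points:", closest, "at distance", min_dist)
```

The four positive points and $(0, 0)$ all lie at distance $1/\sqrt{2}$ from the line $x_1 + x_2 = 1$, which is consistent with them being the support vectors.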

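  (c) Compute the 7-fold cross-validation accuracy of the SVM on the given data set. Explain your reasoning.

Answer: 6/7 correct. The only point whose removal changes the location of the hyperplane is the support vector (0,0). When it is held out, the separating line moves from $x_1 + x_2 = 1$ to $x_1 + x_2 = -1$, so (0,0) is misclassified; every other held-out point is classified correctly.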

  (a) Express the probability $P(x, z)$ of an observation $(x, z)$ in terms of the unknown parameters.

Answer:

$$\begin{aligned}
P(x, z) &= P(Y = 1)\,P(x, z \mid Y = 1) + P(Y = 0)\,P(x, z \mid Y = 0) \\
        &= P(Y = 1)\,P(z \mid x, Y = 1)\,P(x) + P(Y = 0)\,P(z \mid x, Y = 0)\,P(x)
\end{aligned}$$

...

  (b) Let $y_i^j = P(Y = i \mid x^{(j)}, z^{(j)})$ be the probability that the value of $Y$ on the data point $(x^{(j)}, z^{(j)})$ is $i$. Express $y_0^j$ in terms of the unknown parameters.

Answer:

$$\begin{aligned}
P(Y = i \mid x^{(j)}, z^{(j)})
&= \frac{P(x^{(j)}, z^{(j)} \mid Y = i)\,P(Y = i)}{P(x^{(j)}, z^{(j)})} \\
&= \frac{P(z^{(j)} \mid x^{(j)}, Y = i)\,P(x^{(j)})\,P(Y = i)}{P(z^{(j)} \mid x^{(j)}, Y = 1)\,P(x^{(j)})\,P(Y = 1) + P(z^{(j)} \mid x^{(j)}, Y = 0)\,P(x^{(j)})\,P(Y = 0)} \\
&= \frac{P(z^{(j)} \mid x^{(j)}, Y = i)\,P(Y = i)}{P(z^{(j)} \mid x^{(j)}, Y = 1)\,P(Y = 1) + P(z^{(j)} \mid x^{(j)}, Y = 0)\,P(Y = 0)}
\end{aligned}$$
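The last line of this derivation is the E-step of EM and can be sketched generically. Because the model's parameterization is not reproduced in this part of the exam, the sketch takes the likelihoods $P(z \mid x, Y = i)$ and the prior $P(Y = 1)$ as plain numbers; the function name and the example values are hypothetical.

```python
def posterior(lik0, lik1, prior1):
    """Return (y0, y1), the posteriors of Y = 0 and Y = 1 for one data point.

    lik0, lik1: P(z | x, Y = 0) and P(z | x, Y = 1) for this point.
    prior1:     P(Y = 1); P(Y = 0) is taken to be 1 - prior1.
    """
    joint0 = lik0 * (1 - prior1)
    joint1 = lik1 * prior1
    total = joint0 + joint1  # the P(x) factor cancels, as in the derivation
    return joint0 / total, joint1 / total

# Hypothetical values, for illustration only.
y0, y1 = posterior(lik0=0.1, lik1=0.2, prior1=0.5)
print(y0, y1)
```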

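  (c) Derive an expression for the expected log likelihood (LL) of the entire data set $\{(x^{(j)}, z^{(j)})\}_{j=1}^{m}$ given the new parameter estimates: $\{\tilde{\alpha}, \tilde{\beta}, \tilde{\lambda}_{11}, \tilde{\lambda}_{10}, \tilde{\lambda}_{01}, \tilde{\lambda}_{00}\}$

Answer:

$$\begin{aligned}
E[LL] &= E\left[ \sum_{j=1}^{m} \ln P(x^{(j)}, z^{(j)}) \right] \\
      &= \sum_{j=1}^{m} y_0^j \ln\left( P(Y = 0)\,P(x^{(j)}, z^{(j)} \mid Y = 0) \right) + y_1^j \ln\left( P(Y = 1)\,P(x^{(j)}, z^{(j)} \mid Y = 1) \right)
\end{aligned}$$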
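As a small numerical illustration, the sum above can be evaluated once the posteriors $y_i^j$ and the joint terms $P(Y = i)\,P(x^{(j)}, z^{(j)} \mid Y = i)$ are known. The function name, argument layout, and all numbers below are hypothetical placeholders, since the model's parameterization is not reproduced here.

```python
import math

def expected_log_likelihood(resp, joint):
    """Sum over points j and classes i of y_i^j * ln(joint_i^j).

    resp[j]  = (y0_j, y1_j), the posteriors of Y for point j.
    joint[j] = (P(Y=0) P(x_j, z_j | Y=0), P(Y=1) P(x_j, z_j | Y=1)).
    """
    return sum(
        y * math.log(p)
        for ys, ps in zip(resp, joint)
        for y, p in zip(ys, ps)
    )

# Two hypothetical data points, for illustration only.
resp = [(0.5, 0.5), (1 / 3, 2 / 3)]
joint = [(0.1, 0.1), (0.05, 0.1)]
ell = expected_log_likelihood(resp, joint)
print(ell)
```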

  (d) Use the expression derived in (c) to determine the update rules for the unknown parameters.

(b) We define a class $C_{r,k,n}$ of r-of-k functions in the following way. Let $X = \{0, 1\}^n$. For a chosen set of $k$ relevant variables and a given number $r$, an r-of-k function $f(x_1, \ldots, x_n)$ is 1 if and only if at least $r$ of the $k$ relevant variables are 1. We assume that $1 \le r \le k \le n$.

(1) Phrase this problem as a problem of learning a Boolean disjunction over some feature space. Define the feature space and the learning problem.

(2) Assume you are learning this function using Winnow. What mistake bound do you obtain?

Answer:

  (1) Use a feature space of r-conjunctions over the original variables. The final hypothesis is then the disjunction of the $\binom{k}{r}$ features representing r-conjunctions of the $k$ relevant variables.

  (2) Winnow makes $O(k' \log n')$ mistakes, where $k' = \binom{k}{r}$ and $n' = \binom{n}{r}$.
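The reduction can be checked by brute force on a small instance. The sketch below compares an r-of-k function with the disjunction of all r-conjunctions of its relevant variables, and computes $k'$ and $n'$. The instance ($n = 5$, three relevant variables, $r = 2$) and the function names are arbitrary choices for illustration, and the last returned value is only the $k' \log n'$ term with a base-2 logarithm, ignoring the constant hidden in the $O(\cdot)$.

```python
from itertools import combinations, product
from math import comb, log2

def r_of_k(x, relevant, r):
    """1 iff at least r of the relevant variables are 1."""
    return int(sum(x[i] for i in relevant) >= r)

def disjunction_of_conjunctions(x, relevant, r):
    """1 iff some r-conjunction over the relevant variables is satisfied."""
    return int(any(all(x[i] for i in subset)
                   for subset in combinations(relevant, r)))

def winnow_bound_terms(n, k, r):
    """Return k' = C(k, r), n' = C(n, r), and the product k' * log2(n')."""
    k_prime = comb(k, r)
    n_prime = comb(n, r)
    return k_prime, n_prime, k_prime * log2(n_prime)

# Arbitrary small instance: n = 5 variables, relevant indices 0, 2, 3, r = 2.
n, relevant, r = 5, (0, 2, 3), 2
agree = all(
    r_of_k(x, relevant, r) == disjunction_of_conjunctions(x, relevant, r)
    for x in product((0, 1), repeat=n)
)
print("representations agree on all inputs:", agree)
print("k', n', k' log2 n' =", winnow_bound_terms(n, len(relevant), r))
```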

Some formulas you may need:

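  1. $P(A, B) = P(A \mid B)\,P(B)$
  2. $\mathrm{Entropy}(S) = -p_+ \log p_+ - p_- \log p_-$
  3. $\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$
  4. Gaussian distribution:
    • $P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
    • $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$
  5. $M = O\left( \min\left\{ \frac{1}{\epsilon}\left( \ln |H| + \ln \frac{1}{\delta} \right),\ \frac{1}{\epsilon}\left( VC(H) + \ln \frac{1}{\delta} \right) \right\} \right)$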
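The entropy and information-gain formulas above can be illustrated on the Table 1 data. This is only a sketch: it assumes base-2 logarithms (the formula sheet writes a plain log), treats $0 \log 0$ as 0, and uses illustrative function names.

```python
import math

# Table 1 as (x1, x2, label).
ROWS = [
    (1, 1, "+"), (0, 1, "-"), (0, 0, "-"), (1, 0, "-"), (1, 1, "+"),
    (0, 0, "-"), (1, 1, "+"), (0, 1, "-"), (0, 1, "-"), (1, 0, "-"),
]

def entropy(labels):
    """-p+ log2 p+ - p- log2 p-, with 0 log 0 taken as 0."""
    n = len(labels)
    positives = sum(1 for y in labels if y == "+")
    result = 0.0
    for count in (positives, n - positives):
        if count:
            p = count / n
            result -= p * math.log2(p)
    return result

def gain(rows, attr):
    """Entropy(S) minus the size-weighted entropy of each subset S_v."""
    total = entropy([row[-1] for row in rows])
    for value in {row[attr] for row in rows}:
        subset = [row[-1] for row in rows if row[attr] == value]
        total -= len(subset) / len(rows) * entropy(subset)
    return total

print("Entropy(S)  =", entropy([row[-1] for row in ROWS]))
print("Gain(S, x1) =", gain(ROWS, 0))
print("Gain(S, x2) =", gain(ROWS, 1))
```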

AdaBoost

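  1. $\alpha_t = \frac{1}{2} \log_2 \frac{1 - \epsilon}{\epsilon}$, where $\epsilon$ = error
  2. $D_{t+1}(i) = \begin{cases} \frac{D_t(i)}{Z_t}\, 2^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\ \frac{D_t(i)}{Z_t}\, 2^{\alpha_t} & \text{if } h_t(x_i) \neq y_i \end{cases}$, where $Z_t$ is a normalization factor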