CS446 Final Exam: Pattern Recognition and Machine Learning - Prof. Dan Roth, Exams of Computer Science

University of Illinois - Urbana-Champaign Computer Science

Prof. Dan Roth

This is the final exam for CS446, Pattern Recognition and Machine Learning. It covers topics such as boosting, Gaussian naive Bayes, support vector machines, and the expectation-maximization algorithm. Students solve five problems, each worth 20 points, using the provided data sets and formulas. The exam is closed-book and lasts three hours.

Typology: Exams

Pre 2010

Uploaded on 03/16/2009

koofers-user-8hb-1 🇺🇸


CS446: Pattern Recognition and Machine Learning Fall 2008

Final Exam

December 10, 2007

This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of the exam.

The exam ends at 4:30 pm. It contains 5 problems.

You have 3 hours to earn a total of 100 points. Answer each question in the space provided. If you need more room, write on the back side of the paper and indicate that you have done so.

Clarity of writing is important, not just having the right answer. For full credit, you must show your work and explain your answers.

Good luck!

Name:

Problem 1 (20 points):

Problem 2 (20 points):

Problem 3 (20 points):

Problem 4 (20 points):

Problem 5 (20 points):

Total (100):


Problem 1 [Boosting - 20 points]

In this problem you will use Boosting to learn a hidden Boolean function from this set of examples.

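| i | $x_1$ | $x_2$ | label | $D_0$ | $D_1$ |
|---|---|---|---|---|---|
| 1 | 1 | 1 | + | | |
| 2 | 0 | 1 | - | | |
| 3 | 0 | 0 | - | | |
| 4 | 1 | 0 | - | | |
| 5 | 1 | 1 | + | | |
| 6 | 0 | 0 | - | | |
| 7 | 1 | 1 | + | | |
| 8 | 0 | 1 | - | | |
| 9 | 0 | 1 | - | | |
| 10 | 1 | 0 | - | | |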

Table 1: Table for Boosting

Use two rounds of AdaBoost over $x_1$ and $x_2$ as the weak learners (rules of thumb) to learn a hypothesis for this Boolean data set. In each round, choose the weak learner that minimizes the error $\epsilon$. Start the first round with a uniform distribution $D_0$. When running the algorithm, you need to compute the error $\epsilon$ for the first and second weak learners, their weights $\alpha$, and the new distribution $D_1$ using the AdaBoost algorithm. Place the values for $D_0$ and $D_1$ in the appropriate columns of Table 1. [Please consult the formula sheet for the definitions of $\alpha$ and $D$.]
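The two rounds can be checked numerically. The sketch below is one possible reading of the problem, not an official solution: it assumes each weak learner predicts + exactly when its feature equals 1, and it uses the base-2 form of the update from the formula sheet. The names `DATA`, `predict`, `weighted_error`, and `adaboost_round` are illustrative.

```python
import math

# Table 1 as (x1, x2, label) for examples 1..10; label is +1 or -1.
DATA = [
    (1, 1, +1), (0, 1, -1), (0, 0, -1), (1, 0, -1), (1, 1, +1),
    (0, 0, -1), (1, 1, +1), (0, 1, -1), (0, 1, -1), (1, 0, -1),
]


def predict(feature, example):
    """Assumed weak learner: predict + exactly when the chosen feature is 1."""
    return +1 if example[feature] == 1 else -1


def weighted_error(feature, dist):
    """Total weight of the examples the weak learner gets wrong."""
    return sum(d for d, ex in zip(dist, DATA) if predict(feature, ex) != ex[2])


def adaboost_round(dist):
    """Pick the lower-error learner, compute alpha, and reweight the examples."""
    feature = min((0, 1), key=lambda f: weighted_error(f, dist))
    eps = weighted_error(feature, dist)
    alpha = 0.5 * math.log2((1 - eps) / eps)
    # Correct examples are scaled by 2^(-alpha), mistakes by 2^(+alpha).
    raw = [d * 2 ** (-alpha if predict(feature, ex) == ex[2] else alpha)
           for d, ex in zip(dist, DATA)]
    z = sum(raw)  # normalization factor Z_t
    return feature, eps, alpha, [w / z for w in raw]


d0 = [1 / len(DATA)] * len(DATA)  # uniform starting distribution
h1, eps1, alpha1, d1 = adaboost_round(d0)
h2, eps2, alpha2, d2 = adaboost_round(d1)
print(f"round 1: x{h1 + 1}, eps={eps1:.4f}, alpha={alpha1:.4f}")
print(f"round 2: x{h2 + 1}, eps={eps2:.4f}, alpha={alpha2:.4f}")
print("D1 =", [round(d, 4) for d in d1])
```

Under these assumptions, round 1 selects $x_1$ with $\epsilon = 0.2$ and $\alpha = 1$, so $D_1$ is 0.25 on the two misclassified examples (4 and 10) and 0.0625 elsewhere; round 2 then selects $x_2$ with $\epsilon = 0.1875$.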

Problem 2 [Gaussian Naive Bayes - 20 points]

Consider the following training set with two real-valued inputs $X_1$ and $X_2$ and a class label $Y$ that takes two values, A and B.

| $X_1$ | $X_2$ | $Y$ |
|---|---|---|
| 0 | 5 | A |
| 2 | 9 | A |
| 1 | 3 | B |
| 2 | 4 | B |
| 3 | 5 | B |
| 4 | 6 | B |
| 5 | 7 | B |

Table 2: Dataset

We assume that the data is generated by a Gaussian naive Bayes model, and we will use the data to develop a naive Bayes predictor. First, compute the following probabilities, means and standard deviations. Write your answers in the table below:

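| Class A | Class B |
|---|---|
| $P(Y = A) =$ | $P(Y = B) =$ |
| $\mu_{1A} =$ | $\mu_{1B} =$ |
| $\sigma_{1A}^2 =$ | $\sigma_{1B}^2 =$ |
| $\mu_{2A} =$ | $\mu_{2B} =$ |
| $\sigma_{2A}^2 =$ | $\sigma_{2B}^2 =$ |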

Table 3: Parameters

Answer (formulas): Gaussian naive Bayes means that for each possible value of $Y$, there is a separate Gaussian distribution for each $X_i$. Therefore:

$$\mu_{iy} = E[x_i \mid Y = y], \qquad \sigma_{iy}^2 = E[(x_i - \mu_{iy})^2 \mid Y = y]$$
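As a rough check of the table entries, the estimates can be computed directly from Table 2. This sketch assumes the expectations are taken as sample averages within each class, so the variance divides by n rather than n - 1. The function name `fit_gaussian_nb` and the dictionary keys are illustrative.

```python
from statistics import fmean

# Table 2 as (X1, X2, Y).
DATA = [(0, 5, "A"), (2, 9, "A"), (1, 3, "B"), (2, 4, "B"),
        (3, 5, "B"), (4, 6, "B"), (5, 7, "B")]


def fit_gaussian_nb(rows):
    """Estimate the class prior plus a per-feature mean and variance per class."""
    params = {}
    for label in sorted({row[2] for row in rows}):
        subset = [row for row in rows if row[2] == label]
        stats = {"prior": len(subset) / len(rows)}
        for i in (0, 1):
            values = [row[i] for row in subset]
            mu = fmean(values)
            # Assumption: maximum-likelihood variance (divide by n).
            stats[f"mu{i + 1}"] = mu
            stats[f"var{i + 1}"] = fmean((v - mu) ** 2 for v in values)
        params[label] = stats
    return params


params = fit_gaussian_nb(DATA)
for label, stats in params.items():
    print(label, stats)
```

With the divide-by-n convention this gives $P(Y=A) = 2/7$, $\mu_{1A} = 1$, $\sigma_{1A}^2 = 1$, $\mu_{2A} = 7$, $\sigma_{2A}^2 = 4$ for class A, and $P(Y=B) = 5/7$, $\mu_{1B} = 3$, $\mu_{2B} = 5$, $\sigma_{1B}^2 = \sigma_{2B}^2 = 2$ for class B.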

Recall that the model is a Gaussian naive Bayes. Compute: $P(X_1 = 1, X_2 = 3 \mid Y = A) =$

Answer: Naive Bayes: $P(X_1 = 1, X_2 = 3 \mid Y = A) = P(X_1 = 1 \mid Y = A) \cdot P(X_2 = 3 \mid Y = A)$, and

Gaussian:

$$P(X_i = x \mid Y = A) = \frac{1}{\sigma_{iA}\sqrt{2\pi}} \, e^{-\frac{(x - \mu_{iA})^2}{2\sigma_{iA}^2}}$$

Recall that the model is a Gaussian naive Bayes. Compute: $P(X_1 = 1, X_2 = 3) =$

Answer: Marginalize over $Y$ to get $P(X_1 = 1, X_2 = 3) = P(Y = A) \cdot P(X_1 = 1, X_2 = 3 \mid Y = A) + P(Y = B) \cdot P(X_1 = 1, X_2 = 3 \mid Y = B)$, and solve as above.
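The two quantities can be evaluated numerically as follows. This is a sketch under one assumption: the parameters in `PARAMS` are the sample estimates from Table 2 with the variance divided by n. The values returned are Gaussian density values, which the exam treats as the requested probabilities. All names are illustrative.

```python
import math


def gaussian_pdf(x, mu, var):
    """Density of a one-dimensional Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


# Assumed estimates from Table 2 (variance divides by n).
PARAMS = {
    "A": {"prior": 2 / 7, "mu": (1.0, 7.0), "var": (1.0, 4.0)},
    "B": {"prior": 5 / 7, "mu": (3.0, 5.0), "var": (2.0, 2.0)},
}


def class_conditional(x, label):
    """Naive Bayes: product of the per-feature densities given the class."""
    p = PARAMS[label]
    return math.prod(gaussian_pdf(xi, m, v)
                     for xi, m, v in zip(x, p["mu"], p["var"]))


def marginal(x):
    """Sum over classes of prior times class-conditional density."""
    return sum(PARAMS[c]["prior"] * class_conditional(x, c) for c in PARAMS)


point = (1, 3)
print("P(x | Y=A) =", class_conditional(point, "A"))
print("P(x)       =", marginal(point))
```

With these estimates both class-conditional values at (1, 3) equal $e^{-2}/(4\pi) \approx 0.0108$, so the marginal takes the same value regardless of the priors.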

Problem 3 [Support Vector Machine - 20 points]

Consider the following dataset. In this problem we will use this data to learn a linear SVM of the form $f(x) = \mathrm{sign}(w_1 x_1 + w_2 x_2 + w_3)$, with $\|w\| = 1$.

| $x_1$ | $x_2$ | label |
|---|---|---|
| 3 | -1 | + |
| 2 | 0 | + |
| 1 | 1 | + |
| 0 | 2 | + |
| 0 | 0 | - |
| 0 | -4 | - |
| -4 | 0 | - |

What is the value of $w$ that the SVM algorithm will output on the given data? (Make sure that you return $w$ such that $\|w\| = 1$.)

Answer: $w = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}}\right)$

What is the training set error of the above example (expressed as the percentage of training points misclassified)?

Answer: 0%, all correctly classified.
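The stated weight vector can be sanity-checked against the data. The sketch below only verifies the given answer (unit norm, zero training error, and the geometric margin of each point); it does not solve the SVM optimization itself. The helper names are illustrative.

```python
import math

# Dataset from Problem 3 as (x1, x2, label).
DATA = [(3, -1, +1), (2, 0, +1), (1, 1, +1), (0, 2, +1),
        (0, 0, -1), (0, -4, -1), (-4, 0, -1)]

# The stated answer: (1, 1, -1) scaled so that the full 3-vector has norm 1.
W = tuple(c / math.sqrt(3) for c in (1, 1, -1))


def score(w, x1, x2):
    """Value of w1*x1 + w2*x2 + w3 before taking the sign."""
    return w[0] * x1 + w[1] * x2 + w[2]


def training_error(w, data):
    """Fraction of points whose label disagrees with the sign of the score."""
    wrong = sum(1 for x1, x2, y in data if y * score(w, x1, x2) <= 0)
    return wrong / len(data)


def geometric_margins(w, data):
    """Signed distance of each point to the line, using only (w1, w2)."""
    norm = math.hypot(w[0], w[1])
    return [y * score(w, x1, x2) / norm for x1, x2, y in data]


margins = geometric_margins(W, DATA)
print("norm of w:", math.sqrt(sum(c * c for c in W)))
print("training error:", training_error(W, DATA))
print("margins:", [round(m, 4) for m in margins])
```

The four positive points and (0,0) all sit at distance $1/\sqrt{2}$ from the line $x_1 + x_2 = 1$, which is consistent with them being the support vectors.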

Compute the 7-fold cross validation of SVM on the data set given. Explain your reasoning.

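Answer: 6/7 correct. With 7 points, 7-fold cross validation holds out one point at a time. The only support vector whose removal changes the location of the hyperplane is (0,0): when it is held out, the separator moves toward the remaining negative points and (0,0) is misclassified. Holding out any other point leaves the hyperplane unchanged, so that point is classified correctly.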

Express the probability $P(x, z)$ of an observation $(x, z)$ in terms of the unknown parameters.

Answer:

$$\begin{aligned}
P(x, z) &= P(Y = 1)\,P(x, z \mid Y = 1) + P(Y = 0)\,P(x, z \mid Y = 0) \\
&= P(Y = 1)\,P(z \mid x, Y = 1)\,P(x) + P(Y = 0)\,P(z \mid x, Y = 0)\,P(x)
\end{aligned}$$

...

Let $y_i^j = P(Y = i \mid x^{(j)}, z^{(j)})$ be the probability that the value of $Y$ on the data point $(x^{(j)}, z^{(j)})$ is $i$. Express $y_0^j$ in terms of the unknown parameters.

Answer:

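$$\begin{aligned}
P(Y = i \mid x^{(j)}, z^{(j)}) &= \frac{P(x^{(j)}, z^{(j)} \mid Y = i)\,P(Y = i)}{P(x^{(j)}, z^{(j)})} \\
&= \frac{P(z^{(j)} \mid x^{(j)}, Y = i)\,P(x^{(j)})\,P(Y = i)}{P(z^{(j)} \mid x^{(j)}, Y = 1)\,P(x^{(j)})\,P(Y = 1) + P(z^{(j)} \mid x^{(j)}, Y = 0)\,P(x^{(j)})\,P(Y = 0)} \\
&= \frac{P(z^{(j)} \mid x^{(j)}, Y = i)\,P(Y = i)}{P(z^{(j)} \mid x^{(j)}, Y = 1)\,P(Y = 1) + P(z^{(j)} \mid x^{(j)}, Y = 0)\,P(Y = 0)}
\end{aligned}$$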
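The last line of this derivation is the E-step quantity for a single data point. The sketch below evaluates it for hypothetical inputs: the parametric form of $P(z \mid x, Y)$ is not shown in this excerpt, so the likelihood values and the prior passed in are made-up numbers, and the function name `posterior` is illustrative.

```python
def posterior(prior_y1, lik_y1, lik_y0):
    """Return (y_0, y_1) for one data point.

    prior_y1 is P(Y=1); lik_y1 and lik_y0 are P(z | x, Y=1) and P(z | x, Y=0).
    P(x) cancels between numerator and denominator, so it is not needed.
    """
    joint1 = lik_y1 * prior_y1
    joint0 = lik_y0 * (1.0 - prior_y1)
    total = joint1 + joint0
    return joint0 / total, joint1 / total


# Hypothetical values, chosen only to illustrate the computation.
y0, y1 = posterior(prior_y1=0.5, lik_y1=0.2, lik_y0=0.6)
print(y0, y1)
```

With equal priors the posterior is just the normalized likelihoods, so these inputs give $y_0 = 0.75$ and $y_1 = 0.25$.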

Derive an expression for the expected log likelihood (LL) of the entire data set $\{(x^{(j)}, z^{(j)})\}_{j=1}^{m}$ given new parameter estimates $\{\tilde{\alpha}, \tilde{\beta}, \tilde{\lambda}_{11}, \tilde{\lambda}_{10}, \tilde{\lambda}_{01}, \tilde{\lambda}_{00}\}$.

Answer:

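$$\begin{aligned}
E[LL] &= E\left[\sum_{j=1}^{m} \ln P(x^{(j)}, z^{(j)})\right] \\
&= \sum_{j=1}^{m} \Big[ y_0^j \ln\big(P(Y = 0)\,P(x^{(j)}, z^{(j)} \mid Y = 0)\big) + y_1^j \ln\big(P(Y = 1)\,P(x^{(j)}, z^{(j)} \mid Y = 1)\big) \Big]
\end{aligned}$$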

Use the expression derived in (c) to determine the update rules for the unknown parameters.

(b) We define a class $C_{r,k,n}$ of r-of-k functions in the following way. Let $X = \{0, 1\}^n$. For a chosen set of $k$ relevant variables and a given number $r$, an r-of-k function $f(x_1, \ldots, x_n)$ is 1 if and only if at least $r$ of the $k$ relevant variables are 1. We assume that $1 \le r \le k \le n$.

(1) Phrase this problem as a problem of learning a Boolean disjunction over some feature space. Define the feature space and the learning problem. (2) Assume you are learning this function using Winnow. What mistake bound do you obtain?

Answer:

Use a feature space of r-conjunctions over the original variables. The final hypothesis is then the disjunction of the $\binom{k}{r}$ features representing r-conjunctions of the $k$ relevant variables.

Winnow makes $O(k' \log n')$ mistakes, where $k' = \binom{k}{r}$ and $n' = \binom{n}{r}$.
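The reduction can be checked by brute force on a small instance. The sketch below uses hypothetical values n = 5, k = 3, r = 2 and an arbitrary choice of relevant variables; it confirms that the r-of-k function agrees with the disjunction over the r-conjunctions of relevant variables, and evaluates $k' \log n'$ (the bound only holds up to constant factors, and base-2 logarithm is an assumption here).

```python
from itertools import combinations, product
from math import comb, log2


def r_of_k(x, relevant, r):
    """1 if and only if at least r of the relevant variables are 1."""
    return int(sum(x[i] for i in relevant) >= r)


def expand(x, r):
    """Map x to one feature per r-subset of variables: the AND of that subset."""
    return [int(all(x[i] for i in subset))
            for subset in combinations(range(len(x)), r)]


def disjunction_over_relevant(x, relevant, r):
    """OR of the expanded features whose subset lies inside the relevant set."""
    subsets = combinations(range(len(x)), r)
    rel = set(relevant)
    return int(any(f for f, s in zip(expand(x, r), subsets) if set(s) <= rel))


# Hypothetical small instance.
n, k, r = 5, 3, 2
relevant = (0, 2, 4)

agree = all(
    r_of_k(x, relevant, r) == disjunction_over_relevant(x, relevant, r)
    for x in product((0, 1), repeat=n)
)
k_prime, n_prime = comb(k, r), comb(n, r)
print("functions agree on all inputs:", agree)
print("k' =", k_prime, "n' =", n_prime, "k' log2 n' =", k_prime * log2(n_prime))
```

For this instance the disjunction has $k' = 3$ relevant features out of $n' = 10$ expanded features.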

Some formulas you may need:

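1. $P(A, B) = P(A \mid B)\,P(B)$

2. $\mathrm{Entropy}(S) = -p_+ \log p_+ - p_- \log p_-$

3. $\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$

4. Gaussian distribution: $P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, with $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$

5. $M = O\left(\min\left\{ \frac{1}{\epsilon}\left(\ln |H| + \ln 1/\delta\right),\ \frac{1}{\epsilon}\left(VC(H) + \ln 1/\delta\right) \right\}\right)$

6. AdaBoost: $\alpha_t = \frac{1}{2} \log_2 \frac{1 - \epsilon}{\epsilon}$, where $\epsilon$ = error, and

$$D_{t+1}(i) = \begin{cases} \dfrac{D_t(i)}{Z_t}\, 2^{-\alpha_t} & \text{if } h_t(x_i) = y_i \\[2ex] \dfrac{D_t(i)}{Z_t}\, 2^{\alpha_t} & \text{if } h_t(x_i) \ne y_i \end{cases}$$

where $Z_t$ is a normalization factor.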
