Midterm Exam Question with Solution - Machine Learning | ECS 271, Exams of Computer Science

Material Type: Exam; Class: Machine Learning; Subject: Engineering Computer Science; University: University of California - Davis; Term: Spring 2004;


Uploaded on 07/31/2009

Your name: __________________
Your ID:_____________________
UC-Davis
ECS 271 Midterm Examination
Closed book
Spring 2004
Show all work clearly and legibly. Remember, you are being tested. So even if an
answer is obvious to you, please show all the justification by clearly showing the
calculations, or explaining why a calculation is skipped.
1. True or False? (9 points each)
(a) In the PAC learning model, the learner makes no assumptions about the
class from which the target concept is drawn. (False)
(b) In PAC learning, the learner outputs the hypothesis from H that has
the least error (possibly zero) over the training data. (False)
(c) The number of training examples required for successful learning is
strongly influenced by the complexity of the hypothesis space
considered by the learner. (True)
2. (15 points) Illustrate your understanding of the back propagation method by
explicitly showing all steps of the calculations with respect to a single neuron
with a sigmoidal nonlinearity. Assume that you are at the output stage of the
network. The objective is for the unit to learn a single input pattern, namely
$i = (i_1, i_2)^T = (1, 4)^T$.
The desired output is o = 1. Initially assume $w_1 = w_2 = 0$. Use a learning rate
$\eta = 1.0$. Show all the calculations for two iterations. Show the weight values at
the end of the first and second iterations. In what direction is the weight
vector moving from iteration to iteration?

Solution:

1st iteration: net input = w1 i1 + w2 i2 = 0, output = 1/(1 + exp(0)) = 0.5, error = (1 - 0.5)**2 = 0.25

With del = (desired - output)(output)(1 - output):

delta-w1 = eta * del * i1 = 1.0 * (0.5)(0.5)(1 - 0.5) * i1 = 0.125 * i1 = 0.125

delta-w2 = eta * del * i2 = 1.0 * (0.5)(0.5)(1 - 0.5) * i2 = 0.125 * i2 = 0.5

new weights are 0.125 and 0.5

2nd iteration: net input = 0.125 * 1 + 0.5 * 4 = 2.125, output = 1/(1 + exp(-2.125)) = 0.893, error = (1 - 0.893)**2 = 0.0114

delta-w1 = eta * del * i1 = 1.0 * (1 - 0.893)(0.893)(1 - 0.893) * i1 = 0.0102 * i1 = 0.0102

delta-w2 = eta * del * i2 = 0.0102 * i2 = 0.041

new weights are 0.135 and 0.541

The weight vector is moving toward the input vector.
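
The two iterations can be checked with a short script. This is only a sketch under stated assumptions: the input pattern is taken to be (1, 4), which is consistent with the net input of 2.125 in the second iteration, the update rule is the standard delta rule for a sigmoid unit with squared error, and the function names are illustrative rather than part of the exam.

```python
import math


def sigmoid(x):
    """Logistic sigmoid used as the unit's nonlinearity."""
    return 1.0 / (1.0 + math.exp(-x))


def train_single_neuron(inputs, target, weights, eta, iterations):
    """Run delta-rule updates on one sigmoid unit and record each step.

    Returns a list of (net_input, output, new_weights) tuples,
    one per iteration.
    """
    history = []
    for _ in range(iterations):
        net = sum(w * x for w, x in zip(weights, inputs))
        out = sigmoid(net)
        # del = (desired - output) * output * (1 - output)
        delta = (target - out) * out * (1.0 - out)
        # Each weight moves by eta * del * (its own input).
        weights = [w + eta * delta * x for w, x in zip(weights, inputs)]
        history.append((net, out, list(weights)))
    return history


# Assumed setup: input (1, 4), desired output 1, zero initial weights, eta = 1.0.
history = train_single_neuron([1.0, 4.0], 1.0, [0.0, 0.0], 1.0, 2)
for step, (net, out, w) in enumerate(history, start=1):
    print(f"iteration {step}: net={net:.3f} output={out:.3f} "
          f"weights=({w[0]:.3f}, {w[1]:.3f})")
```

Because every update is a scalar multiple of the input vector and the weights start at zero, the weight vector stays parallel to the input and only grows in length.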

3. (8 points) Suppose H is a set of possible hypotheses and D is a set of training

data. We would like our program to output the most probable hypothesis h

from H, given the data D. Under what conditions does the following hold?

$$\arg\max_{h \in H} P(H \mid D) = \arg\max_{h \in H} P(D \mid H)$$

Solution: First, there is a typo. The H should be h under arg max. But this is a

minor thing and it did not bother any of you. So let us proceed.

The starting formula is

$$\arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\, P(h)}{P(D)}$$

P(D) can be dropped because it does not depend on h.

P(h) can be treated as a constant if all the hypotheses in the hypothesis space are equally likely.

Under these conditions, both sides are equal as stated in the question.
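
A small numerical illustration of this condition follows. The hypothesis names and all probability values are made up for the example; they are not from the exam. The sketch compares the maximum-likelihood choice with the maximum a posteriori choice under a uniform prior and under a non-uniform prior.

```python
def argmax(scores):
    """Return the key with the largest value."""
    return max(scores, key=scores.get)


def map_hypothesis(likelihood, prior):
    """Pick arg max of P(D | h) * P(h); P(D) is omitted because it
    is the same for every h."""
    return argmax({h: likelihood[h] * prior[h] for h in likelihood})


# Hypothetical values of P(D | h) for three hypotheses.
likelihood = {"h1": 0.2, "h2": 0.7, "h3": 0.1}

uniform_prior = {h: 1.0 / len(likelihood) for h in likelihood}
skewed_prior = {"h1": 0.9, "h2": 0.05, "h3": 0.05}

ml_choice = argmax(likelihood)
map_uniform = map_hypothesis(likelihood, uniform_prior)
map_skewed = map_hypothesis(likelihood, skewed_prior)

print(ml_choice, map_uniform, map_skewed)
```

With the uniform prior the two choices coincide; with the skewed prior the posterior can favor a different hypothesis, which is why the equal-prior condition is needed.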

5. (a) (12 points) Build a decision tree to classify the following patterns. Show

all the calculations systematically or explain why certain calculations are

skipped.

Pattern (x1, x2, x3)    Class

(b) (2 points) What Boolean function is the above tree implementing?

Solution:

A plot of the 8 points along x1, x2 and x3 gives an idea on how to solve this.

The initial uncertainty of all 8 points is

-(6/8) log2 (6/8) - (2/8) log2 (2/8) = 0.81

Suppose we divide the points by drawing a plane perpendicular to the x1 axis (i.e., parallel to the x2-x3 plane). Then the left branch has 4 points all belonging to the same class and the right-hand branch has two of each class. So the uncertainty of the left branch is

-(4/4) log2 (4/4) – (0/4) log2 (0/4) = 0

The uncertainty of the right branch is

-(2/4) log2 (2/4) – (2/4) log2 (2/4) = 1

Average uncertainty after the first test (on x1) is

(4/8)(0) + (4/8)(1) = 0.5

Uncertainty reduction achieved is 0.81 - 0.5 = 0.31

Do a similar thing along x2 and x3 and find out that a test along x3 gives exactly the same uncertainty reduction and a test along x2 gives no improvement at all. So first choose either x1 or x3.

The decision tree really implements f = x1x3.
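
The entropy and gain figures can be reproduced with a few lines of code. The pattern table is not available here, so this sketch assumes the 8 patterns are all binary triples labeled by f = x1 AND x3, as the solution states; the function names are illustrative.

```python
import math
from itertools import product


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * math.log2(p)
    return result


def information_gain(points, labels, attribute):
    """Entropy reduction from splitting on one binary attribute."""
    remainder = 0.0
    for value in (0, 1):
        subset = [lab for pt, lab in zip(points, labels)
                  if pt[attribute] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


# Assumed data: all 8 binary patterns, class = x1 AND x3.
points = list(product((0, 1), repeat=3))
labels = [x1 & x3 for x1, x2, x3 in points]

initial = entropy(labels)
gains = [information_gain(points, labels, a) for a in range(3)]
print(f"initial uncertainty: {initial:.3f}")
for name, gain in zip(("x1", "x2", "x3"), gains):
    print(f"gain for {name}: {gain:.3f}")
```

Under this assumption the gains for x1 and x3 are equal and the gain for x2 is zero, matching the argument above.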

(c) (5 points) Consider a decision tree built from an arbitrary set of data. If the

output is discrete-valued and can take on k different possible values, what is the

maximum training set error (expressed as a fraction) that any data set could

possibly have?

Suggested Solution: The answer is (k-1)/k. Consider data sets with identical inputs but

the outputs are evenly distributed among k classes. Then we will always get one correct

classification and (k-1) erroneous classifications.
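
The worst case can be illustrated numerically. This sketch assumes that a tree which cannot separate identical inputs ends in a single leaf predicting the majority class (ties broken arbitrarily); the value k = 4 and the number of copies per class are arbitrary choices for the example.

```python
from collections import Counter


def majority_training_error(labels):
    """Training error of a single leaf that predicts the most common label."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return 1 - majority_count / len(labels)


k = 4                          # hypothetical number of classes
labels = list(range(k)) * 5    # identical inputs, outputs spread evenly over k classes
error = majority_training_error(labels)
print(error, (k - 1) / k)
```

Any uneven distribution of the labels gives the majority class more than 1/k of the examples, so the error can only go down from (k-1)/k.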

6. (12 points) Imagine that you are given the following set of training examples.

All the features are Boolean-valued.

F1 F2 F3 Result

T T F +

F T T +

T F T -

F T F -

F F T -

How would a Naive Bayes approach classify the following test example?

Be sure to show your work.

F1 = T F2 = F F3 = F

Solution: There are only two possible answers, + and -. So it is possible that you can toss a coin and guess the answer and be on the correct side 50% of the time. Therefore, it becomes important that you show all calculations, and that they be correct too, to justify your answer.

Furthermore, one of the probability terms is zero. This makes it doubly dangerous because you can get the correct classification despite a horde of calculation errors.

From the historical data given to you, P(+) = 2/5 = 0.4 and P(-) = 3/5 = 0.6

Under the Naive Bayes assumption, you simply have to calculate arg max over vj of P(vj) P(F1=T|vj) P(F2=F|vj) P(F3=F|vj). Note that vj can assume only two values, + and -.

P(vj = +)* P(F1| +) P(F2| +) P(F3 | +)

P(vj = -)* P(F1| -) P(F2| -) P(F3 | -)

P(F1|+) = P(F1=T|+) = 1/2 = 0.5

P(F2|+) = P(F2=F|+) = 0/2 = 0

P(F3|+) = P(F3=F|+) = 1/2 = 0.5
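
The same counting can be done for both classes in code. This is a minimal sketch that uses unsmoothed relative frequencies taken directly from the five training examples, as in the calculation above; the function name and data layout are illustrative.

```python
# Training examples from the table: (F1, F2, F3) -> Result
training = [
    ((True, True, False), "+"),
    ((False, True, True), "+"),
    ((True, False, True), "-"),
    ((False, True, False), "-"),
    ((False, False, True), "-"),
]


def naive_bayes_scores(training, query):
    """Return P(v) * product of P(Fi = query[i] | v) for each class v,
    using raw frequency counts (no smoothing)."""
    scores = {}
    for label in sorted({lab for _, lab in training}):
        rows = [features for features, lab in training if lab == label]
        score = len(rows) / len(training)          # prior P(v)
        for index, value in enumerate(query):
            matches = sum(1 for features in rows if features[index] == value)
            score *= matches / len(rows)           # P(Fi = value | v)
        scores[label] = score
    return scores


# Test example: F1 = T, F2 = F, F3 = F
scores = naive_bayes_scores(training, (True, False, False))
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The zero term P(F2=F|+) forces the score for + to zero, so the class with the larger score is -, whose product is 0.6 * (1/3)(2/3)(1/3).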