assignment 1 of machine learning, Assignments of Machine Learning

Texas A&M University (A&M), Machine Learning

assignment 1 of csce machine learning

Typology: Assignments

2021/2022

Uploaded on 10/10/2022

huixin-zhang-1103 🇺🇸

Assignment 1

1. Let $x = (x_1, x_2, x_3)$ describe a patient’s medical record, where each $x_i \in \{0, 1\}$. (For example, each $x_i$ could record the binary outcome of a certain lab test.) The label y indicates the presence or absence of a certain disease, with y ∈ {0, 1}. Suppose the prior distribution of Y is P(Y = 0) = 0.8 and P(Y = 1) = 0.2, and the likelihood is

$$P(X = (x_1, x_2, x_3) \mid Y = y) = \prod_{i=1}^{3} (0.5y + 0.25)^{x_i} (0.75 - 0.5y)^{1 - x_i}.$$

If h is the predicted outcome while the truth is y, the loss l(h, y) incurred is given by l(0, 0) = 0, l(0, 1) = +1000, l(1, 0) = +100, and l(1, 1) = −500.

(i) Determine the Bayes Optimal hypothesis $h_{\text{Bayes-Optimal}}$.

(ii) What is the Bayes Optimal Risk?

Prior distribution: P(Y = 0) = 0.8, P(Y = 1) = 0.2.

Likelihood:

$$P(X = (x_1, x_2, x_3) \mid Y = y) = \prod_{i=1}^{3} (0.5y + 0.25)^{x_i} (0.75 - 0.5y)^{1 - x_i}.$$

Loss l(h, y): l(0, 0) = 0, l(0, 1) = +1000, l(1, 0) = +100, l(1, 1) = −500.

1) Determine the Bayes Optimal hypothesis.

From the likelihood we know:

P(X = (0, 0, 0) | Y = 0) = 0.25^0·0.75^1 * 0.25^0·0.75^1 * 0.25^0·0.75^1 = 0.421875
P(X = (1, 0, 0) | Y = 0) = 0.25^1·0.75^0 * 0.25^0·0.75^1 * 0.25^0·0.75^1 = 0.140625
P(X = (0, 1, 0) | Y = 0) = 0.25^0·0.75^1 * 0.25^1·0.75^0 * 0.25^0·0.75^1 = 0.140625
P(X = (0, 0, 1) | Y = 0) = 0.25^0·0.75^1 * 0.25^0·0.75^1 * 0.25^1·0.75^0 = 0.140625
P(X = (0, 1, 1) | Y = 0) = 0.25^0·0.75^1 * 0.25^1·0.75^0 * 0.25^1·0.75^0 = 0.046875
P(X = (1, 0, 1) | Y = 0) = 0.25^1·0.75^0 * 0.25^0·0.75^1 * 0.25^1·0.75^0 = 0.046875
P(X = (1, 1, 0) | Y = 0) = 0.25^1·0.75^0 * 0.25^1·0.75^0 * 0.25^0·0.75^1 = 0.046875
P(X = (1, 1, 1) | Y = 0) = 0.25^1·0.75^0 * 0.25^1·0.75^0 * 0.25^1·0.75^0 = 0.015625

P(X = (0, 0, 0) | Y = 1) = 0.75^0·0.25^1 * 0.75^0·0.25^1 * 0.75^0·0.25^1 = 0.015625
P(X = (1, 0, 0) | Y = 1) = 0.75^1·0.25^0 * 0.75^0·0.25^1 * 0.75^0·0.25^1 = 0.046875
P(X = (0, 1, 0) | Y = 1) = 0.75^0·0.25^1 * 0.75^1·0.25^0 * 0.75^0·0.25^1 = 0.046875
P(X = (0, 0, 1) | Y = 1) = 0.75^0·0.25^1 * 0.75^0·0.25^1 * 0.75^1·0.25^0 = 0.046875
P(X = (0, 1, 1) | Y = 1) = 0.75^0·0.25^1 * 0.75^1·0.25^0 * 0.75^1·0.25^0 = 0.140625
P(X = (1, 0, 1) | Y = 1) = 0.75^1·0.25^0 * 0.75^0·0.25^1 * 0.75^1·0.25^0 = 0.140625
P(X = (1, 1, 0) | Y = 1) = 0.75^1·0.25^0 * 0.75^1·0.25^0 * 0.75^0·0.25^1 = 0.140625
P(X = (1, 1, 1) | Y = 1) = 0.75^1·0.25^0 * 0.75^1·0.25^0 * 0.75^1·0.25^0 = 0.421875
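The sixteen likelihood values can be cross-checked with a short script. This is a minimal sketch, not part of the original solution; the names `likelihood` and `table` are illustrative, and it relies only on the product form of the likelihood given in the problem statement.

```python
from itertools import product


def likelihood(x, y):
    """P(X = x | Y = y) under the product form from the problem statement."""
    p_one = 0.5 * y + 0.25  # P(x_i = 1 | Y = y); note 1 - p_one = 0.75 - 0.5y
    prob = 1.0
    for xi in x:
        prob *= p_one if xi == 1 else 1.0 - p_one
    return prob


# Enumerate every record x in {0, 1}^3 for both values of the label.
table = {
    (x, y): likelihood(x, y)
    for y in (0, 1)
    for x in product((0, 1), repeat=3)
}

for (x, y), p in table.items():
    print(f"P(X = {x} | Y = {y}) = {p:.6f}")
```

For each fixed y the eight values should sum to 1, which is a quick sanity check on the table.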

Then calculate the posterior probability of each outcome using Bayes' rule:

P(Y = 0 | X = (0, 0, 0)) = (0.421875 * 0.8)/(0.421875 * 0.8 + 0.015625 * 0.2) = 108/109


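The Bayes Optimal hypothesis can be written as

$$h_{\text{Bayes-Optimal}}(x) = \arg\min_{h \in \{0, 1\}} \sum_{y \in \{0, 1\}} l(h, y)\, P(Y = y \mid X = x),$$

that is, the prediction should be chosen to minimize the expected loss given x.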

P(Y = 0 | X = (1, 0, 0)) = (0.140625 * 0.8)/(0.140625 * 0.8 + 0.046875 * 0.2) = 12/13
P(Y = 0 | X = (0, 1, 0)) = (0.140625 * 0.8)/(0.140625 * 0.8 + 0.046875 * 0.2) = 12/13
P(Y = 0 | X = (0, 0, 1)) = (0.140625 * 0.8)/(0.140625 * 0.8 + 0.046875 * 0.2) = 12/13
P(Y = 0 | X = (0, 1, 1)) = (0.046875 * 0.8)/(0.046875 * 0.8 + 0.140625 * 0.2) = 4/7
P(Y = 0 | X = (1, 0, 1)) = (0.046875 * 0.8)/(0.046875 * 0.8 + 0.140625 * 0.2) = 4/7
P(Y = 0 | X = (1, 1, 0)) = (0.046875 * 0.8)/(0.046875 * 0.8 + 0.140625 * 0.2) = 4/7
P(Y = 0 | X = (1, 1, 1)) = (0.015625 * 0.8)/(0.015625 * 0.8 + 0.421875 * 0.2) = 4/31
P(Y = 1 | X = (0, 0, 0)) = (0.015625 * 0.2)/(0.015625 * 0.2 + 0.421875 * 0.8) = 1/109
P(Y = 1 | X = (1, 0, 0)) = (0.046875 * 0.2)/(0.046875 * 0.2 + 0.140625 * 0.8) = 1/13
P(Y = 1 | X = (0, 1, 0)) = (0.046875 * 0.2)/(0.046875 * 0.2 + 0.140625 * 0.8) = 1/13
P(Y = 1 | X = (0, 0, 1)) = (0.046875 * 0.2)/(0.046875 * 0.2 + 0.140625 * 0.8) = 1/13
P(Y = 1 | X = (0, 1, 1)) = (0.140625 * 0.2)/(0.140625 * 0.2 + 0.046875 * 0.8) = 3/7
P(Y = 1 | X = (1, 0, 1)) = (0.140625 * 0.2)/(0.140625 * 0.2 + 0.046875 * 0.8) = 3/7
P(Y = 1 | X = (1, 1, 0)) = (0.140625 * 0.2)/(0.140625 * 0.2 + 0.046875 * 0.8) = 3/7
P(Y = 1 | X = (1, 1, 1)) = (0.421875 * 0.2)/(0.421875 * 0.2 + 0.015625 * 0.8) = 27/31
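The posterior fractions can be reproduced exactly with rational arithmetic. The following is a sketch under the problem's stated prior and likelihood; the names `PRIOR`, `likelihood`, and `posterior` are illustrative and do not come from the original solution.

```python
from fractions import Fraction
from itertools import product

# Prior from the problem statement: P(Y = 0) = 0.8, P(Y = 1) = 0.2.
PRIOR = {0: Fraction(4, 5), 1: Fraction(1, 5)}


def likelihood(x, y):
    """P(X = x | Y = y) as an exact fraction."""
    p_one = Fraction(1, 4) + Fraction(1, 2) * y  # P(x_i = 1 | Y = y)
    prob = Fraction(1)
    for xi in x:
        prob *= p_one if xi == 1 else 1 - p_one
    return prob


def posterior(y, x):
    """P(Y = y | X = x) by Bayes' rule."""
    joint = {c: likelihood(x, c) * PRIOR[c] for c in PRIOR}
    return joint[y] / sum(joint.values())


for x in product((0, 1), repeat=3):
    print(x, "P(Y=0|x) =", posterior(0, x), " P(Y=1|x) =", posterior(1, x))
```

Using `Fraction` avoids rounding, so the printed values can be compared directly with the fractions listed above.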
The conditional risk of predicting h given x is the posterior-weighted loss, Risk(h | X = x) = l(h, 0) * P(Y = 0 | X = x) + l(h, 1) * P(Y = 1 | X = x):

Risk(h = 0 | X = (0, 0, 0)) = l(0, 0) * 108/109 + l(0, 1) * 1/109 = 1000/109
Risk(h = 0 | X = (1, 0, 0)) = l(0, 0) * 12/13 + l(0, 1) * 1/13 = 1000/13
Risk(h = 0 | X = (0, 1, 0)) = l(0, 0) * 12/13 + l(0, 1) * 1/13 = 1000/13
Risk(h = 0 | X = (0, 0, 1)) = l(0, 0) * 12/13 + l(0, 1) * 1/13 = 1000/13
Risk(h = 0 | X = (0, 1, 1)) = l(0, 0) * 4/7 + l(0, 1) * 3/7 = 3000/7
Risk(h = 0 | X = (1, 0, 1)) = l(0, 0) * 4/7 + l(0, 1) * 3/7 = 3000/7
Risk(h = 0 | X = (1, 1, 0)) = l(0, 0) * 4/7 + l(0, 1) * 3/7 = 3000/7
Risk(h = 0 | X = (1, 1, 1)) = l(0, 0) * 4/31 + l(0, 1) * 27/31 = 27000/31
Risk(h = 1 | X = (0, 0, 0)) = l(1, 0) * 108/109 + l(1, 1) * 1/109 = 10300/109
Risk(h = 1 | X = (1, 0, 0)) = l(1, 0) * 12/13 + l(1, 1) * 1/13 = 700/13
Risk(h = 1 | X = (0, 1, 0)) = l(1, 0) * 12/13 + l(1, 1) * 1/13 = 700/13
Risk(h = 1 | X = (0, 0, 1)) = l(1, 0) * 12/13 + l(1, 1) * 1/13 = 700/13
Risk(h = 1 | X = (0, 1, 1)) = l(1, 0) * 4/7 + l(1, 1) * 3/7 = −1100/7

2) The Bayes Optimal Risk is the expected loss of the Bayes Optimal hypothesis, E[l(h(X), Y)]. It reduces to the error probability P(h ≠ y) only under 0-1 loss, which is not the loss used in this problem.
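The decision rule and its risk can be checked numerically by evaluating the standard conditional risk, R(h | x) = Σ_y l(h, y) P(Y = y | X = x), for both predictions at every x and keeping the smaller one. The code below is a sketch under the problem's prior, likelihood, and loss; all names (`LOSS`, `conditional_risk`, `bayes_rule`, `bayes_risk`) are illustrative rather than taken from the original solution.

```python
from fractions import Fraction
from itertools import product

PRIOR = {0: Fraction(4, 5), 1: Fraction(1, 5)}
# LOSS[(h, y)] = l(h, y) from the problem statement.
LOSS = {(0, 0): 0, (0, 1): 1000, (1, 0): 100, (1, 1): -500}


def likelihood(x, y):
    """P(X = x | Y = y) as an exact fraction."""
    p_one = Fraction(1, 4) + Fraction(1, 2) * y
    prob = Fraction(1)
    for xi in x:
        prob *= p_one if xi == 1 else 1 - p_one
    return prob


def joint(x, y):
    """P(X = x, Y = y)."""
    return likelihood(x, y) * PRIOR[y]


def conditional_risk(h, x):
    """Expected loss of predicting h given X = x."""
    evidence = sum(joint(x, y) for y in PRIOR)
    return sum(LOSS[(h, y)] * joint(x, y) for y in PRIOR) / evidence


bayes_rule = {}
bayes_risk = Fraction(0)
for x in product((0, 1), repeat=3):
    # Pick the prediction with the smaller conditional risk.
    best_h = min((0, 1), key=lambda h: conditional_risk(h, x))
    bayes_rule[x] = best_h
    evidence = sum(joint(x, y) for y in PRIOR)
    bayes_risk += evidence * conditional_risk(best_h, x)
    print(x, "-> predict", best_h, "with conditional risk", conditional_risk(best_h, x))

print("Bayes risk:", bayes_risk, "=", float(bayes_risk))
```

Under these assumptions the sketch predicts 0 only for x = (0, 0, 0) and 1 for every other record, and the overall risk evaluates to −785/16 = −49.0625. The negative value reflects the reward of −500 for correctly detecting the disease.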

Therefore, the estimator will converge in probability.

Consult any reference or book or notes on the Internet, and

(i) Clearly state and provide a proof of the Cramer Rao lower bound for an unbiased

estimate. It should be written as a formal Theorem, with a statement of the conditions

assumed and the result, followed by a formal Proof. You can consult any source, but the

statement and proof must be in your own words.

(ii) Consider the regression problem considered in class, where X = 2y + 20 + w, with $w \sim N(0, 5^2)$. Does the Maximum Likelihood estimate of y achieve the Cramer Rao lower bound?

The Cramer-Rao Lower Bound (CRLB) gives a lower bound on the variance of any unbiased estimator. Among unbiased estimators, those whose variance is closer to the CRLB are more efficient.

That is, if $\hat\theta$ is an unbiased estimator of θ, then

$$\operatorname{var}(\hat\theta) \ge \mathrm{CRLB}(\theta).$$

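There are different ways of calculating the CRLB. The most common form uses Fisher Information.

Let $X_1, X_2, \dots, X_n$ be a random sample with PDF f(x; θ). If $\hat\theta$ is an unbiased estimator for θ, then

$$\operatorname{var}(\hat\theta) \ge \frac{1}{n I(\theta)},$$

where

$$I(\theta) = E\left[\left(\frac{\partial}{\partial\theta} \ln f(X;\theta)\right)^{2}\right] = -E\left[\frac{\partial^{2}}{\partial\theta^{2}} \ln f(X;\theta)\right]$$

is the Fisher Information.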
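As an illustration for the regression model in part (ii), the Fisher Information and the variance of the Maximum Likelihood estimate can be estimated by simulation. This is a rough sketch with several assumptions that are not in the original text: a single observation X is used, the true value `y_true = 3.0` is arbitrary, and the ML estimate is taken to be (X − 20)/2, which maximizes the Gaussian likelihood for one observation.

```python
import random
import statistics

SIGMA = 5.0    # noise standard deviation, from w ~ N(0, 5^2)
SLOPE = 2.0    # coefficient of y in X = 2y + 20 + w
OFFSET = 20.0  # constant term in X = 2y + 20 + w


def score(x, y):
    """d/dy ln f(x; y) for X ~ N(SLOPE * y + OFFSET, SIGMA^2)."""
    return SLOPE * (x - SLOPE * y - OFFSET) / SIGMA ** 2


def simulate(y_true=3.0, trials=100_000, seed=0):
    """Monte Carlo estimates of I(y) and of var(ML estimate); y_true is hypothetical."""
    rng = random.Random(seed)
    xs = [SLOPE * y_true + OFFSET + rng.gauss(0.0, SIGMA) for _ in range(trials)]
    fisher = statistics.fmean(score(x, y_true) ** 2 for x in xs)  # E[score^2]
    ml_estimates = [(x - OFFSET) / SLOPE for x in xs]             # assumed ML estimate
    return fisher, statistics.pvariance(ml_estimates)


fisher_mc, var_ml = simulate()
print(f"Fisher information ~ {fisher_mc:.4f} (analytic value 4/25 = 0.16)")
print(f"var(ML estimate)   ~ {var_ml:.4f} (1/I = 25/4 = 6.25)")
```

If the two printed numbers are close to 0.16 and 6.25 respectively, the simulated variance matches 1/I(y), which is consistent with the ML estimate attaining the bound in this single-observation setting.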

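Theorem:

It is assumed that the PDF f(x; θ) satisfies the “regularity” condition

$$E\left[\frac{\partial \ln f(x;\theta)}{\partial\theta}\right] = 0 \quad \text{for all } \theta,$$

where the expectation is taken with respect to f(x; θ).

Then, the variance of any unbiased estimator $\hat\theta$ must satisfy

$$\operatorname{var}(\hat\theta) \ge \frac{1}{E\left[\left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^{2}\right]} = \frac{1}{-E\left[\frac{\partial^{2} \ln f(x;\theta)}{\partial\theta^{2}}\right]} = \frac{1}{I(\theta)},$$

where the derivative is evaluated at the true value of θ and the expectation is taken with respect to f(x; θ).

Furthermore, an unbiased estimator may be found that attains the bound for all θ if and only if

$$\frac{\partial \ln f(x;\theta)}{\partial\theta} = I(\theta)\,\bigl(g(x) - \theta\bigr)$$

for some functions g and I. That estimator, which is the MVU (Minimum Variance Unbiased) estimator, is $\hat\theta = g(x)$, and the minimum variance is 1/I(θ).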
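The equality condition can be illustrated on the regression model from part (ii). The derivation below is a sketch that assumes a single observation x of X = 2y + 20 + w with $w \sim N(0, 5^2)$; the factorization into I(y) and g(x) is worked out here and is not quoted from the original solution.

```latex
\ln f(x; y) = -\tfrac{1}{2}\ln(2\pi \cdot 25) - \frac{(x - 2y - 20)^{2}}{2 \cdot 25}

\frac{\partial \ln f(x; y)}{\partial y}
  = \frac{2\,(x - 2y - 20)}{25}
  = \frac{4}{25}\left(\frac{x - 20}{2} - y\right)

\frac{\partial^{2} \ln f(x; y)}{\partial y^{2}} = -\frac{4}{25}
  \quad\Longrightarrow\quad I(y) = \frac{4}{25}

g(x) = \frac{x - 20}{2}, \qquad
\operatorname{var}\bigl(g(X)\bigr) = \frac{25}{4} = \frac{1}{I(y)}
```

Under this assumption the score has the form I(y)(g(x) − y), so the estimator g(x) = (x − 20)/2 would attain the bound with variance 25/4.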

Proof:

Consider a scalar parameter α = g(θ) where the PDF is parameterized by θ. Assume the estimators are unbiased, i.e.,

$$E(\hat\alpha) = \alpha = g(\theta) \quad \text{or} \quad \int \hat\alpha\, f(x;\theta)\, dx = g(\theta).$$

Usually, the regularity condition will be satisfied if the order of differentiation and

integration may be interchanged.

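Now differentiating both sides of the last equation above with respect to θ and interchanging the partial differentiation and integration produces

$$\int \hat\alpha\, \frac{\partial f(x;\theta)}{\partial\theta}\, dx = \frac{\partial g(\theta)}{\partial\theta}, \quad \text{or} \quad \int \hat\alpha\, \frac{\partial \ln f(x;\theta)}{\partial\theta}\, f(x;\theta)\, dx = \frac{\partial g(\theta)}{\partial\theta}.$$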

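We can modify this using the regularity condition to produce

$$\int (\hat\alpha - \alpha)\, \frac{\partial \ln f(x;\theta)}{\partial\theta}\, f(x;\theta)\, dx = \frac{\partial g(\theta)}{\partial\theta},$$

since

$$\int \alpha\, \frac{\partial \ln f(x;\theta)}{\partial\theta}\, f(x;\theta)\, dx = \alpha\, E\left[\frac{\partial \ln f(x;\theta)}{\partial\theta}\right] = 0.$$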

Now consider the Cauchy-Schwarz inequality

$$\left[\int w(x)\, g(x)\, h(x)\, dx\right]^{2} \le \int w(x)\, g^{2}(x)\, dx \int w(x)\, h^{2}(x)\, dx,$$

which holds with equality if and only if g(x) = c h(x) for some constant c not dependent on x. The functions g and h are arbitrary scalar functions, while w(x) ≥ 0 for all x.

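Now let

$$w(x) = f(x;\theta), \qquad g(x) = \hat\alpha - \alpha, \qquad h(x) = \frac{\partial \ln f(x;\theta)}{\partial\theta}$$

and apply the Cauchy-Schwarz inequality to the equation above to produce

$$\left(\frac{\partial g(\theta)}{\partial\theta}\right)^{2} \le \int (\hat\alpha - \alpha)^{2} f(x;\theta)\, dx \int \left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^{2} f(x;\theta)\, dx$$

or

$$\operatorname{var}(\hat\alpha) \ge \frac{\left(\frac{\partial g(\theta)}{\partial\theta}\right)^{2}}{E\left[\left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^{2}\right]} = \frac{\left(\frac{\partial g(\theta)}{\partial\theta}\right)^{2}}{-E\left[\frac{\partial^{2} \ln f(x;\theta)}{\partial\theta^{2}}\right]}.$$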

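If α = g(θ) = θ, we have

$$\operatorname{var}(\hat\theta) \ge \frac{1}{E\left[\left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^{2}\right]} = \frac{1}{-E\left[\frac{\partial^{2} \ln f(x;\theta)}{\partial\theta^{2}}\right]} = \frac{1}{I(\theta)}.$$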

Note that the condition for equality is

$$\frac{\partial \ln f(x;\theta)}{\partial\theta} = \frac{1}{c}\,(\hat\alpha - \alpha),$$

where c can depend on θ but not on x. When α = g(θ) = θ, it is
