assignment 1 of machine learning, Assignments of Machine Learning

assignment 1 of csce machine learning

Typology: Assignments

2021/2022

Uploaded on 10/10/2022

huixin-zhang-1103
Assignment 1

  1. Let x = (x_1, x_2, x_3) describe a patient’s medical record, where each x_i ∈ {0, 1}. (For example, each x_i could record the binary outcome of a certain lab test.) The label y indicates the presence or absence of a certain disease, with y ∈ {0, 1}. Suppose the prior distribution of Y is P(Y = 0) = 0.8, and P(Y = 1) = 0.2, and the likelihood is

$$P(X = (x_1, x_2, x_3) \mid Y = y) = \prod_{i=1}^{3} (0.5y + 0.25)^{x_i} (0.75 - 0.5y)^{1 - x_i}.$$

If h is the predicted outcome while the truth is y, the loss l(h, y) incurred is given by l(0, 0) = 0, l(0, 1) = +1000, l(1, 0) = +100, and l(1, 1) = −500.

(i) Determine the Bayes Optimal hypothesis $h_{\text{Bayes-Optimal}}$.

(ii) What is the Bayes Optimal Risk?

Prior distribution: P(Y = 0) = 0.8, P(Y = 1) = 0.2.

Likelihood: $P(X = (x_1, x_2, x_3) \mid Y = y) = \prod_{i=1}^{3} (0.5y + 0.25)^{x_i} (0.75 - 0.5y)^{1 - x_i}$.

Loss l(h, y): l(0, 0) = 0, l(0, 1) = +1000, l(1, 0) = +100, l(1, 1) = −500.

1) Determine the Bayes Optimal hypothesis.

From the likelihood we know:

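  • $P(X = (0, 0, 0) \mid Y = 0) = (0.25^0 \cdot 0.75^1)(0.25^0 \cdot 0.75^1)(0.25^0 \cdot 0.75^1) = 0.421875$
  • $P(X = (1, 0, 0) \mid Y = 0) = (0.25^1 \cdot 0.75^0)(0.25^0 \cdot 0.75^1)(0.25^0 \cdot 0.75^1) = 0.140625$
  • $P(X = (0, 1, 0) \mid Y = 0) = (0.25^0 \cdot 0.75^1)(0.25^1 \cdot 0.75^0)(0.25^0 \cdot 0.75^1) = 0.140625$
  • $P(X = (0, 0, 1) \mid Y = 0) = (0.25^0 \cdot 0.75^1)(0.25^0 \cdot 0.75^1)(0.25^1 \cdot 0.75^0) = 0.140625$
  • $P(X = (0, 1, 1) \mid Y = 0) = (0.25^0 \cdot 0.75^1)(0.25^1 \cdot 0.75^0)(0.25^1 \cdot 0.75^0) = 0.046875$
  • $P(X = (1, 0, 1) \mid Y = 0) = (0.25^1 \cdot 0.75^0)(0.25^0 \cdot 0.75^1)(0.25^1 \cdot 0.75^0) = 0.046875$
  • $P(X = (1, 1, 0) \mid Y = 0) = (0.25^1 \cdot 0.75^0)(0.25^1 \cdot 0.75^0)(0.25^0 \cdot 0.75^1) = 0.046875$
  • $P(X = (1, 1, 1) \mid Y = 0) = (0.25^1 \cdot 0.75^0)(0.25^1 \cdot 0.75^0)(0.25^1 \cdot 0.75^0) = 0.015625$
  • $P(X = (0, 0, 0) \mid Y = 1) = (0.75^0 \cdot 0.25^1)(0.75^0 \cdot 0.25^1)(0.75^0 \cdot 0.25^1) = 0.015625$
  • $P(X = (1, 0, 0) \mid Y = 1) = (0.75^1 \cdot 0.25^0)(0.75^0 \cdot 0.25^1)(0.75^0 \cdot 0.25^1) = 0.046875$
  • $P(X = (0, 1, 0) \mid Y = 1) = (0.75^0 \cdot 0.25^1)(0.75^1 \cdot 0.25^0)(0.75^0 \cdot 0.25^1) = 0.046875$
  • $P(X = (0, 0, 1) \mid Y = 1) = (0.75^0 \cdot 0.25^1)(0.75^0 \cdot 0.25^1)(0.75^1 \cdot 0.25^0) = 0.046875$
  • $P(X = (0, 1, 1) \mid Y = 1) = (0.75^0 \cdot 0.25^1)(0.75^1 \cdot 0.25^0)(0.75^1 \cdot 0.25^0) = 0.140625$
  • $P(X = (1, 0, 1) \mid Y = 1) = (0.75^1 \cdot 0.25^0)(0.75^0 \cdot 0.25^1)(0.75^1 \cdot 0.25^0) = 0.140625$
  • $P(X = (1, 1, 0) \mid Y = 1) = (0.75^1 \cdot 0.25^0)(0.75^1 \cdot 0.25^0)(0.75^0 \cdot 0.25^1) = 0.140625$
  • $P(X = (1, 1, 1) \mid Y = 1) = (0.75^1 \cdot 0.25^0)(0.75^1 \cdot 0.25^0)(0.75^1 \cdot 0.25^0) = 0.421875$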
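As a cross-check of the table above, the likelihood can be evaluated exactly in a few lines of Python. This is a minimal sketch: the names `likelihood` and `TABLE_ORDER` are illustrative choices, and exact `fractions.Fraction` arithmetic is used so that values such as 27/64 = 0.421875 are reproduced without rounding.

```python
from fractions import Fraction

# The eight records x in the order used by the table above.
TABLE_ORDER = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
               (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]


def likelihood(x, y):
    """P(X = x | Y = y) = prod_i (0.5y + 0.25)^{x_i} (0.75 - 0.5y)^{1 - x_i}."""
    p_one = Fraction(1, 2) * y + Fraction(1, 4)   # P(x_i = 1 | Y = y)
    p_zero = Fraction(3, 4) - Fraction(1, 2) * y  # P(x_i = 0 | Y = y)
    result = Fraction(1)
    for xi in x:
        result *= p_one ** xi * p_zero ** (1 - xi)
    return result


if __name__ == "__main__":
    # Print the 16 conditional probabilities as decimals.
    for y in (0, 1):
        for x in TABLE_ORDER:
            print(f"P(X = {x} | Y = {y}) = {float(likelihood(x, y))}")
```

For each y the eight values sum to 1, since the product factorizes into (0.25 + 0.75)^3.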

The Bayes Optimal hypothesis can be written as $h_{\text{Bayes-Optimal}}(x) = \arg\min_{h \in \{0, 1\}} \sum_{y \in \{0, 1\}} l(h, y)\, P(Y = y \mid X = x)$, i.e., for each x the prediction should be chosen to minimize the expected loss. This requires the posterior probabilities, which follow from Bayes' rule:

  • P(Y = 0|X = (0, 0, 0)) = (0.421875 * 0.8)/(0.421875 * 0.8 + 0.015625 * 0.2) = 108/109
  • P(Y = 0|X = (1, 0, 0)) = (0.140625 * 0.8)/(0.140625 * 0.8 + 0.046875 * 0.2) = 12/13
  • P(Y = 0|X = (0, 1, 0)) = (0.140625 * 0.8)/(0.140625 * 0.8 + 0.046875 * 0.2) = 12/13
  • P(Y = 0|X = (0, 0, 1)) = (0.140625 * 0.8)/(0.140625 * 0.8 + 0.046875 * 0.2) = 12/13
  • P(Y = 0|X = (0, 1, 1)) = (0.046875 * 0.8)/(0.046875 * 0.8 + 0.140625 * 0.2) = 4/7
  • P(Y = 0|X = (1, 0, 1)) = (0.046875 * 0.8)/(0.046875 * 0.8 + 0.140625 * 0.2) = 4/7
  • P(Y = 0|X = (1, 1, 0)) = (0.046875 * 0.8)/(0.046875 * 0.8 + 0.140625 * 0.2) = 4/7
  • P(Y = 0|X = (1, 1, 1)) = (0.015625 * 0.8)/(0.015625 * 0.8 + 0.421875 * 0.2) = 4/31
  • P(Y = 1|X = (0, 0, 0)) = (0.015625 * 0.2)/(0.015625 * 0.2 + 0.421875 * 0.8) = 1/109
  • P(Y = 1|X = (1, 0, 0)) = (0.046875 * 0.2)/(0.046875 * 0.2 + 0.140625 * 0.8) = 1/13
  • P(Y = 1|X = (0, 1, 0)) = (0.046875 * 0.2)/(0.046875 * 0.2 + 0.140625 * 0.8) = 1/13
  • P(Y = 1|X = (0, 0, 1)) = (0.046875 * 0.2)/(0.046875 * 0.2 + 0.140625 * 0.8) = 1/13
  • P(Y = 1|X = (0, 1, 1)) = (0.140625 * 0.2)/(0.140625 * 0.2 + 0.046875 * 0.8) = 3/7
  • P(Y = 1|X = (1, 0, 1)) = (0.140625 * 0.2)/(0.140625 * 0.2 + 0.046875 * 0.8) = 3/7
  • P(Y = 1|X = (1, 1, 0)) = (0.140625 * 0.2)/(0.140625 * 0.2 + 0.046875 * 0.8) = 3/7
  • P(Y = 1|X = (1, 1, 1)) = (0.421875 * 0.2)/(0.421875 * 0.2 + 0.015625 * 0.8) = 27/31
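The posterior list can be reproduced with Bayes' rule in a short sketch. It assumes only the prior and likelihood stated in the problem; `likelihood`, `posterior`, and `PRIOR` are hypothetical helper names introduced here, and exact fractions are used so the results print as 108/109, 12/13, and so on.

```python
from fractions import Fraction
from itertools import product

PRIOR = {0: Fraction(4, 5), 1: Fraction(1, 5)}  # P(Y = 0) = 0.8, P(Y = 1) = 0.2


def likelihood(x, y):
    """P(X = x | Y = y) from the product formula in the problem statement."""
    p_one = Fraction(1, 2) * y + Fraction(1, 4)
    p_zero = Fraction(3, 4) - Fraction(1, 2) * y
    result = Fraction(1)
    for xi in x:
        result *= p_one ** xi * p_zero ** (1 - xi)
    return result


def posterior(y, x):
    """Bayes' rule: P(Y = y | X = x) = P(x | y) P(y) / sum over y' of P(x | y') P(y')."""
    evidence = sum(likelihood(x, label) * PRIOR[label] for label in (0, 1))
    return likelihood(x, y) * PRIOR[y] / evidence


if __name__ == "__main__":
    for x in product((0, 1), repeat=3):
        print(f"x = {x}: P(Y = 0 | x) = {posterior(0, x)}, P(Y = 1 | x) = {posterior(1, x)}")
```

The posterior depends on x only through the number of ones in it, which is why the list repeats in groups of three.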
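The conditional risk of predicting h given X = x is Risk(h|x) = l(h, 0) P(Y = 0|x) + l(h, 1) P(Y = 1|x):

  • X = (0, 0, 0): Risk(h = 0|x) = 0 * 108/109 + 1000 * 1/109 = 1000/109; Risk(h = 1|x) = 100 * 108/109 − 500 * 1/109 = 10300/109
  • X = (1, 0, 0), (0, 1, 0), (0, 0, 1): Risk(h = 0|x) = 0 * 12/13 + 1000 * 1/13 = 1000/13; Risk(h = 1|x) = 100 * 12/13 − 500 * 1/13 = 700/13
  • X = (0, 1, 1), (1, 0, 1), (1, 1, 0): Risk(h = 0|x) = 0 * 4/7 + 1000 * 3/7 = 3000/7; Risk(h = 1|x) = 100 * 4/7 − 500 * 3/7 = −1100/7
  • X = (1, 1, 1): Risk(h = 0|x) = 0 * 4/31 + 1000 * 27/31 = 27000/31; Risk(h = 1|x) = 100 * 4/31 − 500 * 27/31 = −13100/31

Predicting 1 has lower risk exactly when 100 P(Y = 0|x) − 500 P(Y = 1|x) < 1000 P(Y = 1|x), i.e., when P(Y = 1|x) > 1/16. Only x = (0, 0, 0), with P(Y = 1|x) = 1/109, falls below this threshold, so

$h_{\text{Bayes-Optimal}}(x) = 0$ if $x = (0, 0, 0)$, and $h_{\text{Bayes-Optimal}}(x) = 1$ otherwise.

2) The Bayes Optimal Risk is the expected loss of this hypothesis, $E[l(h_{\text{Bayes-Optimal}}(X), Y)] = \sum_x P(X = x) \min_h \text{Risk}(h \mid x)$. (It reduces to $P(h \ne y)$ only under the 0-1 loss.) Here P(X = (0, 0, 0)) = 0.340625, P(X = x) = 0.121875 for each x with a single 1, P(X = x) = 0.065625 for each x with two 1s, and P(X = (1, 1, 1)) = 0.096875, so

Bayes Optimal Risk = 0.340625 * 1000/109 + 3 * 0.121875 * 700/13 + 3 * 0.065625 * (−1100/7) + 0.096875 * (−13100/31) = 3.125 + 19.6875 − 30.9375 − 40.9375 = −49.0625.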
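The conditional risks, the resulting decision rule, and the Bayes Optimal Risk can be checked with the sketch below. It is an illustration under the problem's stated prior, likelihood, and loss table only; helper names such as `conditional_risk`, `bayes_optimal`, and `bayes_risk` are hypothetical, and the Bayes risk is computed as the sum over (x, y) of l(h(x), y) P(X = x, Y = y), which equals the sum over x of P(X = x) times the minimal conditional risk.

```python
from fractions import Fraction
from itertools import product

PRIOR = {0: Fraction(4, 5), 1: Fraction(1, 5)}               # P(Y = y)
LOSS = {(0, 0): 0, (0, 1): 1000, (1, 0): 100, (1, 1): -500}  # LOSS[(h, y)] = l(h, y)
OUTCOMES = list(product((0, 1), repeat=3))                   # all 8 records x


def likelihood(x, y):
    """P(X = x | Y = y) from the product formula in the problem statement."""
    p_one = Fraction(1, 2) * y + Fraction(1, 4)
    p_zero = Fraction(3, 4) - Fraction(1, 2) * y
    result = Fraction(1)
    for xi in x:
        result *= p_one ** xi * p_zero ** (1 - xi)
    return result


def joint(x, y):
    """P(X = x, Y = y) = P(X = x | Y = y) P(Y = y)."""
    return likelihood(x, y) * PRIOR[y]


def conditional_risk(h, x):
    """Risk(h | x) = sum_y l(h, y) P(Y = y | X = x)."""
    evidence = joint(x, 0) + joint(x, 1)
    return sum(LOSS[(h, y)] * joint(x, y) for y in (0, 1)) / evidence


def bayes_optimal(x):
    """Prediction with the smallest conditional risk (no ties occur in this problem)."""
    return min((0, 1), key=lambda h: conditional_risk(h, x))


def bayes_risk():
    """E[l(h*(X), Y)] = sum over (x, y) of l(h*(x), y) P(X = x, Y = y)."""
    return sum(LOSS[(bayes_optimal(x), y)] * joint(x, y) for x in OUTCOMES for y in (0, 1))


if __name__ == "__main__":
    for x in OUTCOMES:
        print(x, conditional_risk(0, x), conditional_risk(1, x), "->", bayes_optimal(x))
    print("Bayes risk:", bayes_risk(), "=", float(bayes_risk()))
```

Working with joint probabilities in `bayes_risk` avoids dividing by P(X = x) and multiplying it back, so the exact value −785/16 falls out directly.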

Therefore, the estimator will converge in probability.

  1. Consult any reference or book or notes on the Internet, and

(i) Clearly state and provide a proof of the Cramer Rao lower bound for an unbiased

estimate. It should be written as a formal Theorem, with a statement of the conditions

assumed and the result, followed by a formal Proof. You can consult any source, but the

statement and proof must be in your own words.

(ii) Consider the regression problem considered in class, where X = 2y + 20 + w, with $w \sim \mathcal{N}(0, 5^2)$. Does the Maximum Likelihood estimate of y achieve the Cramer Rao lower bound?

  1. The Cramer-Rao Lower Bound (CRLB) gives a lower bound on the variance of any unbiased estimator. Unbiased estimators whose variance is closer to the CRLB are more efficient.

That is, if $\hat{\theta}$ is an unbiased estimator of θ, then $\operatorname{Var}(\hat{\theta}) \ge \mathrm{CRLB}(\theta)$.

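There are different ways of calculating the CRLB. The most common form uses Fisher Information.

Let $X_1, X_2, \ldots, X_n$ be a random sample with PDF $f(x; \theta)$. If $\hat{\theta}$ is an unbiased estimator for θ, then

$$\operatorname{Var}(\hat{\theta}) \ge \frac{1}{n I(\theta)},$$

where

$$I(\theta) = E\left[\left(\frac{\partial}{\partial \theta} \ln f(X; \theta)\right)^2\right] = -E\left[\frac{\partial^2}{\partial \theta^2} \ln f(X; \theta)\right]$$

is the Fisher Information of a single observation.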
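To see the bound in action, the sketch below uses an assumed model that is not part of the assignment: $X_1, \ldots, X_n$ i.i.d. $\mathcal{N}(\theta, \sigma^2)$ with σ known, for which $I(\theta) = 1/\sigma^2$ and the bound is $\sigma^2/n$. It estimates by Monte Carlo the variance of two unbiased estimators of θ, the sample mean and the sample median; the function names and the parameter values (θ = 1, σ = 2, n = 25, 20000 trials) are illustrative assumptions.

```python
import random
import statistics


def simulate_estimator_variance(estimator, theta=1.0, sigma=2.0, n=25, trials=20000, seed=0):
    """Monte Carlo variance of `estimator` over samples X_1..X_n ~ N(theta, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    estimates = [estimator([rng.gauss(theta, sigma) for _ in range(n)]) for _ in range(trials)]
    return statistics.pvariance(estimates)


def crlb_gaussian_mean(sigma=2.0, n=25):
    """1 / (n I(theta)) with per-observation Fisher information I(theta) = 1 / sigma^2."""
    fisher_info = 1.0 / sigma ** 2
    return 1.0 / (n * fisher_info)


if __name__ == "__main__":
    print("CRLB:", crlb_gaussian_mean())
    print("var(sample mean):", simulate_estimator_variance(statistics.fmean))
    print("var(sample median):", simulate_estimator_variance(statistics.median))
```

With these settings the bound is 0.16. The sample mean's simulated variance should land close to it, while the sample median's should be noticeably larger: the median is unbiased for this symmetric model but does not attain the bound.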

Theorem:

It is assumed that the PDF $f(x; \theta)$ satisfies the “regularity” condition

$$E\left[\frac{\partial \ln f(x; \theta)}{\partial \theta}\right] = 0 \quad \text{for all } \theta,$$

where the expectation is taken with respect to $f(x; \theta)$.

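Then, the variance of any unbiased estimator $\hat{\theta}$ must satisfy

$$\operatorname{Var}(\hat{\theta}) \ge \frac{1}{E\left[\left(\frac{\partial \ln f(x; \theta)}{\partial \theta}\right)^2\right]} = \frac{1}{-E\left[\frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2}\right]} = \frac{1}{I(\theta)},$$

where the derivative is evaluated at the true value of θ and the expectation is taken with respect to $f(x; \theta)$.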
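Both the regularity condition and the equality of the two Fisher Information expressions can be checked exactly for a simple assumed model, a single Bernoulli(θ) observation with $f(x; \theta) = \theta^x (1 - \theta)^{1 - x}$, which is not taken from the assignment. For this model the score is $x/\theta - (1 - x)/(1 - \theta)$ and $I(\theta) = 1/(\theta(1 - \theta))$; the helper names below are hypothetical, and θ = 3/10 is an arbitrary choice.

```python
from fractions import Fraction


def score(x, theta):
    """d/dtheta ln f(x; theta) for f(x; theta) = theta^x (1 - theta)^(1 - x)."""
    return Fraction(x) / theta - Fraction(1 - x) / (1 - theta)


def score_derivative(x, theta):
    """d^2/dtheta^2 ln f(x; theta)."""
    return -Fraction(x) / theta ** 2 - Fraction(1 - x) / (1 - theta) ** 2


def expectation(func, theta):
    """E[func(X)] under X ~ Bernoulli(theta), computed exactly over x in {0, 1}."""
    return sum((theta if x == 1 else 1 - theta) * func(x) for x in (0, 1))


if __name__ == "__main__":
    theta = Fraction(3, 10)
    info = expectation(lambda x: score(x, theta) ** 2, theta)
    print("E[score]              =", expectation(lambda x: score(x, theta), theta))
    print("E[score^2]            =", info)
    print("-E[d^2 ln f/dtheta^2] =", -expectation(lambda x: score_derivative(x, theta), theta))
    print("CRLB = 1 / I(theta)   =", 1 / info)
```

The first line checks the regularity condition, and the next two lines should agree, giving the same I(θ) either way.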

Furthermore, an unbiased estimator may be found that attains the bound for all θ if and only if

$$\frac{\partial \ln f(x; \theta)}{\partial \theta} = I(\theta)\,\bigl(g(x) - \theta\bigr)$$

for some functions g and I. That estimator, which is the MVU (Minimum Variance Unbiased) estimator, is $\hat{\theta} = g(x)$, and the minimum variance is $1/I(\theta)$.
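The attainment condition can be illustrated with an assumed model of n i.i.d. Bernoulli(θ) observations (not part of the assignment). There $\partial \ln f/\partial\theta = \frac{n}{\theta(1-\theta)}(\bar{x} - \theta)$, so the condition holds with $I(\theta) = n/(\theta(1-\theta))$ and $g(x) = \bar{x}$, and the sample mean should have variance exactly $1/I(\theta)$. The sketch checks both facts by enumerating all $2^n$ outcomes; the function names and the values n = 4, θ = 3/10 are illustrative.

```python
from fractions import Fraction
from itertools import product


def joint_score(xs, theta):
    """d/dtheta ln f(x; theta) for i.i.d. Bernoulli(theta) observations xs."""
    k = sum(xs)
    return Fraction(k) / theta - Fraction(len(xs) - k) / (1 - theta)


def fisher_information(n, theta):
    """I(theta) = n / (theta (1 - theta)) for n i.i.d. Bernoulli(theta) observations."""
    return n / (theta * (1 - theta))


def attains_bound(n, theta):
    """True if d ln f/d theta == I(theta) (g(x) - theta), g(x) = sample mean, for every outcome."""
    info = fisher_information(n, theta)
    return all(
        joint_score(xs, theta) == info * (Fraction(sum(xs), n) - theta)
        for xs in product((0, 1), repeat=n)
    )


def variance_of_sample_mean(n, theta):
    """Exact Var(g(X)) for g(x) = sample mean, by enumerating all 2^n outcomes."""
    def prob(xs):
        k = sum(xs)
        return theta ** k * (1 - theta) ** (n - k)

    outcomes = list(product((0, 1), repeat=n))
    mean = sum(prob(xs) * Fraction(sum(xs), n) for xs in outcomes)
    return sum(prob(xs) * (Fraction(sum(xs), n) - mean) ** 2 for xs in outcomes)


if __name__ == "__main__":
    n, theta = 4, Fraction(3, 10)
    print("condition holds:", attains_bound(n, theta))
    print("Var(sample mean) =", variance_of_sample_mean(n, theta))
    print("1 / I(theta)     =", 1 / fisher_information(n, theta))
```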

Proof:

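Consider a scalar parameter α = g(θ), where the PDF is parameterized by θ. Assume the estimator $\hat{\alpha}$ is unbiased, i.e.,

$$E(\hat{\alpha}) = \alpha = g(\theta) \quad \text{or} \quad \int \hat{\alpha}\, f(x; \theta)\, dx = g(\theta).$$

Usually, the regularity condition will be satisfied if the order of differentiation and integration may be interchanged, since then $E\left[\frac{\partial \ln f(x; \theta)}{\partial \theta}\right] = \int \frac{\partial f(x; \theta)}{\partial \theta}\, dx = \frac{\partial}{\partial \theta} \int f(x; \theta)\, dx = \frac{\partial 1}{\partial \theta} = 0$.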

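Now differentiating both sides of the last equation above with respect to θ and interchanging the partial differentiation and integration produces

$$\int \hat{\alpha}\, \frac{\partial f(x; \theta)}{\partial \theta}\, dx = \frac{\partial g(\theta)}{\partial \theta}, \quad \text{or} \quad \int \hat{\alpha}\, \frac{\partial \ln f(x; \theta)}{\partial \theta}\, f(x; \theta)\, dx = \frac{\partial g(\theta)}{\partial \theta}.$$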

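We can modify this using the regularity condition to produce

$$\int (\hat{\alpha} - \alpha)\, \frac{\partial \ln f(x; \theta)}{\partial \theta}\, f(x; \theta)\, dx = \frac{\partial g(\theta)}{\partial \theta},$$

since

$$\int \alpha\, \frac{\partial \ln f(x; \theta)}{\partial \theta}\, f(x; \theta)\, dx = \alpha\, E\left[\frac{\partial \ln f(x; \theta)}{\partial \theta}\right] = 0.$$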

Now consider the Cauchy-Schwarz inequality

$$\left[\int w(x)\, g(x)\, h(x)\, dx\right]^2 \le \int w(x)\, g^2(x)\, dx \int w(x)\, h^2(x)\, dx,$$

which holds with equality if and only if g(x) = c h(x) for c some constant not dependent on x. The functions g and h are arbitrary scalar functions, while w(x) ≥ 0 for all x.
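A discrete analogue of this inequality replaces the integrals with sums, $\left(\sum_i w_i g_i h_i\right)^2 \le \sum_i w_i g_i^2 \sum_i w_i h_i^2$ for $w_i \ge 0$. The short sketch below, with hypothetical helper names, computes the gap between the two sides; integer inputs keep the arithmetic exact, so the equality case g = c h gives a gap of exactly zero.

```python
import random


def weighted_inner(w, g, h):
    """Discrete analogue of the integral of w(x) g(x) h(x): sum_i w_i g_i h_i."""
    return sum(wi * gi * hi for wi, gi, hi in zip(w, g, h))


def cauchy_schwarz_gap(w, g, h):
    """Right-hand side minus left-hand side; never negative when every w_i >= 0."""
    return weighted_inner(w, g, g) * weighted_inner(w, h, h) - weighted_inner(w, g, h) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    w = [rng.randint(0, 5) for _ in range(6)]  # nonnegative weights
    g = [rng.randint(-5, 5) for _ in range(6)]
    h = [rng.randint(-5, 5) for _ in range(6)]
    print("gap for random g, h:", cauchy_schwarz_gap(w, g, h))
    print("gap for g = 3h:", cauchy_schwarz_gap(w, [3 * v for v in h], h))
```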

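Now let

$$w(x) = f(x; \theta), \quad g(x) = \hat{\alpha} - \alpha, \quad h(x) = \frac{\partial \ln f(x; \theta)}{\partial \theta},$$

and apply the Cauchy-Schwarz inequality to the equation above to produce

$$\left(\frac{\partial g(\theta)}{\partial \theta}\right)^2 \le \int (\hat{\alpha} - \alpha)^2 f(x; \theta)\, dx \int \left(\frac{\partial \ln f(x; \theta)}{\partial \theta}\right)^2 f(x; \theta)\, dx,$$

or

$$\operatorname{Var}(\hat{\alpha}) \ge \frac{\left(\frac{\partial g(\theta)}{\partial \theta}\right)^2}{E\left[\left(\frac{\partial \ln f(x; \theta)}{\partial \theta}\right)^2\right]} = \frac{\left(\frac{\partial g(\theta)}{\partial \theta}\right)^2}{-E\left[\frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2}\right]}.$$

The second form follows by differentiating the regularity condition $\int \frac{\partial \ln f(x; \theta)}{\partial \theta} f(x; \theta)\, dx = 0$ once more with respect to θ, which gives $\int \frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2} f(x; \theta)\, dx + \int \left(\frac{\partial \ln f(x; \theta)}{\partial \theta}\right)^2 f(x; \theta)\, dx = 0$.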

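If α = g(θ) = θ, we have

$$\operatorname{Var}(\hat{\alpha}) \ge \frac{1}{E\left[\left(\frac{\partial \ln f(x; \theta)}{\partial \theta}\right)^2\right]} = \frac{1}{-E\left[\frac{\partial^2 \ln f(x; \theta)}{\partial \theta^2}\right]} = \frac{1}{I(\theta)}.$$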

Note that the condition for equality is

$$\frac{\partial \ln f(x; \theta)}{\partial \theta} = \frac{1}{c}\,(\hat{\alpha} - \alpha),$$

where c can depend on θ but not on x. When α = g(θ) = θ, it is