



Assignment 1 of CSCE Machine Learning
Typology: Assignments




Let $(x_1, x_2, x_3)$ describe a patient's medical record, where each $x_i \in \{0, 1\}$. (For example, each $x_i$ could record the binary outcome of a certain lab test.) The label $y$ indicates the presence or absence of a certain disease, with $y \in \{0, 1\}$. Suppose the prior distribution of $Y$ is $P(Y = 0) = 0.8$ and $P(Y = 1) = 0.2$, and the likelihood is [expression not recoverable from the source]. If $h$ is the predicted outcome while the truth is $y$, the loss $l(h, y)$ incurred is given by $l(0, 0) = 0$, $l(0, 1) = +1000$, $l(1, 0) = +100$, and $l(1, 1) = -500$.
(i) Determine the Bayes Optimal hypothesis $h_{\text{Bayes-Optimal}}$.
(ii) What is the Bayes Optimal Risk?
Prior distribution: P(Y = 0) = 0.8, P(Y = 1) = 0.2.
Likelihood: [expression not recoverable from the source]
Loss l(h, y): l(0, 0) = 0, l(0, 1) = +1000, l(1, 0) = +100, l(1, 1) = −500.
From the likelihood we know:
[Table derived from the likelihood; only its unlabeled 0 and 1 entries survive in the source.]
Then calculate the probability of each outcome. The prediction should be chosen to minimize the expected loss.
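The loss-minimizing choice can be sketched in a few lines of Python. This is a minimal illustration, not the assignment's worked solution: the prior and loss values are taken from the problem statement, while the function names and the `likelihood` dictionary format are assumptions made for this sketch, and the likelihood itself has to be supplied by the caller.

```python
# Loss table from the problem statement, indexed as LOSS[(h, y)].
LOSS = {(0, 0): 0.0, (0, 1): 1000.0, (1, 0): 100.0, (1, 1): -500.0}
# Prior from the problem statement.
PRIOR = {0: 0.8, 1: 0.2}


def expected_loss(h, p1):
    """Expected loss of predicting h when P(Y = 1 | x) = p1."""
    return LOSS[(h, 0)] * (1.0 - p1) + LOSS[(h, 1)] * p1


def bayes_decision(p1):
    """Prediction in {0, 1} with the smaller expected loss (ties go to 0)."""
    return min((0, 1), key=lambda h: expected_loss(h, p1))


def bayes_rule_and_risk(likelihood):
    """Hypothetical helper: likelihood[(x, y)] = P(x | Y = y), supplied by the caller.

    Returns the decision for every x and the resulting Bayes risk.
    """
    rule, risk = {}, 0.0
    for x in sorted({x for (x, _) in likelihood}):
        joint = {y: PRIOR[y] * likelihood[(x, y)] for y in (0, 1)}
        p_x = joint[0] + joint[1]
        if p_x == 0.0:
            continue  # x never occurs, so it contributes nothing to the risk
        p1 = joint[1] / p_x  # posterior P(Y = 1 | x) by Bayes' rule
        rule[x] = bayes_decision(p1)
        risk += p_x * expected_loss(rule[x], p1)
    return rule, risk
```

With these losses, predicting 0 costs $1000\,p$ and predicting 1 costs $100 - 600\,p$, where $p = P(Y = 1 \mid x)$, so predicting 1 is preferred exactly when $p > 1/16$.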
Therefore, the estimator will converge in probability.
(i) Clearly state and provide a proof of the Cramér-Rao lower bound for an unbiased
estimate. It should be written as a formal Theorem, with a statement of the conditions
assumed and the result, followed by a formal Proof. You can consult any source, but the
statement and proof must be in your own words.
(ii) Consider the regression problem considered in class, where $X = 2y + 20 + w$, with $w \sim N(0, \sigma^2)$. Does the Maximum Likelihood estimate of $y$ achieve the Cramér-Rao lower bound?
The Cramér-Rao lower bound (CRLB) is a lower bound on the variance of any unbiased estimator. Unbiased estimators whose variance is closer to the CRLB are more efficient.
That is, if $\hat{\theta}$ is an unbiased estimator of $\theta$, then
$$\operatorname{var}(\hat{\theta}) \ge \mathrm{CRLB}(\theta).$$
There are different ways of calculating the CRLB. The most common form uses Fisher
Information.
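Let $X_1, X_2, \ldots, X_n$ be a random sample with PDF $f(x; \theta)$. If $\hat{\theta}$ is an unbiased estimator for $\theta$, then
$$\operatorname{var}(\hat{\theta}) \ge \frac{1}{n I(\theta)},$$
where
$$I(\theta) = E\left[\left(\frac{\partial}{\partial\theta} \ln f(X; \theta)\right)^{2}\right] = -E\left[\frac{\partial^{2}}{\partial\theta^{2}} \ln f(X; \theta)\right]$$
is the Fisher Information.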
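To make the two forms of the Fisher Information concrete, the sketch below evaluates both exactly for a Bernoulli PDF $f(x; \theta) = \theta^{x}(1 - \theta)^{1 - x}$. The Bernoulli model and the function names are assumptions chosen for illustration; they are not part of the assignment.

```python
def bernoulli_pmf(x, theta):
    """f(x; theta) for x in {0, 1}."""
    return theta if x == 1 else 1.0 - theta


def score(x, theta):
    """First derivative of ln f(x; theta) with respect to theta."""
    return x / theta - (1 - x) / (1.0 - theta)


def second_derivative(x, theta):
    """Second derivative of ln f(x; theta) with respect to theta."""
    return -x / theta**2 - (1 - x) / (1.0 - theta) ** 2


def fisher_information(theta):
    """Return both forms of I(theta): E[score^2] and -E[second derivative]."""
    squared = sum(bernoulli_pmf(x, theta) * score(x, theta) ** 2 for x in (0, 1))
    curvature = -sum(
        bernoulli_pmf(x, theta) * second_derivative(x, theta) for x in (0, 1)
    )
    return squared, curvature


def crlb(theta, n):
    """Lower bound 1 / (n I(theta)) for a sample of size n."""
    squared, _ = fisher_information(theta)
    return 1.0 / (n * squared)
```

Both forms come out as $1 / (\theta(1 - \theta))$, so the bound for a sample of size $n$ is $\theta(1 - \theta)/n$, which is the variance of the sample mean.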
Theorem:
It is assumed that the PDF $f(x; \theta)$ satisfies the “regularity” condition
$$E\left[\frac{\partial \ln f(x; \theta)}{\partial\theta}\right] = 0 \quad \text{for all } \theta,$$
where the expectation is taken with respect to $f(x; \theta)$.
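Then, the variance of any unbiased estimator $\hat{\theta}$ must satisfy
$$\operatorname{var}(\hat{\theta}) \ge \frac{1}{E\left[\left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^{2}\right]} = \frac{1}{-E\left[\frac{\partial^{2} \ln f(x;\theta)}{\partial\theta^{2}}\right]} = \frac{1}{I(\theta)},$$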
where the derivative is evaluated at the true value of θ and the expectation is taken with
respect to f(x; θ).
Furthermore, an unbiased estimator may be found that attains the bound for all $\theta$ if and only if
$$\frac{\partial \ln f(x;\theta)}{\partial\theta} = I(\theta)\,\bigl(g(x) - \theta\bigr)$$
for some functions $g$ and $I$. That estimator, which is the MVU (Minimum Variance Unbiased) estimator, is $\hat{\theta} = g(x)$, and the minimum variance is $1/I(\theta)$.
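As an illustration of the attainment condition, in which the score factors as $I(\theta)\,(g(x) - \theta)$, consider a single observation from the regression model of part (ii), $X = 2y + 20 + w$. The derivation below assumes $w \sim N(0, \sigma^2)$ with known $\sigma^2$; that noise model is an assumption of this sketch.

```latex
\begin{align*}
\ln f(x; y) &= -\tfrac{1}{2}\ln(2\pi\sigma^{2}) - \frac{(x - 2y - 20)^{2}}{2\sigma^{2}}, \\
\frac{\partial \ln f(x; y)}{\partial y} &= \frac{2\,(x - 2y - 20)}{\sigma^{2}}
  = \frac{4}{\sigma^{2}}\left(\frac{x - 20}{2} - y\right), \\
I(y) &= \frac{4}{\sigma^{2}}, \qquad g(x) = \frac{x - 20}{2}, \\
\operatorname{var}\!\left(\frac{X - 20}{2}\right) &= \frac{\sigma^{2}}{4} = \frac{1}{I(y)}.
\end{align*}
```

Under that assumption the score has exactly the required form, so the Maximum Likelihood estimate $\hat{y} = (X - 20)/2$ is unbiased and its variance equals the bound.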
Proof:
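Consider a scalar parameter $\alpha = g(\theta)$ where the PDF is parameterized by $\theta$. Assume the estimators are unbiased, i.e.,
$$E(\hat{\alpha}) = \alpha = g(\theta) \quad \text{or} \quad \int \hat{\alpha}\, f(x;\theta)\, dx = g(\theta).$$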
Usually, the regularity condition will be satisfied if the order of differentiation and
integration may be interchanged.
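Now differentiating both sides of the unbiasedness equation with respect to $\theta$ and interchanging the partial differentiation and integration produces
$$\int \hat{\alpha}\, \frac{\partial f(x;\theta)}{\partial\theta}\, dx = \frac{\partial g(\theta)}{\partial\theta},$$
or, since $\partial f / \partial\theta = (\partial \ln f / \partial\theta)\, f$,
$$\int \hat{\alpha}\, \frac{\partial \ln f(x;\theta)}{\partial\theta}\, f(x;\theta)\, dx = \frac{\partial g(\theta)}{\partial\theta}.$$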
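We can modify this using the regularity condition to produce
$$\int (\hat{\alpha} - \alpha)\, \frac{\partial \ln f(x;\theta)}{\partial\theta}\, f(x;\theta)\, dx = \frac{\partial g(\theta)}{\partial\theta},$$
since
$$\int \alpha\, \frac{\partial \ln f(x;\theta)}{\partial\theta}\, f(x;\theta)\, dx = \alpha\, E\left[\frac{\partial \ln f(x;\theta)}{\partial\theta}\right] = 0.$$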
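Now consider the Cauchy-Schwarz inequality
$$\left[\int w(x)\, g(x)\, h(x)\, dx\right]^{2} \le \int w(x)\, g^{2}(x)\, dx \int w(x)\, h^{2}(x)\, dx,$$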
which holds with equality if and only if g(x) = c h(x) for c some constant not dependent
on x. The functions g and h are arbitrary scalar functions, while w(x) ≥ 0 for all x.
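Now let
$$w(x) = f(x;\theta), \qquad g(x) = \hat{\alpha} - \alpha, \qquad h(x) = \frac{\partial \ln f(x;\theta)}{\partial\theta},$$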
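and apply the Cauchy-Schwarz inequality to the equation above to produce
$$\left(\frac{\partial g(\theta)}{\partial\theta}\right)^{2} \le \int (\hat{\alpha} - \alpha)^{2} f(x;\theta)\, dx \int \left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^{2} f(x;\theta)\, dx,$$
or
$$\operatorname{var}(\hat{\alpha}) \ge \frac{\left(\frac{\partial g(\theta)}{\partial\theta}\right)^{2}}{E\left[\left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^{2}\right]} = \frac{\left(\frac{\partial g(\theta)}{\partial\theta}\right)^{2}}{-E\left[\frac{\partial^{2} \ln f(x;\theta)}{\partial\theta^{2}}\right]}.$$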
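If $\alpha = g(\theta) = \theta$, we have
$$\operatorname{var}(\hat{\alpha}) \ge \frac{1}{E\left[\left(\frac{\partial \ln f(x;\theta)}{\partial\theta}\right)^{2}\right]} = \frac{1}{-E\left[\frac{\partial^{2} \ln f(x;\theta)}{\partial\theta^{2}}\right]} = \frac{1}{I(\theta)}.$$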
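The bound $1/I(\theta)$ can also be checked by simulation. The sketch below uses the regression model $X = 2y + 20 + w$ from part (ii) and assumes $w \sim N(0, \sigma^2)$ with known $\sigma$, for which $I(y) = 4/\sigma^2$. The noise model, the function names, and the parameter values in the usage note are assumptions made for illustration.

```python
import random
import statistics


def simulate_estimator(y_true, sigma, trials, seed=0):
    """Draw X = 2*y + 20 + w repeatedly and apply the estimate (X - 20) / 2.

    Assumes w ~ N(0, sigma^2). Returns the empirical mean and variance
    of the estimates over all trials.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        x = 2.0 * y_true + 20.0 + rng.gauss(0.0, sigma)
        estimates.append((x - 20.0) / 2.0)
    return statistics.mean(estimates), statistics.variance(estimates)


def crlb_single_observation(sigma):
    """1 / I(y) with I(y) = 4 / sigma^2 for one observation of X."""
    return sigma**2 / 4.0
```

Calling `simulate_estimator(3.0, 2.0, 20000)` and comparing the returned variance with `crlb_single_observation(2.0)` should show the two values agreeing up to sampling error, with the empirical mean close to the true value of $y$.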
Note that the condition for equality is
$$\frac{\partial \ln f(x;\theta)}{\partial\theta} = \frac{1}{c}\,(\hat{\alpha} - \alpha),$$
where c can depend on θ but not on x. When α = g(θ) = θ, it is