Estimation, MLE, MAP, LS - Lecture Notes | EECS 501, Study notes of Electrical and Electronics Engineering

Material Type: Notes; Class: Prb&Rand Proc; Subject: Electrical Engineering And Computer Science; University: University of Michigan - Ann Arbor; Term: Fall 2001;


Uploaded on 09/02/2009

EECS 501 ESTIMATION: MLE, MAP, LS Fall 2001
Model: A known model of system or process with unknown parameter $a$.
Data: An observation $R$ of a random variable $r$ whose pdf depends on $a$.
Model → $f_{r|a}(R|A)$: If knew $a = A$, would know pdf of observation $r$.
Goal: Estimate $a$ from $R$ and conditional pdf $f_{r|a}(R|A)$: Compute $\hat{a}(R)$.
Example: Flip coin 10 times. Data: #heads in 10 independent flips.
Model: Binomial pmf for $r$. Unknown parameter: $a = \Pr[\text{heads}]$.
1. Non-Bayesian: $a$ is an unknown constant (do not know $f_a(A)$).
Given: $f_{r|a}(R|A)$ from model; observation (data) $R$ of rv $r$; nothing more.
Advantage: Need very little; no (possibly wrong) prior information.
Soln: Maximum Likelihood Estimator: max likelihood of what happened: $r = R$.
MLE: $\hat{a}_{MLE}(R) = \arg\max_A \,[f_{r|a}(R|A)]$. Compute: $\frac{\partial}{\partial A}[\log f_{r|a}(R|A)] = 0$.
BLUE: Best (minimum variance) Linear Unbiased Estimator of constant $x$ from $y = Hx + v$, $E[v] = 0$ is $\hat{x}(Y) = (H'H)^{-1}H'Y$. Proof: p. 290.
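The BLUE formula can be tried on a small case. The sketch below assumes a hypothetical straight-line model $y_i = x_0 + x_1 t_i + v_i$ (so each row of $H$ is $[1, t_i]$) and assumes the noise samples are uncorrelated with equal variance; neither assumption comes from the notes. The function name `blue_line_fit` is made up for illustration.

```python
def blue_line_fit(ts, ys):
    """Evaluate x_hat = (H'H)^{-1} H'Y for the assumed model y = x0 + x1*t + v.

    H has rows [1, t_i], so H'H is 2x2 and is inverted in closed form.
    """
    n = len(ts)
    # Entries of H'H = [[n, s_t], [s_t, s_tt]] and H'Y = [s_y, s_ty].
    s_t = sum(ts)
    s_tt = sum(t * t for t in ts)
    s_y = sum(ys)
    s_ty = sum(t * y for t, y in zip(ts, ys))
    det = n * s_tt - s_t * s_t  # determinant of H'H
    if det == 0:
        raise ValueError("H'H is singular; need at least two distinct t values")
    x0 = (s_tt * s_y - s_t * s_ty) / det
    x1 = (n * s_ty - s_t * s_y) / det
    return x0, x1


# Noise-free data generated from x0 = 1, x1 = 2 should be recovered exactly.
x0_hat, x1_hat = blue_line_fit([0, 1, 2, 3], [1, 3, 5, 7])
```

With noise-free data the estimate reproduces the generating constants; with noisy data it returns the least-squares fit.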
2. Bayesian: $a$ is itself random with known a priori pdf $f_a(A)$.
Given: $f_{r|a}(R|A)$ from model; $f_a(A)$ = a priori info; observation $R$ of $r$.
Advantage: Incorporate a priori info in estimate, but this better be right!
Soln: min $E[c(e)]$ where $e = a - \hat{a}(r)$ = error and $c(\cdot)$ = cost = MEP or LSE:
2a. MEP: Min Error Prob: $c(e) = 0$ if $|e| < \epsilon$; $c(e) = 1$ if $|e| > \epsilon$.
“close only counts in horseshoes”
“a miss is as good as a mile”
$E[c(e)] = 1 - \int_{-\infty}^{\infty} dR \int_{\hat{a}(R)-\epsilon}^{\hat{a}(R)+\epsilon} dA\, f_{r,a}(R,A) \approx 1 - 2\epsilon \int_{-\infty}^{\infty} f_{r,a}(R,\hat{a}(R))\,dR$ (for small $\epsilon$).
This is minimized when $f_{r,a}(R,\hat{a}(R))$ is maximized for each $R$.
MAP: Max A Posteriori: $\hat{a}_{MAP}(R) = \arg\max_A \,[f_{r|a}(R|A) f_a(A)]$ (compare MLE).
Compute: $\frac{\partial}{\partial A}[\log f_{r|a}(R|A) + \log f_a(A)] = 0$. MEP criterion → MAP solution.
2b. LSE: Least Squares Estimation criterion: $c(e) = e^2$. Penalize big errors.
LSE: $\hat{a}_{LS}(R) = E[a|r=R] = \frac{\int A\, f_{r|a}(R|A) f_a(A)\,dA}{\int f_{r|a}(R|A') f_a(A')\,dA'}$
(Denominator is just $f_r(R)$: no effect on argmax over $A$.)
Proof: Page 298. Moment of inertia minimized around center of mass.
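The "moment of inertia" remark can be written out as a short derivation. This is a standard argument and not necessarily the one on the cited page; the symbol $\mu_R$ is introduced here only as shorthand.

```latex
\begin{align*}
\mu_R &:= E[a \mid r=R], \\
E\big[(a-\hat{a}(R))^2 \mid r=R\big]
  &= E\big[(a-\mu_R)^2 \mid r=R\big]
   + 2(\mu_R-\hat{a}(R))\,\underbrace{E[a-\mu_R \mid r=R]}_{=0}
   + (\mu_R-\hat{a}(R))^2 \\
  &= \underbrace{E\big[(a-\mu_R)^2 \mid r=R\big]}_{\text{does not depend on } \hat{a}}
   + (\mu_R-\hat{a}(R))^2 .
\end{align*}
```

The second term is nonnegative and vanishes only when $\hat{a}(R) = \mu_R$, so the conditional mean minimizes the squared-error cost for each $R$. This is the parallel-axis theorem, with the posterior pdf playing the role of the mass density.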
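Bias: Let $a$ be an unknown constant $A_{act}$ so that $f_a(A) = \delta(A - A_{act})$.
DEF: $\hat{a}(R)$ is unbiased if $E[\hat{a}(r)] = A_{act} \leftrightarrow E[e] = 0$. How to compute:
$E[\hat{a}(r)] = \iint \hat{a}(R) f_{r|a}(R|A)\,\delta(A - A_{act})\,dR\,dA = \int \hat{a}(R) f_{r|a}(R|A_{act})\,dR$.
MSE: $\hat{a}(R)$ unbiased → $E[(\hat{a}(r) - A_{act})^2] = \sigma^2_{\hat{a}(r)}$ → MSE = variance of $\hat{a}(r)$.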


EECS 501 ESTIMATION EXAMPLES Fall 2001
Given: Flip coin with $\Pr[\text{heads}] = a$. Data: #heads in 10 independent flips.
Model: pmf $p_{r|a}(R|A) = \binom{10}{R} A^R (1-A)^{10-R}$, $R = 0, 1, \ldots, 10$; $0 \le A \le 1$.
Goal: Estimate $a = \Pr[\text{heads}]$ from $r$ = #heads in 10 flips and a priori $f_a(A)$.

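MLE: $\frac{\partial}{\partial A}\left[\log\binom{10}{R} + R\log A + (10-R)\log(1-A)\right] = \frac{R}{A} - \frac{10-R}{1-A} = 0$ → $\hat{a}_{MLE}(R) = \frac{R}{10}$. Easy to interpret! Note: No a priori pdf for $a$.

Bias: $E[\hat{a}_{MLE}(r)] = E\left[\frac{r}{10}\right] = \frac{10 A_{act}}{10} = A_{act}$ → $\hat{a}_{MLE}(r)$ unbiased.
MSE: $E[(\hat{a}_{MLE}(r) - A_{act})^2] = \sigma^2_{r/10}$ (since unbiased) $= \frac{10 A_{act}(1 - A_{act})}{100}$.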
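The bias and MSE results can be checked by summing over the binomial pmf directly, which is the discrete form of the "How to compute" integral. This is a minimal sketch; the value $A_{act} = 0.3$ and the helper names are arbitrary choices for illustration.

```python
from math import comb


def binom_pmf(R, A, n=10):
    """p_{r|a}(R|A) for the number of heads in n independent flips."""
    return comb(n, R) * A**R * (1 - A) ** (n - R)


def mle(R, n=10):
    """Maximum likelihood estimate of Pr[heads] from R heads in n flips."""
    return R / n


def bias_and_mse(estimator, A_act, n=10):
    """Exact bias and MSE of an estimator when the true parameter is A_act."""
    mean = sum(estimator(R, n) * binom_pmf(R, A_act, n) for R in range(n + 1))
    mse = sum(
        (estimator(R, n) - A_act) ** 2 * binom_pmf(R, A_act, n)
        for R in range(n + 1)
    )
    return mean - A_act, mse


# A_act = 0.3 is an arbitrary illustrative value.
bias, mse = bias_and_mse(mle, 0.3)
# Expect bias near 0 and mse near 10 * 0.3 * 0.7 / 100 = 0.021.
```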

EX2: Now suppose have $f_a(A) = 1$ for $0 \le A \le 1$ (Bayesian problem).
MAP: $\log f_a(A) = 0$ → same algebra → $\hat{a}_{MAP}(R) = \hat{a}_{MLE}(R) = \frac{R}{10}$.
Have: Uniform a priori pdf, or $a \sim N(0, \sigma^2 \to \infty)$ → $\hat{a}_{MAP}(R) = \hat{a}_{MLE}(R)$.

EX3: Now suppose have $f_a(A) = 2A$ for $0 \le A \le 1$ (Bayesian problem).
MAP: $\frac{\partial}{\partial A}\left[\log\binom{10}{R} + R\log A + (10-R)\log(1-A) + \log 2 + \log A\right] = \frac{R}{A} - \frac{10-R}{1-A} + \frac{1}{A} = 0$ → $\hat{a}_{MAP}(R) = \frac{R+1}{11}$. A slanted estimator!
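The closed form $(R+1)/11$ can be cross-checked by maximizing the log of likelihood times prior on a grid. This is a rough sketch: grid search is a stand-in for the calculus in the notes, and the grid size and function names are arbitrary. For $R = 10$ the maximum sits on the boundary $A = 1$, which an interior grid can only approach.

```python
from math import comb, log


def log_likelihood_plus_log_prior(A, R, n=10):
    """log p_{r|a}(R|A) + log f_a(A) for the prior f_a(A) = 2A on [0, 1]."""
    return log(comb(n, R)) + R * log(A) + (n - R) * log(1 - A) + log(2 * A)


def map_grid(R, n=10, steps=10000):
    """Approximate argmax over A using an interior grid (avoids log(0))."""
    grid = (k / steps for k in range(1, steps))
    return max(grid, key=lambda A: log_likelihood_plus_log_prior(A, R, n))


# Compare the grid maximizer with (R + 1) / 11 for a mid-range observation.
a_map_5 = map_grid(5)
closed_form_5 = (5 + 1) / 11
```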

EX4: Now suppose have $f_a(A) = 1$ for $0 \le A \le 1$ (Bayesian problem).

LSE: $\hat{a}_{LS}(R) = E[a|r=R] = \frac{\int_0^1 A \binom{10}{R} A^R (1-A)^{10-R}\,dA}{\int_0^1 \binom{10}{R} A^R (1-A)^{10-R}\,dA} = \frac{R+1}{12}$.

Ref: Schaum’s Outline Math. Handbook, (15.24) on p. 95. $\hat{a}_{LS}(5) = \frac{1}{2}$. Note: Even with a uniform a priori distribution for $a$, $\hat{a}_{LS}$ is still slanted!
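The ratio of integrals can also be evaluated numerically instead of looking it up in a table. The sketch below uses a midpoint rule; the step count and the name `posterior_mean` are illustrative choices, and the `prior` argument defaults to the uniform pdf of this example.

```python
from math import comb


def posterior_mean(R, n=10, prior=lambda A: 1.0, steps=20000):
    """E[a | r = R] as a ratio of two midpoint-rule integrals over [0, 1]."""
    h = 1.0 / steps
    num = 0.0
    den = 0.0
    for k in range(steps):
        A = (k + 0.5) * h  # midpoint of the k-th subinterval
        w = comb(n, R) * A**R * (1 - A) ** (n - R) * prior(A)
        num += A * w
        den += w
    # The step width h cancels in the ratio.
    return num / den


a_ls_5 = posterior_mean(5)  # should be close to (5 + 1) / 12 = 0.5
a_ls_0 = posterior_mean(0)  # should be close to 1/12, not 0: the "slant"
```

Even with zero heads observed, the estimate stays above zero, which is the slant the note refers to.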

LLSE: min $E[(a - \hat{a}(r))^2]$ such that $\hat{a}(R) = cR + b$ for some constants $b$, $c$.
Soln: $\frac{\partial}{\partial c} E[(a - cr - b)^2] = 0$ & $\frac{\partial}{\partial b} E[(a - cr - b)^2] = 0$ → $\hat{a}_{LLSE}(R) = E[a] + \frac{\lambda_{ar}}{\sigma_r^2}(R - E[r])$. This is the Linear Least Squares Estimator.

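LSE: $r$, $a$ jointly Gaussian → $\begin{bmatrix} r \\ a \end{bmatrix} \sim N\left(\begin{bmatrix} E[r] \\ E[a] \end{bmatrix}, \begin{bmatrix} \sigma_r^2 & \lambda_{ra} \\ \lambda_{ra} & \sigma_a^2 \end{bmatrix}\right)$ → $\hat{a}_{LS}(R) = E[a|r=R] = E[a] + \frac{\lambda_{ar}}{\sigma_r^2}(R - E[r]) = \hat{a}_{LLSE}(R)$! Fact: Two very different problems have the same solution!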

Norm: Normalized form: $(\hat{a}(R) - E[a])/\sigma_a = \rho_{ar}\,(R - E[r])/\sigma_r$.

MSE: $E[(a - \hat{a}(r))^2] = \sigma_a^2 - \frac{\lambda_{ar}^2}{\sigma_r^2}$ → $E\left[\left(\frac{\hat{a}(r) - a}{\sigma_a}\right)^2\right] = 1 - \rho_{ar}^2$.
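The LLSE slope and the MSE formula can be illustrated with simulated data. The sketch assumes a hypothetical model that is not in the notes: $a \sim N(0, 1)$ and $r = a + n$ with independent noise $n \sim N(0, 0.25)$, so that $\lambda_{ar} = 1$, $\sigma_r^2 = 1.25$, the slope is $0.8$ and the MSE is $0.2$. Sample moments stand in for the true moments.

```python
import random


def llse_from_samples(rs, a_vals):
    """Fit a_hat = c*r + b from sample moments; also return var_a - cov^2/var_r."""
    n = len(rs)
    mean_r = sum(rs) / n
    mean_a = sum(a_vals) / n
    var_r = sum((r - mean_r) ** 2 for r in rs) / n
    var_a = sum((a - mean_a) ** 2 for a in a_vals) / n
    cov_ar = sum((r - mean_r) * (a - mean_a) for r, a in zip(rs, a_vals)) / n
    c = cov_ar / var_r          # slope lambda_ar / sigma_r^2
    b = mean_a - c * mean_r     # intercept so that a_hat = E[a] + c (R - E[r])
    predicted_mse = var_a - cov_ar**2 / var_r
    return c, b, predicted_mse


# Assumed illustrative model: a ~ N(0, 1), r = a + noise, noise std 0.5.
rng = random.Random(0)
a_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
r_samples = [a + rng.gauss(0.0, 0.5) for a in a_samples]

c, b, predicted_mse = llse_from_samples(r_samples, a_samples)
empirical_mse = sum(
    (a - (c * r + b)) ** 2 for r, a in zip(r_samples, a_samples)
) / len(a_samples)
```

The empirical squared error of the fitted line matches $\sigma_a^2 - \lambda_{ar}^2/\sigma_r^2$ computed from the same sample moments, and both land near the theoretical $0.2$ for this assumed model.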