Maximum Likelihood - Computer System Modeling Fundamentals - Lecture Slides
Today…
- More about parameter estimation
- Using maximum likelihood
- Using MAP
- Next time: graphical models
Hypothesis Testing
- The maximum likelihood (ML) hypothesis is the
hypothesis that makes the data most likely
$H_{ML} = \arg\max_{H_i} P(D \mid H_i)$
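As a minimal sketch of picking the ML hypothesis, assume three hypothetical coins (the names and heads probabilities below are illustrative, not from the slides) and a short sequence of flips as the data D. The code scores each hypothesis by P(D | H_i) and keeps the best one.

```python
# Hypothetical hypotheses: each names a coin with a fixed heads probability.
hypotheses = {"fair": 0.5, "heads_biased": 0.8, "tails_biased": 0.2}

# Hypothetical data D: 1 = heads, 0 = tails.
data = [1, 1, 0, 1, 1, 1, 0, 1]


def likelihood(p_heads, flips):
    """P(D | H) for independent flips of a coin with heads probability p_heads."""
    result = 1.0
    for flip in flips:
        result *= p_heads if flip == 1 else 1.0 - p_heads
    return result


def ml_hypothesis(hyps, flips):
    """Return the name of the hypothesis under which the data is most likely."""
    return max(hyps, key=lambda name: likelihood(hyps[name], flips))


h_ml = ml_hypothesis(hypotheses, data)
print(h_ml)
```

Note that no prior over the hypotheses is used here; only the likelihood of the data decides.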
ML Parameter Estimation
- The maximum likelihood (ML) estimate is the parameter
value that makes the data most likely
$\hat{\theta}_{ML} = \arg\max_{\theta} P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n; \theta)$
- If $X_1, \ldots, X_n$ are independent observations, then
$\hat{\theta}_{ML} = \arg\max_{\theta} \prod_{i=1}^{n} P(X_i = x_i; \theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log P(X_i = x_i; \theta)$
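To make the discrete case concrete, here is a small sketch that assumes the observations are independent Bernoulli (0/1) variables with unknown P(X = 1) = θ. The data and the grid search are illustrative choices, not something the slides prescribe; for this model the maximizer is also known in closed form (the fraction of ones).

```python
import math


def log_likelihood(theta, xs):
    """Sum of log P(X_i = x_i; theta) for independent Bernoulli observations."""
    return sum(math.log(theta if x == 1 else 1.0 - theta) for x in xs)


def ml_estimate(xs, grid_size=999):
    """Approximate argmax over theta by searching an evenly spaced grid in (0, 1)."""
    grid = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    return max(grid, key=lambda theta: log_likelihood(theta, xs))


xs = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # hypothetical data: 7 ones out of 10
theta_ml = ml_estimate(xs)
print(theta_ml, sum(xs) / len(xs))  # grid answer next to the closed-form answer
```

The grid excludes 0 and 1 so that the logarithm is always defined.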
ML Parameter Estimation
- The maximum likelihood (ML) estimate is the parameter
value that makes the data most likely
$\hat{\theta}_{ML} = \arg\max_{\theta} f_{X_1, \ldots, X_n}(x_1, x_2, \ldots, x_n; \theta)$
- If $X_1, \ldots, X_n$ are independent observations, then
$\hat{\theta}_{ML} = \arg\max_{\theta} \prod_{i=1}^{n} f_{X_i}(x_i; \theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log f_{X_i}(x_i; \theta)$
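For the continuous case, one possible illustration assumes the observations are independent Gaussians with known standard deviation and unknown mean μ. That model, the data, and the function names are assumptions made for this sketch. Under that model the sum of log densities is maximized by the sample mean, which the code checks against nearby values.

```python
import math


def gaussian_log_density(x, mu, sigma):
    """log f(x; mu) for a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)


def log_likelihood(mu, xs, sigma=1.0):
    """Sum of log densities for independent observations."""
    return sum(gaussian_log_density(x, mu, sigma) for x in xs)


xs = [2.1, 1.9, 2.4, 2.0, 1.6]   # hypothetical measurements
mu_ml = sum(xs) / len(xs)        # closed-form maximizer for the Gaussian mean

# Nearby values of mu should score lower than the ML estimate.
for mu in (mu_ml - 0.1, mu_ml, mu_ml + 0.1):
    print(f"mu = {mu:.2f}: log likelihood = {log_likelihood(mu, xs):.4f}")
```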
Log Likelihood for Computation
- The log likelihood has a computational benefit too…
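The slide does not spell the benefit out. One commonly cited benefit, assumed here, is numerical: a product of many small probabilities can underflow to zero in floating point, while the sum of their logs stays representable. A short sketch with made-up per-observation probabilities:

```python
import math

# Hypothetical per-observation probabilities: 400 observations, each with probability 0.01.
probs = [0.01] * 400

# Direct product: the true value is 1e-800, far below the smallest positive double.
product = 1.0
for p in probs:
    product *= p

# Sum of logs: an ordinary negative number.
log_sum = sum(math.log(p) for p in probs)

print(product)  # the product has underflowed to 0.0
print(log_sum)
```

Because log is strictly increasing, maximizing the sum of logs picks the same θ as maximizing the product, so nothing is lost by switching.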
The Good and the Bad of ML
- Maximum likelihood is consistent – as the number of
observations gets large, the maximum likelihood estimate
gets closer and closer to the true parameter value
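Consistency can be illustrated, though not proved, with a small simulation. The sketch below assumes Bernoulli data with a hypothetical true parameter of 0.3 and a fixed random seed. It computes the ML estimate (the fraction of ones) for increasing sample sizes.

```python
import random


def bernoulli_ml(xs):
    """ML estimate of P(X = 1) for 0/1 data: the fraction of ones."""
    return sum(xs) / len(xs)


rng = random.Random(0)   # fixed seed so the run is repeatable
true_theta = 0.3         # hypothetical true parameter

estimates = {}
for n in (10, 1_000, 100_000):
    xs = [1 if rng.random() < true_theta else 0 for _ in range(n)]
    estimates[n] = bernoulli_ml(xs)

for n, est in estimates.items():
    print(f"n = {n:>6}: estimate = {est:.4f}, error = {abs(est - true_theta):.4f}")
```

The error in a single run need not shrink at every step. Consistency is a statement about what happens as the number of observations grows without bound.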
The Bayesian Point of View
- Instead of treating parameters as fixed but unknown values
θ , Bayesians treat them as random variables Θ
- Can then define the notions of prior and posterior…
- Prior: $P(\Theta = \theta)$ or $f_{\Theta}(\theta)$
- Posterior: $P(\Theta = \theta \mid X_1 = x_1, \ldots, X_n = x_n)$ or $f_{\Theta \mid X_1, \ldots, X_n}(\theta \mid x_1, \ldots, x_n)$
- As before, priors may be subjective or estimated from data
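As a sketch of how a prior turns into a posterior, assume Θ takes one of three hypothetical values with the prior probabilities below, and that the observations are independent Bernoulli flips given Θ. The numbers are illustrative only. The posterior is obtained with Bayes' rule: multiply prior by likelihood, then normalize.

```python
def posterior(prior, xs):
    """Return P(Theta = theta | data) for each theta in a discrete prior."""
    unnormalized = {}
    for theta, p_theta in prior.items():
        lik = 1.0
        for x in xs:
            lik *= theta if x == 1 else 1.0 - theta
        unnormalized[theta] = p_theta * lik        # prior times likelihood
    evidence = sum(unnormalized.values())          # P(data), the normalizer
    return {theta: w / evidence for theta, w in unnormalized.items()}


prior = {0.2: 0.25, 0.5: 0.5, 0.8: 0.25}   # hypothetical prior over theta
xs = [1, 1, 0, 1]                          # hypothetical observations

post = posterior(prior, xs)
for theta, p in post.items():
    print(f"P(Theta = {theta} | data) = {p:.4f}")
```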
MAP Parameter Estimation
- The maximum a posteriori (MAP) estimate is the most
likely parameter value given the data
$\hat{\theta}_{MAP} = \arg\max_{\theta} P(\Theta = \theta \mid X_1 = x_1, \ldots, X_n = x_n)$
$= \arg\max_{\theta} P(X_1 = x_1, \ldots, X_n = x_n \mid \Theta = \theta) \, P(\Theta = \theta)$
(by Bayes' rule; the denominator $P(X_1 = x_1, \ldots, X_n = x_n)$ does not depend on θ and can be dropped)
- If $X_1, \ldots, X_n$ are independent given Θ, then
$\hat{\theta}_{MAP} = \arg\max_{\theta} P(\Theta = \theta) \prod_{i=1}^{n} P(X_i = x_i \mid \Theta = \theta)$
- Can use the same log trick here too
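A minimal sketch of MAP estimation with the log trick, assuming a discrete prior over three hypothetical values of θ and independent Bernoulli observations given Θ (all numbers are illustrative). The score for each θ is the log prior plus the sum of log likelihood terms. The ML estimate over the same candidates is computed alongside for comparison.

```python
import math


def log_likelihood(theta, xs):
    """Sum of log P(X_i = x_i | Theta = theta) for Bernoulli observations."""
    return sum(math.log(theta if x == 1 else 1.0 - theta) for x in xs)


def map_estimate(prior, xs):
    """argmax over theta of log P(Theta = theta) + sum_i log P(X_i = x_i | theta)."""
    return max(prior, key=lambda t: math.log(prior[t]) + log_likelihood(t, xs))


def ml_estimate(candidates, xs):
    """argmax over theta of the log likelihood alone (no prior)."""
    return max(candidates, key=lambda t: log_likelihood(t, xs))


prior = {0.2: 0.25, 0.5: 0.5, 0.8: 0.25}   # hypothetical prior over theta
xs = [1, 1, 0, 1]                          # hypothetical observations

theta_map = map_estimate(prior, xs)
theta_ml = ml_estimate(prior, xs)
print("MAP:", theta_map, "ML:", theta_ml)
```

With so few observations the prior outweighs the likelihood here, so the two estimates differ. With a uniform prior they would coincide.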