Maximum Likelihood, Summaries of Probability and Statistics

Massachusetts Institute of Technology (MIT), Probability and Statistics

Typology: Summaries

2022/2023

Uploaded on 05/11/2023

Maximum Likelihood

Chris Piech

CS109

Handout #35

May 13th, 2016

Consider IID random samples $X_1, X_2, \dots, X_n$, where each $X_i$ is a sample from the density function $f(X_i \mid \theta)$. We are going to introduce a new way of choosing parameters called Maximum Likelihood Estimation (MLE). We want to select the parameters ($\theta$) that make the observed data the most likely. Note that we are now using notation that shows that the density of $X$ depends on its parameters, $\theta$.

First we define the likelihood of our data given parameters $\theta$:

$$L(\theta) = \prod_{i=1}^{n} f(X_i \mid \theta)$$

This is the probability of all of our data. It evaluates to a product because all $X_i$ are independent. Now we choose the value of $\theta$ that maximizes the likelihood function. Formally:

$$\hat{\theta} = \operatorname*{argmax}_{\theta} L(\theta)$$

A cool property of argmax is that since log is a monotonically increasing function, the argmax of a function is the same as the argmax of the log of the function! That’s nice because logs make the math simpler. Instead of using the likelihood, you should use the log likelihood: $LL(\theta)$.

$$LL(\theta) = \log \prod_{i=1}^{n} f(X_i \mid \theta) = \sum_{i=1}^{n} \log f(X_i \mid \theta)$$

To use a maximum likelihood estimator, first write the log likelihood of the data given your parameters. Then choose the values of the parameters that maximize the log likelihood function. Argmax can be computed in many ways. Most require computing the first derivative of the function.
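The claim that the likelihood and the log likelihood share an argmax can be checked numerically. The following is a minimal sketch, not part of the handout: it assumes a coin-flip (Bernoulli) model, which the handout treats in its next section, a made-up sample, and a brute-force grid search in place of calculus. The function names `likelihood` and `log_likelihood` are illustrative.

```python
import math

def likelihood(data, p):
    """Product of f(x | p) over the sample, for a coin with P(x = 1) = p."""
    result = 1.0
    for x in data:
        result *= p if x == 1 else (1.0 - p)
    return result

def log_likelihood(data, p):
    """Sum of log f(x | p) over the same sample."""
    return sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in data)

# Hypothetical sample: 7 ones in 10 trials.
data = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

# Candidate values 0.01, 0.02, ..., 0.99 (endpoints excluded so log is defined).
grid = [k / 100 for k in range(1, 100)]

best_by_likelihood = max(grid, key=lambda p: likelihood(data, p))
best_by_log_likelihood = max(grid, key=lambda p: log_likelihood(data, p))
print(best_by_likelihood, best_by_log_likelihood)

# With many samples the raw product underflows to 0.0 in floating point,
# while the sum of logs remains an ordinary finite number.
big_data = data * 500
print(likelihood(big_data, 0.7), log_likelihood(big_data, 0.7))
```

Both searches pick the same grid point. The last two lines suggest a second, practical reason to prefer logs beyond simpler math: a product of thousands of numbers below 1 is too small for a float to represent.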

Bernoulli MLE Estimation

Consider IID random variables $X_1, X_2, \dots, X_n$ where $X_i \sim \text{Ber}(p)$. First we are going to write the PMF of a Bernoulli in a crazy way: the probability mass function is $f(X_i \mid p) = p^{X_i}(1-p)^{1-X_i}$. Wow! What’s up with that? First convince yourself that when $X_i = 0$ and $X_i = 1$ this returns the right probabilities. We write the PMF this way because it’s differentiable.
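As a quick check of that claim, substituting the two possible values of $X_i$ into the exponent form of the PMF gives:

$$f(1 \mid p) = p^{1}(1-p)^{0} = p, \qquad f(0 \mid p) = p^{0}(1-p)^{1} = 1-p$$

These are exactly the Bernoulli probabilities of a one and a zero.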

Now let’s do some MLE estimation:

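$$L(p) = \prod_{i=1}^{n} p^{X_i}(1-p)^{1-X_i}$$

$$\begin{aligned}
LL(p) &= \sum_{i=1}^{n} \log\left[p^{X_i}(1-p)^{1-X_i}\right] \\
&= \sum_{i=1}^{n} \left[X_i \log p + (1-X_i)\log(1-p)\right] \\
&= Y \log p + (n-Y)\log(1-p) \quad \text{where } Y = \sum_{i=1}^{n} X_i
\end{aligned}$$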

Great Scott! Now we simply need to choose the value of $p$ that maximizes our log likelihood. One way to do that is to find the first derivative and set it equal to 0.

$$\frac{\partial LL(p)}{\partial p} = Y\,\frac{1}{p} + (n-Y)\,\frac{-1}{1-p} = 0$$

$$\hat{p} = \frac{Y}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$

All that work and we get the same thing as the method of moments and the sample mean...
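The closed-form result can be sanity-checked on simulated data. This is a rough sketch rather than anything from the handout: the true parameter `true_p`, the seed, the sample size, and the function names are all assumptions chosen for illustration.

```python
import math
import random

def bernoulli_mle(samples):
    """Closed-form estimate from the derivation: p_hat = Y / n."""
    return sum(samples) / len(samples)

def bernoulli_log_likelihood(samples, p):
    """LL(p) = Y log p + (n - Y) log(1 - p), where Y is the number of ones."""
    y = sum(samples)
    n = len(samples)
    return y * math.log(p) + (n - y) * math.log(1.0 - p)

random.seed(109)  # arbitrary seed so the run is repeatable
true_p = 0.3      # hypothetical "unknown" parameter used only to simulate data
samples = [1 if random.random() < true_p else 0 for _ in range(1000)]

p_hat = bernoulli_mle(samples)

# No nearby value of p should give a higher log likelihood than p_hat.
ll_at_estimate = bernoulli_log_likelihood(samples, p_hat)
is_local_max = all(
    ll_at_estimate > bernoulli_log_likelihood(samples, p_hat + delta)
    for delta in (-0.05, -0.01, 0.01, 0.05)
)
print(p_hat, is_local_max)
```

The estimate is simply the fraction of ones in the sample, and nudging it in either direction lowers the log likelihood.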

Normal MLE Estimation

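Consider IID random variables $X_1, X_2, \dots, X_n$ where $X_i \sim N(\mu, \sigma^2)$.

$$\begin{aligned}
L(\theta) &= \prod_{i=1}^{n} f(X_i \mid \mu, \sigma^2) \\
&= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(X_i-\mu)^2}{2\sigma^2}}
\end{aligned}$$

$$\begin{aligned}
LL(\theta) &= \sum_{i=1}^{n} \log\left[\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(X_i-\mu)^2}{2\sigma^2}}\right] \\
&= \sum_{i=1}^{n} \left[-\log\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2\sigma^2}(X_i-\mu)^2\right]
\end{aligned}$$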

If we choose the values of $\hat{\mu}$ and $\hat{\sigma}^2$ that maximize the likelihood, we get $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \hat{\mu})^2$.
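A small simulation can illustrate these two estimators. The sketch below is not from the handout: the true mean and standard deviation, the seed, and the function names are assumptions for illustration. Note that the variance estimate divides by $n$ (not $n-1$), which matches what Python's `statistics.pvariance` computes.

```python
import math
import random
import statistics

def normal_mle(samples):
    """Return (mu_hat, var_hat): the sample mean and the divide-by-n variance."""
    n = len(samples)
    mu_hat = sum(samples) / n
    var_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, var_hat

def normal_log_likelihood(samples, mu, var):
    """Sum over samples of -log(sqrt(2*pi)*sigma) - (x - mu)^2 / (2*sigma^2)."""
    sigma = math.sqrt(var)
    return sum(
        -math.log(math.sqrt(2 * math.pi) * sigma) - (x - mu) ** 2 / (2 * var)
        for x in samples
    )

random.seed(109)  # arbitrary seed so the run is repeatable
# Hypothetical data: true mean 5.0 and true standard deviation 2.0.
samples = [random.gauss(5.0, 2.0) for _ in range(2000)]

mu_hat, var_hat = normal_mle(samples)
print(mu_hat, var_hat)

# Moving either parameter away from its estimate lowers the log likelihood.
best = normal_log_likelihood(samples, mu_hat, var_hat)
print(best > normal_log_likelihood(samples, mu_hat + 0.1, var_hat))
print(best > normal_log_likelihood(samples, mu_hat, var_hat * 1.1))
```
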

Linear Transform Plus Noise

Assume that $Y = \theta X + Z$ where $Z \sim N(0, \sigma^2)$ and $X$ has an unknown distribution. The equations imply that $Y \mid X \sim N(\theta X, \sigma^2)$. Choose a value of $\theta$ that maximizes the probability of the data: $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$.

We approach this problem by finding a function for the log likelihood of the data given $\theta$. Then we find the value of $\theta$ that maximizes the log likelihood function. To start, use the PDF of a Normal to express the probability of $Y \mid X, \theta$:

$$f(Y_i \mid X_i, \theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(Y_i - \theta X_i)^2}{2\sigma^2}}$$

Now we are ready to write the likelihood function, then take its log to get the log likelihood function:

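$$\begin{aligned}
L(\theta) &= \prod_{i=1}^{n} f(Y_i, X_i \mid \theta) && \text{Let’s break up this joint} \\
&= \prod_{i=1}^{n} f(Y_i \mid X_i, \theta)\, f(X_i) && f(X_i) \text{ is independent of } \theta \\
&= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(Y_i - \theta X_i)^2}{2\sigma^2}} f(X_i) && \text{Substitute in the definition of } f(Y_i \mid X_i)
\end{aligned}$$

$$\begin{aligned}
LL(\theta) &= \log L(\theta) \\
&= \log \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(Y_i - \theta X_i)^2}{2\sigma^2}} f(X_i) && \text{Substitute in } L(\theta) \\
&= \sum_{i=1}^{n} \log\left[\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(Y_i - \theta X_i)^2}{2\sigma^2}}\right] + \sum_{i=1}^{n} \log f(X_i) && \text{Log of a product is the sum of logs} \\
&= n \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \theta X_i)^2 + \sum_{i=1}^{n} \log f(X_i)
\end{aligned}$$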

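Remove positive constant multipliers and terms that don’t include $\theta$. We are left with trying to find a value of $\theta$ that maximizes the negated sum of squared differences:

$$\begin{aligned}
\hat{\theta} &= \operatorname*{argmax}_{\theta} \; -\sum_{i=1}^{n} (Y_i - \theta X_i)^2 \\
&= \operatorname*{argmin}_{\theta} \; \sum_{i=1}^{n} (Y_i - \theta X_i)^2
\end{aligned}$$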

This result says that the value of θ that makes the data most likely is one that minimizes the squared error of predictions of Y. We will see in a few days that this is the basis for linear regression.
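To make this concrete: setting the derivative of the squared error with respect to $\theta$ to zero gives the closed form $\hat{\theta} = \sum_i X_i Y_i / \sum_i X_i^2$. The handout does not state this step, so treat it as an added assumption. The sketch below simulates data from the model and checks that this value does minimize the squared error. The true slope, noise level, distribution of $X$, seed, and function names are all hypothetical.

```python
import random

def squared_error(xs, ys, theta):
    """Sum of (y - theta * x)^2 over all pairs."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys))

def theta_mle(xs, ys):
    """Closed-form minimizer of the squared error: sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

random.seed(109)  # arbitrary seed so the run is repeatable
true_theta = 2.5  # hypothetical slope used only to simulate data
sigma = 1.0       # hypothetical noise standard deviation

# X is drawn uniformly here purely for illustration; the model leaves it unspecified.
xs = [random.uniform(-3.0, 3.0) for _ in range(500)]
ys = [true_theta * x + random.gauss(0.0, sigma) for x in xs]

theta_hat = theta_mle(xs, ys)
print(theta_hat)

# Nudging theta in either direction increases the squared error.
print(squared_error(xs, ys, theta_hat) < squared_error(xs, ys, theta_hat + 0.01))
print(squared_error(xs, ys, theta_hat) < squared_error(xs, ys, theta_hat - 0.01))
```
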
