MLE: Sample Mean as MLE for Bernoulli Distribution, Study notes of Sociology

These notes explain the concept of maximum likelihood estimation (MLE) and show that, for a Bernoulli distribution, the MLE is the sample mean. They work through an example of finding the MLE for a sample of 10 observations using Stata, and discuss the usefulness of the logarithm of the likelihood and of the deviance in MLE.

Typology: Study notes

2011/2012

Uploaded on 11/20/2012

shubnam

Sociology
maximum likelihood estimation
A least-squares estimator of a coefficient is the function of sample data that
minimizes the sum of squared deviations of observed from fitted Y; a maximum
likelihood estimator (MLE) of a parameter is the function of sample data that
maximizes the likelihood (i.e., the probability) of observing the sample
actually selected.
In the case of a linear regression model with quantitative Y and normally
distributed errors, the least-squares estimator and the MLE of a coefficient
are actually the same function of the data.
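
The equivalence claimed above can be seen from the normal log-likelihood. The sketch below is not part of the original notes; the symbols $x_i$, $\beta$, and $\sigma^2$ are notation assumed here for the regressors, the coefficients, and the error variance, and $\pi_c$ stands for the constant 3.14159... so that it is not confused with the Bernoulli parameter used later.

```latex
% Assumed model: Y_i = x_i'\beta + \varepsilon_i, with \varepsilon_i \sim N(0, \sigma^2) independent.
\ln L(\beta, \sigma^2)
  = -\frac{n}{2}\ln\!\left(2\pi_c\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(Y_i - x_i'\beta\right)^2
% For any fixed \sigma^2 > 0 the first term does not involve \beta, so
% maximizing \ln L over \beta is the same as minimizing
% \sum_{i=1}^{n} (Y_i - x_i'\beta)^2, the least-squares criterion.
```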
least-squares
Suppose Y is a Bernoulli random variable with population mean $\pi$. For a sample
of n observations on Y, we can write each observation as a function of $\pi$
and a disturbance:
$$Y_i = \pi + \varepsilon_i, \quad i = 1, \dots, n.$$
The least-squares criterion dictates that we choose as our estimator of $\pi$
the function of sample data, $\hat{\pi}$, that minimizes the sum of squared residuals:
$$\sum_{i=1}^{n} (Y_i - \hat{\pi})^2.$$
Now suppose we sample n=10 observations on Y, which results in Y=1 six (6) times
and Y=0 four (4) times. Then the sum of squared residuals expressed as a
function of $\hat{\pi}$ can be written as
$$6(1 - \hat{\pi})^2 + 4(0 - \hat{\pi})^2.$$
Instead of using the well-known formula for the least-squares estimator in this
case, let’s search for the sample value of $\hat{\pi}$ that minimizes the sum of
squared residuals. Once we find that value, we can identify by induction the
formula that would have yielded it; this formula will be the least-squares
estimator. I will search for $\hat{\pi}$ over the values from 0 to 1, incrementing by
.1. Here are the values of $\hat{\pi}$ (pi_hat) along with the sum of squared
residuals (ssresid) associated with each:
      pi_hat   ssresid
  1.       0         6
  2.      .1       4.9
  3.      .2         4
  4.      .3       3.3
  5.      .4       2.8
  6.      .5       2.5
  7.      .6       2.4
  8.      .7       2.5
  9.      .8       2.8
 10.      .9       3.3
 11.       1         4

Here’s the graph of ssresid against the values of $\hat{\pi}$: [graph of ssresid against pi_hat]

The value of $\hat{\pi}$ we’re looking for is .6, which is the sample mean of Y, i.e., the sample proportion of 1's. So the sample mean (here, the sample proportion) is the least-squares estimator.
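
The grid search described in this section can be reproduced in a few lines. The following is a minimal sketch in Python rather than the Stata used in these notes; the names `y`, `ss_resid`, `grid`, and `table` are illustrative choices, not taken from the original.

```python
# Sample from the notes: ten Bernoulli draws, six 1s and four 0s.
y = [1] * 6 + [0] * 4


def ss_resid(pi_hat, data):
    """Sum of squared residuals for a candidate value pi_hat."""
    return sum((obs - pi_hat) ** 2 for obs in data)


# Candidate values 0, .1, ..., 1, built from integers to limit rounding noise.
grid = [k / 10 for k in range(11)]
table = [(p, ss_resid(p, y)) for p in grid]

for p, s in table:
    print(f"{p:4.1f}  {s:5.2f}")

# The grid point with the smallest sum of squared residuals.
best_pi, best_ss = min(table, key=lambda row: row[1])
sample_mean = sum(y) / len(y)
print("grid minimizer:", best_pi, "sample mean:", sample_mean)
```

The grid minimizer and the sample mean coincide here only because the sample mean happens to lie on the grid; with a finer grid the search would still approach the sample mean.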

maximum likelihood

In order to develop an MLE, one needs to specify the probability distribution of the dependent variable Y, be it normal, Bernoulli, Poisson, etc. For logit models, the dependent variable is distributed as a Bernoulli, so this is the case we will treat. To really simplify matters so that the essence of MLE becomes clear, we will assume that our model has no x variables on the right-hand side of the equation, which is tantamount to assuming

$$P(Y_i = 1) = \pi \quad \text{for all } i,$$

i.e., the probability that Y is 1 is constant, so we are just estimating the population proportion of 1's, $\pi$, the parameter of a Bernoulli distribution.

To simplify matters still further (and avoid calculus), we will proceed by induction. We'll find the maximum likelihood estimate of $\pi$, and then figure out what function of sample data it is, i.e., what formula would have yielded that estimate. That formula will be the MLE we’re looking for.

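Suppose we have a sample of n=10 observations from a Bernoulli population with parameter $\pi$. Suppose n_1=6 is the number of observations for which Y=1, and n_0=4 is the number for which Y=0. To find the maximum likelihood estimate of $\pi$, call it $\hat{\pi}$, we will compute the likelihood of the sample for values of $\hat{\pi}$ between 0 and 1, incrementing by .1.

Recall that the probability of observing any arbitrary observation from a Bernoulli distribution can be written as:

$$P(Y_i = y_i) = \pi^{y_i} (1 - \pi)^{1 - y_i}, \quad y_i \in \{0, 1\}.$$

Because the observations are independent, the likelihood of the sample is the product of these probabilities, which for our sample, evaluated at $\hat{\pi}$, is

$$L(\hat{\pi}) = \hat{\pi}^{n_1} (1 - \hat{\pi})^{n_0} = \hat{\pi}^{6} (1 - \hat{\pi})^{4}.$$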

Now let’s plot the likelihood against the pi_hats so we can find the maximum graphically.

graph likelihd pi_hat, ylabel xlabel(.2, .4, .6, .8) c(l)

Notice that the likelihood is maximized when

$$\hat{\pi} = .6,$$

which is just the sample proportion of ones, that is, n_1/n, also known as the sample mean.
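
The likelihood search in this section can also be sketched outside Stata. The Python below is an illustration only, assuming the sample likelihood takes the product form pi_hat^n_1 * (1 - pi_hat)^n_0 for independent Bernoulli draws; the function name `likelihood` and the dictionary `likelihd` are hypothetical names chosen to echo the Stata variable.

```python
n1, n0 = 6, 4  # counts of Y=1 and Y=0 in the sample from the notes


def likelihood(pi_hat, ones, zeros):
    """Likelihood of a Bernoulli sample with the given counts of 1s and 0s."""
    return pi_hat ** ones * (1 - pi_hat) ** zeros


# Same grid as in the notes: 0, .1, ..., 1.
grid = [k / 10 for k in range(11)]
likelihd = {p: likelihood(p, n1, n0) for p in grid}

for p, val in likelihd.items():
    print(f"{p:4.1f}  {val:.7g}")

# Grid value with the largest likelihood, compared with the sample proportion.
mle_on_grid = max(likelihd, key=likelihd.get)
print("maximizer:", mle_on_grid, "n1/n:", n1 / (n1 + n0))
```

At the endpoints 0 and 1 the likelihood is exactly zero, because a sample containing both 1s and 0s is impossible under either value.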

useful functions of the likelihood

Because it will be useful later on, let’s take a look at the logarithm of the likelihood.

gen ln_like = ln(likelihd)

list pi_hat likelihd ln_like

      pi_hat   likelihd     ln_like
  1.       0          0           .
  2.      .1   6.56e-07   -14.23695
  3.      .2   .0000262    -10.5492
  4.      .3    .000175   -8.650537
  5.      .4   .0005308   -7.541047
  6.      .5   .0009766   -6.931472
  7.      .6   .0011944   -6.730117
  8.      .7    .000953   -6.955941
  9.      .8   .0004194   -7.776613
 10.      .9   .0000531   -9.842503
 11.       1          0           .

Notice that pi_hat=.6 also maximizes the log of the likelihood.
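
For readers who do want the calculus that these notes set aside, the log-likelihood makes the maximization short. This is a supplementary sketch under the same assumptions as the text (independent Bernoulli observations, n_1 ones and n_0 zeros); it is not part of the original derivation by induction.

```latex
% Log-likelihood of the sample as a function of \pi, for 0 < \pi < 1:
\ln L(\pi) = n_1 \ln \pi + n_0 \ln(1 - \pi)

% Setting the first derivative to zero:
\frac{d \ln L}{d\pi} = \frac{n_1}{\pi} - \frac{n_0}{1 - \pi} = 0
\;\Longrightarrow\; n_1 (1 - \pi) = n_0 \pi
\;\Longrightarrow\; \hat{\pi} = \frac{n_1}{n_1 + n_0} = \frac{n_1}{n}

% With n_1 = 6 and n = 10 this gives \hat{\pi} = .6. The second derivative,
% -n_1/\pi^2 - n_0/(1-\pi)^2, is negative, so this is a maximum.
```

Because the logarithm is strictly increasing, the value that maximizes the log-likelihood also maximizes the likelihood itself.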

Later we will have occasion to compute a quantity called the “deviance.” The deviance is defined as

$$D = -2 \ln L,$$

where L is the likelihood.

gen D = -2*ln_like
(2 missing values generated)

list pi_hat likelihd ln_like D

      pi_hat   likelihd     ln_like       D
  1.       0          0           .       .
  2.      .1   6.56e-07   -14.23695   28.47
  3.      .2   .0000262    -10.5492   21.10
  4.      .3    .000175   -8.650537   17.30
  5.      .4   .0005308   -7.541047   15.08
  6.      .5   .0009766   -6.931472   13.86
  7.      .6   .0011944   -6.730117   13.46
  8.      .7    .000953   -6.955941   13.91
  9.      .8   .0004194   -7.776613   15.55
 10.      .9   .0000531   -9.842503   19.69
 11.       1          0           .       .

So let’s look at the graph of the deviance.

graph D pi_hat, ylabel xlabel(.2, .4, .6, .8) c(l)
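
The log-likelihood and deviance columns can be recomputed with a short script. This is a Python sketch, not the Stata session from the notes; it assumes the deviance is simply -2 times the log-likelihood, as in the `gen D` command, and uses `None` to stand in for Stata's missing value where the likelihood is zero.

```python
import math

n1, n0 = 6, 4  # counts of Y=1 and Y=0 in the sample from the notes
grid = [k / 10 for k in range(11)]

rows = []
for p in grid:
    like = p ** n1 * (1 - p) ** n0
    # ln(0) is undefined; None plays the role of Stata's missing value ".".
    ln_like = math.log(like) if like > 0 else None
    dev = -2 * ln_like if ln_like is not None else None
    rows.append((p, like, ln_like, dev))

for p, like, ln_like, dev in rows:
    print(p, like, ln_like, dev)

# The deviance is smallest where the log-likelihood is largest.
defined = [r for r in rows if r[3] is not None]
min_dev_row = min(defined, key=lambda r: r[3])
print("pi_hat with the smallest deviance:", min_dev_row[0])
```

Since the deviance is a decreasing function of the log-likelihood, minimizing the deviance and maximizing the likelihood pick out the same value of pi_hat.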