



These notes introduce the concept of maximum likelihood estimation (mle) and show how, in the case of a Bernoulli distribution, it leads to the sample mean. They work through an example of finding the mle for a sample of 10 observations using Stata, and discuss the usefulness of the logarithm of the likelihood and of the deviance in mle.
Typology: Study notes




Sociology maximum likelihood estimation
A least-squares estimator of a coefficient is the function of sample data that
minimizes the sum of squared deviations of observed from fitted values of Y; a maximum
likelihood estimator (mle) of a parameter is the function of sample data that
maximizes the likelihood (i.e., the probability) of observing the sample actually selected.
In the case of a linear regression model with quantitative Y and normally
distributed errors, the least-squares estimator and the mle of a coefficient are actually
the same function of the data.
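The reason the two estimators coincide under normal errors can be sketched in a few lines. The notation below is an assumption introduced for illustration and does not appear in the notes: a simple regression $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ with independent errors $\varepsilon_i \sim N(0, \sigma^2)$. Here $3.14159\ldots$ is written out as a number so that it is not confused with the Bernoulli parameter used later.

```latex
% Log-likelihood of n independent normal observations (assumed notation).
\ln L(\beta_0, \beta_1, \sigma^2)
  = -\frac{n}{2} \ln\!\left(2 \cdot 3.14159\ldots \cdot \sigma^2\right)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_i\right)^2 .

% For any fixed \sigma^2, the first term does not involve the coefficients,
% and the second term is a negative multiple of the sum of squared residuals.
% Maximizing \ln L over (\beta_0, \beta_1) is therefore the same problem as
\min_{\beta_0, \beta_1} \; \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_i\right)^2 .
```

So the coefficient values that maximize the likelihood are exactly the ones that minimize the sum of squared residuals.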
least-squares
Suppose Y is a Bernoulli random variable with population mean $\pi$. For a sample
of n observations on Y, we can write each observation as a function of $\pi$ and a disturbance:
$$Y_i = \pi + \varepsilon_i, \qquad i = 1, \dots, n.$$
The least-squares criterion dictates that we choose as our estimator of $\pi$ the function of sample data that minimizes the sum of squared residuals:
$$\sum_{i=1}^{n} (Y_i - \hat{\pi})^2.$$
Now suppose we sample n=10 observations on Y, which results in Y=1 six (6) times
and Y=0 four (4) times. Then the sum of squared residuals expressed as a
function of $\hat{\pi}$ can be written as
$$\text{ssresid}(\hat{\pi}) = 6\,(1 - \hat{\pi})^2 + 4\,(0 - \hat{\pi})^2.$$
Instead of using the well-known formula for the least-squares estimator in this
case, let’s search for the value of $\hat{\pi}$ that minimizes the sum of squared residuals. Once we find that value, we can identify by induction the formula that would have yielded it; this formula will be the least-squares
estimator. I will search for $\hat{\pi}$ over the values from 0 to 1, incrementing by
.1. Here are the values along with the sum of squared residuals associated with each:
pi_hat ssresid
Here’s the graph of ssresid against the values of $\hat{\pi}$:
The value of $\hat{\pi}$ we’re looking for is .6, which is the sample mean of Y, i.e., the sample proportion of 1's. So the sample mean (the sample proportion) is the least-squares estimator.
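The grid search described above can be reproduced outside Stata. What follows is a minimal Python sketch, not the Stata commands used in these notes; the function name `ssresid` and the variable names are chosen here only to echo the column headings in the listing.

```python
# Grid search for the least-squares estimate of a Bernoulli mean.
# Sample from the notes: n = 10 observations, six 1's and four 0's.
y = [1] * 6 + [0] * 4


def ssresid(pi_hat, data):
    """Sum of squared residuals for a candidate value pi_hat."""
    return sum((yi - pi_hat) ** 2 for yi in data)


# Candidate values 0, .1, ..., 1 (built from integers to avoid float drift).
grid = [k / 10 for k in range(11)]
table = [(p, ssresid(p, y)) for p in grid]

# Print the pi_hat / ssresid listing.
for p, s in table:
    print(f"{p:.1f}  {s:.2f}")

# The row with the smallest sum of squared residuals.
best_pi, best_ss = min(table, key=lambda row: row[1])
sample_mean = sum(y) / len(y)
print("minimizer:", best_pi, "sample mean:", sample_mean)
```

On this grid the minimizing value and the sample mean are the same number, which is the point of the induction argument.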
maximum likelihood
In order to develop a mle, one needs to specify the probability
distribution of the dependent variable, be it normal, Bernoulli, Poisson, etc. For logit models, the dependent variable is distributed as a Bernoulli, so this is the case we will treat. To really simplify matters so that the essence of mle becomes clear, we will assume that our model has no x variables on the right-hand side of the equation, which is tantamount to assuming
$$P(Y_i = 1) = \pi \quad \text{for all } i,$$
i.e., the probability that Y is 1 is constant, so we are just estimating the population proportion of 1's, $\pi$, the parameter of a Bernoulli distribution.
To simplify matters still further (and avoid calculus), we will proceed by induction. We'll find the maximum likelihood estimate of $\pi$, and then figure out what function of sample data it is, i.e., what formula would have yielded that estimate. The formula will be the mle we’re looking for.
Suppose we have a sample of n=10 observations from a Bernoulli population with parameter $\pi$. Suppose n_1=6 is the number of observations for which Y=1, and n_0=4 is the number for which Y=0. To find the maximum likelihood
estimate of $\pi$, call it $\hat{\pi}$, we will compute the likelihood of the
sample for values of $\hat{\pi}$ between 0 and 1, incrementing by .1.
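Recall that the probability of observing any arbitrary observation from a Bernoulli distribution can be written as:
$$P(Y_i = y_i) = \pi^{y_i} (1 - \pi)^{1 - y_i}, \qquad y_i \in \{0, 1\}.$$
Because the observations are independent, the likelihood of the sample is the product of these probabilities. With n_1=6 ones and n_0=4 zeros, the likelihood evaluated at a candidate value $\hat{\pi}$ is
$$L(\hat{\pi}) = \hat{\pi}^{\,6} (1 - \hat{\pi})^{4},$$
which is the variable likelihd in what follows.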
Now let’s plot the likelihood against the pi_hats so we can find the maximum graphically.
graph likelihd pi_hat, ylabel xlabel(.2, .4, .6, .8) c(l)
Notice that the likelihood is maximized when $\hat{\pi} = .6$,
which is just the sample proportion of ones, that is n_1/n, also known as the sample mean.
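The same search can be sketched in Python for readers without Stata. This is an illustrative sketch rather than the commands used in these notes; the function name `likelihood` is an assumption, and the likelihood is taken to be the product of independent Bernoulli probabilities, pi_hat^n_1 * (1 - pi_hat)^n_0.

```python
# Grid search for the maximum likelihood estimate of a Bernoulli parameter.
# Sample from the notes: n_1 = 6 ones and n_0 = 4 zeros.
n1, n0 = 6, 4


def likelihood(pi_hat, n1, n0):
    """Likelihood of a sample with n1 ones and n0 zeros at pi_hat."""
    return pi_hat ** n1 * (1 - pi_hat) ** n0


# Candidate values 0, .1, ..., 1.
grid = [k / 10 for k in range(11)]
table = [(p, likelihood(p, n1, n0)) for p in grid]

# Print the pi_hat / likelihood listing.
for p, like in table:
    print(f"{p:.1f}  {like:.7f}")

# The row with the largest likelihood.
best_pi, best_like = max(table, key=lambda row: row[1])
print("maximizer:", best_pi, "n_1/n:", n1 / (n1 + n0))
```

The likelihood is zero at both ends of the grid, because a value of 0 rules out the observed ones and a value of 1 rules out the observed zeros.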
useful functions of the likelihood
Because it will be useful later on, let’s take a look at the logarithm of the likelihood.
gen ln_like = ln(likelihd)
pi_hat likelihd ln_like
Notice that pi_hat=.6 also maximizes the log of the likelihood.
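These notes deliberately avoid calculus, but the log of the likelihood also makes the general formula easy to confirm for readers who want it. The short derivation below is a supplementary sketch; it assumes the likelihood of the sample is $\pi^{n_1}(1-\pi)^{n_0}$ with $n = n_1 + n_0$.

```latex
% Log-likelihood of a Bernoulli sample with n_1 ones and n_0 zeros.
\ln L(\pi) = n_1 \ln \pi + n_0 \ln (1 - \pi)

% Set the first derivative to zero.
\frac{d \ln L}{d \pi} = \frac{n_1}{\pi} - \frac{n_0}{1 - \pi} = 0
\quad\Longrightarrow\quad
n_1 (1 - \pi) = n_0 \, \pi

% Solve for the maximizing value.
\hat{\pi} = \frac{n_1}{n_1 + n_0} = \frac{n_1}{n} = \frac{6}{10} = .6
```

Because the logarithm is an increasing function, the value that maximizes the log of the likelihood is the same value that maximizes the likelihood itself.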
Later we will have occasion to compute a quantity called the “deviance.” The deviance is defined as
$$D = -2 \ln L(\hat{\pi}).$$
gen D = -2*ln_like
(2 missing values generated)
list pi_hat likelihd ln_like D
pi_hat likelihd ln_like D
So let’s look at the graph of the deviance.
graph D pi_hat, ylabel xlabel(.2, .4, .6, .8) c(l)
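The log of the likelihood and the deviance can be tabulated together in a short Python sketch. It is an illustration under stated assumptions, not the Stata session from these notes: the deviance is taken to be -2 times the log of the likelihood, and `None` is used here to stand in for Stata's missing values where the likelihood is zero and its log is undefined.

```python
import math

# Sample from the notes: n_1 = 6 ones and n_0 = 4 zeros.
n1, n0 = 6, 4
grid = [k / 10 for k in range(11)]

rows = []
for p in grid:
    like = p ** n1 * (1 - p) ** n0
    # ln(0) is undefined, so the endpoints get a missing value (None).
    ln_like = math.log(like) if like > 0 else None
    dev = -2 * ln_like if ln_like is not None else None
    rows.append((p, like, ln_like, dev))

# Keep only the rows where the log of the likelihood is defined.
defined = [r for r in rows if r[3] is not None]
n_missing = len(rows) - len(defined)

# Largest log of the likelihood and smallest deviance.
argmax_ln = max(defined, key=lambda r: r[2])[0]
argmin_dev, min_dev = min(((r[0], r[3]) for r in defined), key=lambda t: t[1])

print("missing values:", n_missing)
print("pi_hat maximizing ln_like:", argmax_ln)
print("pi_hat minimizing D:", argmin_dev)
```

Since the deviance is a negative multiple of the log of the likelihood, the value that maximizes the likelihood is the one that minimizes the deviance, so the deviance graph has its low point where the likelihood graph has its peak.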