






A definition of the concepts of maximum likelihood and maximum likelihood estimate (MLE) for the distribution of a sample. The differences between a Bernoulli sample and its realization are also illustrated, and the likelihood function is presented for a Bernoulli sample from a population with unknown probability density function or probability mass function. The properties of the maximum likelihood estimate and its invariance are also discussed.
Type: Lecture notes







Advanced statistics
Likelihood theory
$$p_{X_1,\dots,X_n}(x_1,\dots,x_n) = \Pr(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n) = p_{X_1}(x_1)\,p_{X_2}(x_2)\cdots p_{X_n}(x_n) = p_X(x_1)\,p_X(x_2)\cdots p_X(x_n)$$
Advanced Statistics 2023-2024: Mid-term 31/10/2022 (prof P. Manfredi, Duration: 100 min)
3. State the following definitions related to sampling theory: (0) the population of interest (X); (1) simple random sample; (2) distribution of a sample; (3) Bernoulli sample. Then, given a population X, clarify the difference between the concept of a (Bernoulli) sample $(X_1, X_2, \dots, X_n)$ from this population and its realization $(x_1, x_2, \dots, x_n)$. Finally, postulating that X is a Poisson population (with given parameter $\vartheta$) for the number of car accidents per day on the roads of a certain country, assume you draw a Bernoulli sample of n roads and let $(x_1, x_2, \dots, x_n)$ be the resulting numbers of daily accidents on each selected road. Provide the probability distribution of the sample. Lastly, assuming that $\vartheta = 2.4$/day, find the probability of observing the following sample of size 3: $(x_1 = 0, x_2 = 0, x_3 = 4)$.
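The last part of the question can be sketched in Python. The sketch assumes, as the question states, that the three counts are independent draws from a Poisson population with $\vartheta = 2.4$ accidents/day. The probability of the sample is then the product of the three Poisson probabilities. `poisson_pmf` and `sample_probability` are illustrative names, not taken from the source.

```python
import math


def poisson_pmf(x: int, theta: float) -> float:
    """Pr(X = x) for a Poisson population with parameter theta."""
    return math.exp(-theta) * theta**x / math.factorial(x)


def sample_probability(xs: list[int], theta: float) -> float:
    """Joint probability of a Bernoulli (i.i.d.) sample: product of the marginal pmfs."""
    prob = 1.0
    for x in xs:
        prob *= poisson_pmf(x, theta)
    return prob


p = sample_probability([0, 0, 4], theta=2.4)
print(f"{p:.6f}")  # 0.001032, i.e. e^(-7.2) * 2.4^4 / 4!
```

The result is $e^{-7.2}\,2.4^4/4! \approx 0.00103$.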
❑ In the «quality control» example, the «real-world» underlying population is any delivery of N = 5000 («population size») trims from factory B to factory A (not really interesting in itself!).
❑ From a statistical viewpoint we are only interested in the binary nature of the delivered trims: defective (=1) or non-defective (=0). A Bernoulli population!
❑ More specifically, we want to infer the proportion of defective items ($\vartheta$) not in the sample (that one we have observed!) but in the population.
❑ $\vartheta$ is the unknown characteristic of the population, i.e., the unknown parameter of our problem, which we want to infer from the data in the sample.
In our example the factory samples n = 50 items from the population (the delivery of N = 5000). Ex ante (i.e., before the sample is drawn) we have a random experiment described by a random vector of size n = 50, $(X_1, X_2, \dots, X_{50})$, where each component $X_i$ is Bernoulli distributed.
Ex post (i.e., after the sample has been drawn) we have a vector of n = 50 "numbers" $(x_1, x_2, \dots, x_{50})$, which we call the (observed) realization of the sample. Assume, for example, that the 4 defective items are observed at draws i = 3, 17, 41, 44. The sample realisation is therefore the vector:
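$$(x_1, x_2, \dots, x_{50}) = (0, 0, 1, 0, \dots, 0, 1, 0, \dots, 0, 1, 0, 0, 1, 0, \dots, 0), \qquad x_i = 1 \text{ for } i \in \{3, 17, 41, 44\},\ x_i = 0 \text{ otherwise.}$$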
NB. In statistical theory we are obviously interested in the «ex-ante» sample (as a random vector). The «ex-post» sample is just a collection of numbers.
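❑ The observed («realised») sample is the vector $(x_1, \dots, x_{50})$ above.
❑ The (ex-ante) probability of observing exactly this sample is
$$\Pr\big((X_1, X_2, X_3, X_4, \dots, X_{16}, X_{17}, X_{18}, \dots, X_{40}, X_{41}, X_{42}, X_{43}, X_{44}, X_{45}, \dots, X_{50}) = (0, 0, 1, 0, \dots, 0, 1, 0, \dots, 0, 1, 0, 0, 1, 0, \dots, 0)\big) = (1-\vartheta)(1-\vartheta)\,\vartheta\,(1-\vartheta)\cdots(1-\vartheta) = \vartheta^{4}(1-\vartheta)^{46}.$$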
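The product above can be checked with a short Python sketch. It rebuilds the realisation from the four defective draws (3, 17, 41, 44) and evaluates its ex-ante probability. The value $\vartheta = 0.08$ is illustrative only, since in the problem $\vartheta$ is unknown. The result is compared with the closed form $\vartheta^4(1-\vartheta)^{46}$.

```python
import math


def bernoulli_sample_probability(xs: list[int], theta: float) -> float:
    """Ex-ante probability of observing exactly the 0/1 realisation xs."""
    prob = 1.0
    for x in xs:
        prob *= theta if x == 1 else 1.0 - theta
    return prob


# Realisation from the text: defective items (x_i = 1) at draws 3, 17, 41, 44.
defective_draws = {3, 17, 41, 44}
xs = [1 if i in defective_draws else 0 for i in range(1, 51)]

theta = 0.08  # illustrative value; in practice theta is unknown
prob = bernoulli_sample_probability(xs, theta)
closed_form = theta**4 * (1 - theta) ** 46
print(math.isclose(prob, closed_form, rel_tol=1e-12))  # True
```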
Likelihood
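If the population is discrete with probability mass function $p_X(x;\vartheta)$, the likelihood of the observed sample $(x_1, \dots, x_n)$ is
$$L(\vartheta) = p_{X_1,\dots,X_n}(x_1,\dots,x_n;\vartheta) = \prod_{j=1}^n p_X(x_j;\vartheta), \qquad \vartheta \in \Theta.$$
If the population is continuous with density function $f_X(x;\vartheta)$, the likelihood is
$$L(\vartheta) = f_{X_1,\dots,X_n}(x_1,\dots,x_n;\vartheta) = \prod_{j=1}^n f_X(x_j;\vartheta), \qquad \vartheta \in \Theta.$$
Remark. The set $\Theta$ (i.e., the domain of the likelihood function) is called the "parametric space".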
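The two definitions differ only in whether a pmf or a density is multiplied, so one helper can cover both cases. The Python sketch below is a minimal version, and its function names are hypothetical. It evaluates the likelihood through the log-likelihood, which turns the product into a sum and avoids numerical underflow for large n.

```python
import math
from typing import Callable, Sequence


def log_likelihood(
    model: Callable[[float, float], float], data: Sequence[float], theta: float
) -> float:
    """LL(theta) = sum_j log p_X(x_j; theta) (or log f_X) for an i.i.d. sample.

    `model` must return a strictly positive value at every observation,
    otherwise math.log raises ValueError.
    """
    return sum(math.log(model(x, theta)) for x in data)


def likelihood(model, data, theta):
    """L(theta) = prod_j p_X(x_j; theta), computed as exp(LL(theta))."""
    return math.exp(log_likelihood(model, data, theta))


def bernoulli_pmf(x: float, theta: float) -> float:
    return theta if x == 1 else 1.0 - theta


def exponential_pdf(x: float, lam: float) -> float:
    return lam * math.exp(-lam * x)


print(round(likelihood(bernoulli_pmf, [1, 0, 0, 1], 0.5), 10))  # 0.0625 = 0.5**4
```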
Remark: until now the sample proportion $\hat p = 0.08$ was just an intuitive («analogical») estimate of the unknown parameter, the population proportion $p$.
Remark. Given the data actually observed, $\hat p = 0.08$ is the «most likely» value of the parameter, i.e., the value yielding the largest likelihood. Note that «most likely» does not (at all!) mean «most probable», since the unknown parameter is a constant (and not a random variable!).
We have now found that the observed sample proportion $\hat p$ is much more than an «intuitive estimate»: it is the value that maximises the likelihood function of our problem.
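This claim can be checked numerically with a Python sketch. It maximises the log-likelihood $4\log\vartheta + 46\log(1-\vartheta)$ over a fine grid of the open interval (0, 1). The grid step of 0.001 is an arbitrary choice. The maximiser found is the sample proportion 4/50.

```python
import math

n, s = 50, 4  # sample size and number of defective items observed


def log_lik(theta: float) -> float:
    """log of theta^s * (1 - theta)^(n - s)."""
    return s * math.log(theta) + (n - s) * math.log(1.0 - theta)


grid = [i / 1000 for i in range(1, 1000)]  # 0.001, 0.002, ..., 0.999
theta_hat = max(grid, key=log_lik)
print(theta_hat)  # 0.08
```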
Remark: in the original problem the sample was actually drawn without replacement. This implies some changes, especially since the population is finite, with size N = 5000. In this circumstance the observed sample reduces the extent of the parametric space. At least 4 of the 5000 items are defective, because 4 defective items were actually observed. At most 5000 − 46 = 4954 are defective, because 46 non-defective items were observed. Hence $\Theta = \{i/5000 : i = 4, 5, \dots, 4954\}$. Moreover, the likelihood is no longer binomial but hypergeometric. Exercise. Write down the hypergeometric likelihood of the previous problem.
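This Python sketch can be used to check an answer to the exercise numerically. It assumes that the unknown is D, the number of defective items in the delivery, so that $\vartheta = D/5000$. It also assumes that the likelihood is the hypergeometric probability of drawing 4 defective items in 50 draws without replacement. The sketch scans D = 4, ..., 4954 and returns the maximiser.

```python
from math import comb

N, n, x = 5000, 50, 4  # population size, sample size, defectives in the sample
TOTAL = comb(N, n)     # number of possible samples, constant in D


def hypergeom_lik(D: int) -> float:
    """Pr(x defectives in the sample | D defectives in the delivery)."""
    return comb(D, x) * comb(N - D, n - x) / TOTAL


# Parametric space from the remark: D = 4, 5, ..., 4954, i.e. theta = D / N.
d_hat = max(range(x, N - (n - x) + 1), key=hypergeom_lik)
theta_hat = d_hat / N
print(d_hat, theta_hat)  # 400 0.08
```

The maximiser is D = 400, i.e. $\hat\vartheta = 0.08$: the same value as in the analysis with replacement.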
Obviously, if the population is very large ("infinite", as might be the case for a national human community), then the sample is necessarily small compared with the population. This means that the choice $\Theta = (0, 1)$ remains an excellent approximation.
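We seek the ML estimate of a Bernoulli proportion based on a generic Bernoulli sample of size $n$ with realization $(x_1, x_2, \dots, x_n)$, which may contain any number of successes. The general form of the likelihood is
$$L(\vartheta) = \prod_{i=1}^n p_X(x_i;\vartheta) = \prod_{i=1}^n \vartheta^{x_i}(1-\vartheta)^{1-x_i} = \vartheta^{s}(1-\vartheta)^{n-s} = \vartheta^{n\bar{x}}(1-\vartheta)^{n-n\bar{x}},$$
where $s = \sum_{i=1}^n x_i$ and $\bar{x} = s/n$.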
Note that the likelihood has been written in two equivalent forms:
a) using the observed sample sum $s$, i.e., the number of successes in the sample;
b) using the observed sample mean $\bar{x} = s/n$, i.e., the sample proportion of successes.
Remark. The form of the likelihood written in the previous slide is «abstract». Indeed, in any actual problem it is the observed datum which ultimately determines the relevant parametric space and therefore the form of the MLE.
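❑ If $0 < s < n$: $L(\vartheta) = \vartheta^{s}(1-\vartheta)^{n-s}$ on $\Theta = (0, 1)$, maximised at $\hat\vartheta_{ML} = s/n$.
❑ If $s = n$ (all successes): $L(\vartheta) = \vartheta^{n}$ on $\Theta = (0, 1]$, maximised at $\hat\vartheta_{ML} = 1$.
❑ If $s = 0$ (no successes): $L(\vartheta) = (1-\vartheta)^{n}$ on $\Theta = [0, 1)$, maximised at $\hat\vartheta_{ML} = 0$.
In every case $\hat\vartheta_{ML} = s/n = \bar{x}$, the sample proportion.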
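The role of the observed datum can be seen numerically. In this Python sketch, the sample size n = 10 and the grid step of 0.001 are arbitrary choices. For each s, the grid includes an endpoint only when that endpoint belongs to the corresponding parametric space. The grid maximiser then coincides with s/n in every case, including the boundary cases s = 0 and s = n.

```python
def bernoulli_lik(theta: float, s: int, n: int) -> float:
    """theta^s (1 - theta)^(n - s); note 0**0 == 1 in Python."""
    return theta**s * (1.0 - theta) ** (n - s)


def parametric_grid(s: int, n: int, steps: int = 1000) -> list[float]:
    """Grid over Theta: 0 is admissible only if s == 0, 1 only if s == n."""
    first = 0 if s == 0 else 1
    last = steps if s == n else steps - 1
    return [i / steps for i in range(first, last + 1)]


n = 10
estimates = {
    s: max(parametric_grid(s, n), key=lambda t, s=s: bernoulli_lik(t, s, n))
    for s in range(n + 1)
}
print(estimates)  # each s maps to s / n
```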
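$$LL(\mu) = \log L(\mu) = \log H - Kn(\mu^2 - 2\bar{x}\mu),$$
where $H$ and $K > 0$ do not depend on $\mu$, so that setting $dLL/d\mu = -2Kn(\mu - \bar{x}) = 0$ gives
$$\hat\mu_{ML} = \bar{x}.$$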
a) Mean and standard deviation $(\mu, \sigma)$ of a normal population.
b) Mean and variance $(\mu, \sigma^2)$ of a normal population.
c) The rate of a negative exponential population from complete observation of the waiting times (done by PM).
d) The mean of a Poisson population (done by PM).
e) The rate of a negative exponential population under censored observations (NOT in the 2020-2021 syllabus).
f) The parameter $\vartheta$ of a population uniformly distributed over $(0, \vartheta)$.
g) The parameter of the one-parameter "exponential tent" (Laplace) density: $f_X(x;\vartheta) = \frac{\vartheta}{2}\exp(-\vartheta|x|)$, $x \in (-\infty, +\infty)$.
h) Intercept, slope and variance of the errors ($\mathrm{Var}(\varepsilon)$) for the basic linear model under normal errors (TA).
Remark. The bolded, underlined exercises were done during tutoring.
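$$L(\mu, \sigma) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right) = (2\pi)^{-n/2}\,\sigma^{-n}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2\right)$$
over the parametric set $\Theta = \{(\mu, \sigma) : -\infty < \mu < +\infty,\ \sigma > 0\}$. The log-likelihood is
$$LL(\mu, \sigma) = -\frac{n}{2}\log(2\pi) - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2.$$
Setting the partial derivatives to zero:
$$\frac{\partial LL}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu) = \frac{n\bar{x} - n\mu}{\sigma^2} = 0 \;\Rightarrow\; \hat\mu = \bar{x},$$
$$\frac{\partial LL}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^n (x_i - \mu)^2 = 0 \;\Rightarrow\; \hat\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2}.$$
The pair
$$(\hat\mu_{ML}, \hat\sigma_{ML}) = \left(\bar{x},\ \sqrt{\frac{1}{n}\sum_{i=1}^n (x_i-\bar{x})^2}\right)$$
is the desired ML estimate of $(\mu, \sigma)$.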
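The closed-form estimates can be sketched in Python and applied to a small hypothetical sample; the data are invented for illustration. Note the divisor n in $\hat\sigma$. It matches `statistics.pstdev` (population standard deviation), not `statistics.stdev`, which divides by n − 1.

```python
import math
import statistics


def normal_mle(xs: list[float]) -> tuple[float, float]:
    """ML estimates (mu_hat, sigma_hat) for an i.i.d. normal sample."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)  # divisor n
    return mu_hat, sigma_hat


data = [4.1, 5.3, 3.8, 6.0, 4.9]  # hypothetical observations
mu_hat, sigma_hat = normal_mle(data)
print(mu_hat, sigma_hat)
print(math.isclose(sigma_hat, statistics.pstdev(data)))  # True
```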
Remark. By squaring the MLE of $\sigma$, one straightforwardly obtains
$$(\hat\sigma_{ML})^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2.$$
One would like to know whether it is also correct to say that
$$\widehat{(\sigma^2)}_{ML} = (\hat\sigma_{ML})^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2,$$
i.e., whether by taking the square of the MLE for the standard deviation one obtains the MLE for the population variance.
The answer is again YES, by the invariance property of the MLE (presented during lectures and recalled here):
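Theorem (strong invariance of the MLE). Let $L(\vartheta)$ be a likelihood of parameter $\vartheta \in \Theta$ having MLE $\hat\vartheta_{ML}$. Let $g(\cdot)$ be a one-to-one (bijective) function and consider the parameter $\tau = g(\vartheta)$. Then:
$$\hat\tau_{ML} = \widehat{g(\vartheta)}_{ML} = g(\hat\vartheta_{ML}).$$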
Remark. Note that $\tau = g(\vartheta)$ is a parameter with its own parametric space $g(\Theta)$.
Remark. The invariance theorem actually holds under much more general conditions on the form of function g.
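[Figure: illustration of the invariance argument, with a parameter value $\vartheta_0$, its image $\tau_0 = g(\vartheta_0)$, and the question of how to define the likelihood $L(\tau_0)$.]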
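The invariance property can also be checked numerically. In this Python sketch the data are hypothetical. The normal log-likelihood is written directly as a function of the variance $v = \sigma^2$, with $\mu$ fixed at $\bar x$, which maximises it for every $v$. It is then maximised by golden-section search, a generic one-dimensional optimiser chosen here only for illustration. The maximiser agrees with the square of the closed-form $\hat\sigma_{ML}$.

```python
import math

data = [4.1, 5.3, 3.8, 6.0, 4.9]  # hypothetical observations
n = len(data)
x_bar = sum(data) / n
ss = sum((x - x_bar) ** 2 for x in data)  # sum of squared deviations


def ll_variance(v: float) -> float:
    """Normal log-likelihood as a function of v = sigma^2, with mu = x_bar."""
    return -0.5 * n * math.log(2 * math.pi * v) - ss / (2 * v)


def golden_max(f, a: float, b: float, tol: float = 1e-9) -> float:
    """Golden-section search for the maximiser of a unimodal f on [a, b]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):  # maximiser lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:            # maximiser lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2


v_hat = golden_max(ll_variance, 1e-6, 10.0)
sigma_hat = math.sqrt(ss / n)
print(v_hat, sigma_hat**2)  # agree up to the optimiser's tolerance
```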
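MLE of intercept, slope and variance of the basic linear model with normal homoscedastic errors.
$$Y_i = \alpha + \beta X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma_\varepsilon^2), \qquad E(\varepsilon_i\varepsilon_j) = 0 \ \text{for } i \neq j.$$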
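Consider the log-likelihood
$$LL(\alpha, \beta, \sigma_\varepsilon^2) = -\frac{n}{2}\log(2\pi\sigma_\varepsilon^2) - \frac{1}{2\sigma_\varepsilon^2}\sum_{i=1}^n (y_i - \alpha - \beta x_i)^2.$$
Setting its partial derivatives with respect to $\alpha$ and $\beta$ to zero gives the normal equations
$$\sum_{i=1}^n y_i = n\hat\alpha + \hat\beta\sum_{i=1}^n x_i, \qquad \sum_{i=1}^n x_i y_i = \hat\alpha\sum_{i=1}^n x_i + \hat\beta\sum_{i=1}^n x_i^2,$$
and setting the derivative with respect to $\sigma_\varepsilon^2$ to zero gives
$$\hat\sigma^2_{\varepsilon} = \frac{1}{n}\sum_{i=1}^n (y_i - \hat\alpha - \hat\beta x_i)^2.$$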
From which:
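$$\hat\beta_{\text{ML\_OLS}} = \frac{n\sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i \sum_{i=1}^n y_i}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2} = \frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n (x_i-\bar{x})^2}, \qquad \hat\alpha_{\text{ML\_OLS}} = \bar{y} - \hat\beta_{\text{ML\_OLS}}\,\bar{x}.$$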
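Therefore, treating the $x_i$ as fixed, $E(\hat\beta_{\text{ML\_OLS}}) = \beta$ and $\mathrm{Var}(\hat\beta_{\text{ML\_OLS}}) = \sigma_\varepsilon^2 / \sum_{i=1}^n (x_i - \bar{x})^2$. This shows that the ML_OLS estimator of the slope is unbiased and, provided $\sum_{i=1}^n (x_i-\bar{x})^2 \to \infty$ as $n \to \infty$, consistent.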
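The ML_OLS formulas can be sketched in Python for the simple model above with a single regressor; the data are hypothetical. The ML estimate of the error variance divides by n, unlike the unbiased estimator, which divides by n − 2.

```python
def linear_model_mle(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """ML (= OLS) estimates of intercept, slope and error variance under normal errors."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    # ML estimate of Var(eps): mean squared residual (divisor n, not n - 2)
    sigma2_hat = sum((y - alpha_hat - beta_hat * x) ** 2 for x, y in zip(xs, ys)) / n
    return alpha_hat, beta_hat, sigma2_hat


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # hypothetical responses
alpha_hat, beta_hat, sigma2_hat = linear_model_mle(xs, ys)
print(alpha_hat, beta_hat, sigma2_hat)  # approximately 1.09, 1.97, 0.0182
```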
Example. To evaluate the reliability of its product, a battery manufacturer performs the following experiment: a Bernoulli sample of n = 10 batteries is followed under standard operating conditions until the event of interest (failure) occurs. The following durations to failure were observed (in days):
Battery          1     2     3     4     5     6     7     8     9     10
Duration (days)  7.5   9.2   12.7  6.9   10.4  10.6  11.4  14.4  8.8   10.
Assuming that the waiting time to failure follows a negative exponential density with unknown rate $\lambda$, find the corresponding maximum likelihood estimate of $\lambda$ and of the population mean.
Solution. As the distribution of the population is negative exponential, the likelihood function is:
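$$L(\lambda) = \prod_{i=1}^n f_X(x_i;\lambda) = \prod_{i=1}^n \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda\sum_{i=1}^n x_i} = \lambda^n e^{-\lambda n\bar{x}}, \qquad \lambda > 0.$$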
The likelihood is non-negative, continuous and differentiable over the parametric set, with $L(0) = 0$ and $L(\lambda) \to 0$ as $\lambda \to \infty$. The likelihood, and hence the LL function, therefore admits at least one local maximum. The LL is:
$$LL(\lambda) = \log L(\lambda) = n\log\lambda - \lambda n\bar{x}, \qquad \lambda > 0.$$
Exercise. Show there is a unique ML estimate, given by:
$$\hat\lambda_{ML} = \frac{1}{\bar{x}}$$
Therefore:
Exercise. With reference to the previous reliability problem, find the corresponding maximum likelihood estimate of the average duration of batteries.
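Solution. In principle we can re-solve the problem (please do it…) using the likelihood of the exponential population reparametrised in terms of its mean $\mu = 1/\lambda$:
$$f_X(x;\mu) = \frac{1}{\mu}e^{-x/\mu}, \qquad x \ge 0, \quad \mu > 0.$$
However, by applying the invariance property of the MLE we immediately find:
$$\hat\mu_{ML} = \frac{1}{\hat\lambda_{ML}} = \bar{x}.$$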
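Both estimates can be computed with a Python sketch. The last duration in the battery table may be truncated, so a hypothetical sample is used here rather than the battery data. The sketch computes $\hat\lambda_{ML} = 1/\bar x$ and obtains $\hat\mu_{ML}$ by invariance. It then checks that the log-likelihood in the mean parametrisation is not larger at nearby values of $\mu$.

```python
import math


def exponential_mle(durations: list[float]) -> tuple[float, float]:
    """ML estimates of the rate lambda and (by invariance) of the mean mu = 1/lambda."""
    x_bar = sum(durations) / len(durations)
    lam_hat = 1.0 / x_bar
    mu_hat = 1.0 / lam_hat  # g(lambda) = 1/lambda is one-to-one on (0, inf)
    return lam_hat, mu_hat


def ll_mean(mu: float, durations: list[float]) -> float:
    """Log-likelihood of the exponential population parametrised by its mean mu."""
    return -len(durations) * math.log(mu) - sum(durations) / mu


sample = [2.0, 3.5, 1.2, 4.3]  # hypothetical durations (days), not the battery data
lam_hat, mu_hat = exponential_mle(sample)
print(lam_hat, mu_hat)
# Direct check: the mean-parametrised LL is not larger at nearby values of mu.
print(all(ll_mean(mu_hat, sample) >= ll_mean(mu_hat * f, sample) for f in (0.99, 1.01)))
```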