


























This document covers the multiple regression model, a statistical technique used to analyse the relationship between a dependent variable and several independent variables. The key concepts discussed include the projection matrix, the coefficient of determination, the partitioned regression model, regression in deviation form, the statistical properties of the ordinary least-squares estimator, orthogonality and the problem of omitted-variables bias, regression under linear restrictions, and the use of trigonometric functions in regression. It gives a detailed treatment of the theoretical and mathematical aspects underlying the multiple regression model, with numerous examples and formal proofs. It may be particularly useful for university students taking courses in econometrics, applied statistics or quantitative analysis who need a solid understanding of the foundations of the multiple regression model.
Type: Guides, Projects and Research



























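Consider $T$ realisations of the regression equation
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon, \tag{1}$$
which can be written in the following form:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{T1} & \cdots & x_{Tk} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{bmatrix}. \tag{2}$$
This can be represented in summary notation by
$$y = X\beta + \varepsilon. \tag{3}$$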
The object is to derive an expression for the ordinary least-squares estimates of the elements of the parameter vector $\beta = [\beta_0, \beta_1, \ldots, \beta_k]'$.
The ordinary least-squares (OLS) estimate of $\beta$ is the value that minimises
$$\begin{aligned} S(\beta) = \varepsilon'\varepsilon &= (y - X\beta)'(y - X\beta) \\ &= y'y - y'X\beta - \beta'X'y + \beta'X'X\beta \\ &= y'y - 2y'X\beta + \beta'X'X\beta. \end{aligned} \tag{4}$$
According to the rules of matrix differentiation, the derivative is
$$\frac{\partial S}{\partial \beta} = -2y'X + 2\beta'X'X. \tag{5}$$
Setting this to zero gives $0 = \beta'X'X - y'X$, which is transposed to provide the so-called normal equations:
$$X'X\beta = X'y. \tag{6}$$
On the assumption that the inverse matrix exists, the equations have a unique solution, which is the vector of ordinary least-squares estimates:
$$\hat{\beta} = (X'X)^{-1}X'y. \tag{7}$$
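As a minimal sketch of how the normal equations can be solved numerically, the following NumPy fragment computes the OLS estimate for a tiny made-up data set. The function name `ols_normal_equations` and the data are illustrative assumptions, not part of the notes; the linear system is solved directly rather than by forming the inverse explicitly.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Return the solution b of the normal equations X'X b = X'y."""
    XtX = X.T @ X
    Xty = X.T @ y
    # Solving the system is numerically preferable to computing (X'X)^{-1}.
    return np.linalg.solve(XtX, Xty)

# Hypothetical data: T = 5 observations, an intercept and one regressor.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x1), x1])   # first column is the unit vector
y = 1.0 + 2.0 * x1                            # exact linear relation, no disturbance

beta_hat = ols_normal_equations(X, y)
# Cross-check against NumPy's least-squares routine.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
```

Because the illustrative data lie exactly on a line, the estimate should recover the intercept 1 and the slope 2 up to rounding error.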
This is an instance of Pythagoras' theorem; and the equation indicates that the total sum of squares $y'y$ is equal to the regression sum of squares $\hat{\beta}'X'X\hat{\beta}$ plus the residual or error sum of squares $e'e$.
By projecting $y$ perpendicularly onto the manifold of $X$, the distance between $y$ and $Py = X\hat{\beta}$ is minimised.
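Proof. Let $\gamma = Pg$ be an arbitrary vector in the manifold of $X$. Then
$$\begin{aligned} (y - \gamma)'(y - \gamma) &= \{(y - X\hat{\beta}) + (X\hat{\beta} - \gamma)\}'\{(y - X\hat{\beta}) + (X\hat{\beta} - \gamma)\} \\ &= \{(I - P)y + P(y - g)\}'\{(I - P)y + P(y - g)\}. \end{aligned}$$
The properties of $P$ indicate that
$$\begin{aligned} (y - \gamma)'(y - \gamma) &= y'(I - P)y + (y - g)'P(y - g) \\ &= e'e + (X\hat{\beta} - \gamma)'(X\hat{\beta} - \gamma). \end{aligned}$$
Since the squared distance $(X\hat{\beta} - \gamma)'(X\hat{\beta} - \gamma)$ is nonnegative, it follows that $(y - \gamma)'(y - \gamma) \geq e'e$, where $e = y - X\hat{\beta}$; which proves the assertion.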
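The decomposition of the sum of squares and the minimum-distance property can be checked numerically. The sketch below assumes randomly generated data (the seed, the dimensions and all variable names are illustrative) and builds the projector $P = X(X'X)^{-1}X'$ explicitly, which is reasonable only for small $T$.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_cols = 20, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, n_cols - 1))])
y = rng.normal(size=T)

# Orthogonal projector onto the manifold (column space) of X.
P = X @ np.linalg.solve(X.T @ X, X.T)
fitted = P @ y            # P y = X beta_hat
e = y - fitted            # residual vector e = (I - P) y

total_ss = y @ y
regression_ss = fitted @ fitted
residual_ss = e @ e

# An arbitrary point gamma = P g in the manifold of X.
g = rng.normal(size=T)
gamma = P @ g
dist_gamma = (y - gamma) @ (y - gamma)
```

Under these assumptions, `total_ss` should equal `regression_ss + residual_ss`, and `dist_gamma` should never fall below `residual_ss`, whatever vector `g` is drawn.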
The Coefficient of Determination
A summary measure of the extent to which the ordinary least-squares regression accounts for the observed vector $y$ is provided by the coefficient of determination. This is defined by
$$R^2 = \frac{\hat{\beta}'X'X\hat{\beta}}{y'y} = \frac{y'Py}{y'y}. \tag{11}$$
The measure is just the square of the cosine of the angle between the vectors $y$ and $Py = X\hat{\beta}$; and the inequality $0 \leq R^2 \leq 1$ follows from the fact that the cosine of any angle must lie between $-1$ and $+1$.
If $X$ is a square matrix of full rank, with as many regressors as observations, then $X^{-1}$ exists and
$$P = X(X'X)^{-1}X' = X\{X^{-1}X'^{-1}\}X' = I,$$
and so $R^2 = 1$. If $X'y = 0$, then $Py = 0$ and $R^2 = 0$. But, if $y$ is distributed continuously, then this event has a zero probability.
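The following sketch evaluates the coefficient of determination as defined here, that is, the uncentred ratio $y'Py/y'y$, and compares it with the squared cosine of the angle between $y$ and $Py$. The helper name `r_squared_uncentred` and the data are assumptions made for illustration; the square matrix `X_sq` is a hand-picked full-rank example of the limiting case $R^2 = 1$.

```python
import numpy as np

def r_squared_uncentred(X, y):
    """R^2 = (beta_hat' X'X beta_hat) / (y'y), with no centring of y."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    fitted = X @ beta_hat
    return (fitted @ fitted) / (y @ y)

rng = np.random.default_rng(1)
T = 15
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = rng.normal(size=T)
r2 = r_squared_uncentred(X, y)

# Squared cosine of the angle between y and its projection P y.
Py = X @ np.linalg.solve(X.T @ X, X.T @ y)
cos_angle = (y @ Py) / (np.linalg.norm(y) * np.linalg.norm(Py))

# Square, full-rank X: as many regressors as observations, so P = I.
X_sq = np.array([[1.0, 0.0, 0.0],
                 [1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0]])
y_sq = np.array([2.0, -1.0, 3.0])
r2_square = r_squared_uncentred(X_sq, y_sq)
```

Note that many software packages report a centred $R^2$, computed from deviations about the mean, which generally differs from the uncentred measure sketched here.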
The Partitioned Regression Model
Consider partitioning the regression equation of (3) to give
$$y = [X_1 \;\; X_2]\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon, \tag{12}$$
where $[X_1, X_2] = X$ and $[\beta_1', \beta_2']' = \beta$. The normal equations of (6) can be partitioned likewise:
$$X_1'X_1\beta_1 + X_1'X_2\beta_2 = X_1'y, \tag{13}$$
$$X_2'X_1\beta_1 + X_2'X_2\beta_2 = X_2'y. \tag{14}$$
From (13), we get $X_1'X_1\beta_1 = X_1'(y - X_2\beta_2)$, which gives
$$\hat{\beta}_1 = (X_1'X_1)^{-1}X_1'(y - X_2\hat{\beta}_2). \tag{15}$$
To obtain an expression for $\hat{\beta}_2$, we must eliminate $\beta_1$ from equation (14). For this, we multiply equation (13) by $X_2'X_1(X_1'X_1)^{-1}$ to give
$$X_2'X_1\beta_1 + X_2'X_1(X_1'X_1)^{-1}X_1'X_2\beta_2 = X_2'X_1(X_1'X_1)^{-1}X_1'y. \tag{16}$$
Subtracting (16) from equation (14), which is $X_2'X_1\beta_1 + X_2'X_2\beta_2 = X_2'y$, gives
$$\{X_2'X_2 - X_2'X_1(X_1'X_1)^{-1}X_1'X_2\}\beta_2 = X_2'y - X_2'X_1(X_1'X_1)^{-1}X_1'y. \tag{17}$$
On defining $P_1 = X_1(X_1'X_1)^{-1}X_1'$, equation (17) can be written as
$$\{X_2'(I - P_1)X_2\}\beta_2 = X_2'(I - P_1)y,$$
whence
$$\hat{\beta}_2 = \{X_2'(I - P_1)X_2\}^{-1}X_2'(I - P_1)y. \tag{20}$$
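The partitioned formulae can be compared with a single regression on the full matrix $X = [X_1, X_2]$. The sketch below uses randomly generated data; the seed, the dimensions and the variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 30
X1 = np.column_stack([np.ones(T), rng.normal(size=T)])   # first block of regressors
X2 = rng.normal(size=(T, 2))                             # second block
y = rng.normal(size=T)

# Regression on the full matrix X = [X1, X2].
X = np.hstack([X1, X2])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

# Partitioned route: project out X1, then estimate beta_2.
P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
M1 = np.eye(T) - P1                                      # I - P1
beta2 = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ y)

# beta_1 follows from the first block of the normal equations.
beta1 = np.linalg.solve(X1.T @ X1, X1.T @ (y - X2 @ beta2))
```

Under these assumptions, stacking `beta1` and `beta2` should reproduce `beta_full` up to rounding error.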
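To understand the effect of the operator $P_\iota = \iota(\iota'\iota)^{-1}\iota'$, where $\iota = [1, 1, \ldots, 1]'$ is the vector of $T$ units, consider
$$\iota'y = \sum_{t=1}^{T} y_t, \qquad (\iota'\iota)^{-1}\iota'y = \frac{1}{T}\sum_{t=1}^{T} y_t = \bar{y},$$
and
$$P_\iota y = \iota\bar{y} = \iota(\iota'\iota)^{-1}\iota'y = [\bar{y}, \bar{y}, \ldots, \bar{y}]'.$$
Here, $P_\iota y = [\bar{y}, \bar{y}, \ldots, \bar{y}]'$ is a column vector containing $T$ repetitions of the sample mean.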
From the above, it can be understood that, if $x = [x_1, x_2, \ldots, x_T]'$ is a vector of $T$ elements, then
$$x'(I - P_\iota)x = \sum_{t=1}^{T} x_t(x_t - \bar{x}) = \sum_{t=1}^{T} (x_t - \bar{x})x_t = \sum_{t=1}^{T} (x_t - \bar{x})^2.$$
The final equality depends on the fact that $\sum_t (x_t - \bar{x})\bar{x} = \bar{x}\sum_t (x_t - \bar{x}) = 0$.
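A short numerical illustration of the averaging operator may help. The vector `x` below is made up, and the names `iota` and `P_iota` are chosen for this sketch; since $\iota'\iota = T$, the operator reduces to the outer product of the unit vector with itself divided by $T$.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 6.0, 9.0])     # illustrative data, mean 5
T = x.size
iota = np.ones(T)                           # summation vector of T units

# P_iota = iota (iota' iota)^{-1} iota', with iota' iota = T.
P_iota = np.outer(iota, iota) / T

mean_vec = P_iota @ x                       # T repetitions of the sample mean
quad_form = x @ (np.eye(T) - P_iota) @ x    # x'(I - P_iota)x
sum_sq_dev = np.sum((x - x.mean()) ** 2)    # sum of squared deviations
```

For these numbers the deviations are $-3, -1, -1, 1, 4$, so both `quad_form` and `sum_sq_dev` should equal 28.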
The Regression Model in Deviation Form
Consider the matrix of cross-products in equation (24). This is
$$Z'(I - P_\iota)Z = \{(I - P_\iota)Z\}'\{(I - P_\iota)Z\} = (Z - \bar{Z})'(Z - \bar{Z}).$$
Here, $\bar{Z} = P_\iota Z$ contains the sample means of the $k$ explanatory variables repeated $T$ times. The matrix $(I - P_\iota)Z = (Z - \bar{Z})$ contains the deviations of the data points about the sample means. The vector $(I - P_\iota)y = (y - \iota\bar{y})$ may be described likewise.
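It follows that the estimate $\hat{\beta}_z = \{Z'(I - P_\iota)Z\}^{-1}Z'(I - P_\iota)y$ is obtained by applying the least-squares regression to the equation
$$\begin{bmatrix} y_1 - \bar{y} \\ y_2 - \bar{y} \\ \vdots \\ y_T - \bar{y} \end{bmatrix} = \begin{bmatrix} x_{11} - \bar{x}_1 & \cdots & x_{1k} - \bar{x}_k \\ x_{21} - \bar{x}_1 & \cdots & x_{2k} - \bar{x}_k \\ \vdots & & \vdots \\ x_{T1} - \bar{x}_1 & \cdots & x_{Tk} - \bar{x}_k \end{bmatrix} \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} \varepsilon_1 - \bar{\varepsilon} \\ \varepsilon_2 - \bar{\varepsilon} \\ \vdots \\ \varepsilon_T - \bar{\varepsilon} \end{bmatrix}, \tag{28}$$
which lacks an intercept term.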
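The equivalence between the regression in deviation form and the slope coefficients of a regression with an intercept can be sketched as follows. The data-generating values (intercept 1.5, slopes 0.5 and -2.0, a small noise scale) are arbitrary assumptions; the recovery of the intercept as $\bar{y} - \bar{z}'\hat{\beta}_z$ is the special case of the partitioned formula for $\hat{\beta}_1$ in which the first block is the unit vector.

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 25, 2
Z = rng.normal(size=(T, k))                       # explanatory variables
y = 1.5 + Z @ np.array([0.5, -2.0]) + rng.normal(scale=0.1, size=T)

# Regression with an explicit intercept column.
X = np.column_stack([np.ones(T), Z])
beta_full = np.linalg.solve(X.T @ X, X.T @ y)

# Regression on deviations about the sample means, with no intercept.
Z_dev = Z - Z.mean(axis=0)                        # (I - P_iota) Z
y_dev = y - y.mean()                              # (I - P_iota) y
beta_z = np.linalg.solve(Z_dev.T @ Z_dev, Z_dev.T @ y_dev)

# Intercept recovered from the sample means.
alpha_hat = y.mean() - Z.mean(axis=0) @ beta_z
```

Here `beta_z` should match `beta_full[1:]`, and `alpha_hat` should match `beta_full[0]`, up to rounding error.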
The Assumptions of the Classical Linear Model
Consider the regression equation
$$y = X\beta + \varepsilon,$$
where $y = [y_1, y_2, \ldots, y_T]'$, $\varepsilon = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T]'$, $\beta = [\beta_0, \beta_1, \ldots, \beta_k]'$ and $X = [x_{tj}]$, with $x_{t0} = 1$ for all $t$.
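It is assumed that the disturbances have expected values of zero. Thus
$$E(\varepsilon) = 0 \quad \text{or, equivalently,} \quad E(\varepsilon_t) = 0, \quad t = 1, \ldots, T.$$
Next, it is assumed that they are mutually uncorrelated and that they have a common variance. Thus
$$D(\varepsilon) = E(\varepsilon\varepsilon') = \sigma^2 I, \quad \text{or} \quad E(\varepsilon_t\varepsilon_s) = \begin{cases} \sigma^2, & \text{if } t = s; \\ 0, & \text{if } t \neq s. \end{cases} \tag{34}$$
If $t$ is a temporal index, then these assumptions imply that there is no inter-temporal correlation in the sequence of disturbances.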
A conventional assumption, borrowed from the experimental sciences, is that $X$ is a nonstochastic matrix with linearly independent columns.
Linear independence is necessary in order to distinguish the separate effects of the $k$ explanatory variables.
In econometrics, it is more appropriate to regard the elements of $X$ as random variables distributed independently of the disturbances:
$$E(X'\varepsilon \mid X) = X'E(\varepsilon) = 0. \tag{37}$$
Then, $\hat{\beta} = (X'X)^{-1}X'y$ is unbiased such that
$$E(\hat{\beta}) = \beta. \tag{38}$$
To demonstrate this, we may write
$$\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon. \tag{39}$$
Taking expectations gives
$$E(\hat{\beta}) = \beta + (X'X)^{-1}X'E(\varepsilon) = \beta. \tag{40}$$
Matrix Traces
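If $A = [a_{ij}]$ is a square matrix, then $\mathrm{Trace}(A) = \sum_i a_{ii}$. If $A = [a_{ij}]$ is of order $n \times m$ and $B = [b_{k\ell}]$ is of order $m \times n$, then
$$AB = C = [c_{i\ell}] \quad \text{with} \quad c_{i\ell} = \sum_{j=1}^{m} a_{ij}b_{j\ell} \quad \text{and}$$
$$BA = D = [d_{kj}] \quad \text{with} \quad d_{kj} = \sum_{\ell=1}^{n} b_{k\ell}a_{\ell j}.$$
Now,
$$\begin{aligned} \mathrm{Trace}(AB) &= \sum_{i=1}^{n}\sum_{j=1}^{m} a_{ij}b_{ji} \quad \text{and} \\ \mathrm{Trace}(BA) &= \sum_{j=1}^{m}\sum_{\ell=1}^{n} b_{j\ell}a_{\ell j} = \sum_{\ell=1}^{n}\sum_{j=1}^{m} a_{\ell j}b_{j\ell}. \end{aligned} \tag{46}$$
Apart from a change of notation, where $\ell$ replaces $i$, the expressions on the RHS are the same. It follows that $\mathrm{Trace}(AB) = \mathrm{Trace}(BA)$. For three factors $A$, $B$, $C$, we have $\mathrm{Trace}(ABC) = \mathrm{Trace}(CAB) = \mathrm{Trace}(BCA)$.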
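The cyclic property of the trace is easy to confirm on small matrices. In the sketch below the matrices `A`, `B` and `C` are arbitrary illustrative choices of conformable orders ($2 \times 3$, $3 \times 2$ and $2 \times 2$), and `double_sum` evaluates the double summation for $\mathrm{Trace}(AB)$ term by term.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])     # order 2 x 3
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])          # order 3 x 2
C = np.array([[2.0, 1.0],
              [0.0, 1.0]])          # order 2 x 2

trace_AB = np.trace(A @ B)          # trace of a 2 x 2 product
trace_BA = np.trace(B @ A)          # trace of a 3 x 3 product

# The double sum over a_ij * b_ji, written out explicitly.
double_sum = sum(A[i, j] * B[j, i] for i in range(2) for j in range(3))

# Cyclic permutations of three factors.
trace_ABC = np.trace(A @ B @ C)
trace_CAB = np.trace(C @ A @ B)
trace_BCA = np.trace(B @ C @ A)
```

Although $AB$ and $BA$ have different orders here, their traces coincide (both equal 28 for these numbers), and the three cyclic products share a common trace.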
Estimating the Variance of the Disturbance
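It is natural to estimate $\sigma^2 = V(\varepsilon_t)$ via its empirical counterpart. With $e_t = y_t - x_{t.}\hat{\beta}$ in place of $\varepsilon_t$, it follows that $T^{-1}\sum_t e_t^2$ may be used to estimate $\sigma^2$. However, it transpires that this is biased. An unbiased estimate is provided by
$$\hat{\sigma}^2 = \frac{1}{T - k}\sum_{t=1}^{T} e_t^2 = \frac{1}{T - k}(y - X\hat{\beta})'(y - X\hat{\beta}), \tag{48}$$
where, for the estimate to be unbiased, $k$ must be read as the number of columns of $X$, including any column of units.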
The unbiasedness of this estimate may be demonstrated by finding the expected value of $(y - X\hat{\beta})'(y - X\hat{\beta}) = y'(I - P)y$. Given that $(I - P)y = (I - P)(X\beta + \varepsilon) = (I - P)\varepsilon$ in consequence of the condition $(I - P)X = 0$, it follows that
$$E\{(y - X\hat{\beta})'(y - X\hat{\beta})\} = E(\varepsilon'\varepsilon) - E(\varepsilon'P\varepsilon).$$
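A simulation gives a rough numerical check on the claim that dividing the residual sum of squares by $T$ is biased, whereas dividing by $T$ less the number of columns of $X$ is not. Everything in this sketch is an assumption made for illustration: the design matrix, the normal disturbances, the variance $\sigma^2 = 4$, the seed and the number of replications. It relies on the identity $y'(I - P)y = \varepsilon'(I - P)\varepsilon$, so $\beta$ need not be specified.

```python
import numpy as np

rng = np.random.default_rng(4)
T, sigma2 = 12, 4.0
X = np.column_stack([np.ones(T), np.arange(T, dtype=float), rng.normal(size=T)])
n_cols = X.shape[1]                          # number of columns of X

P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(T) - P                            # I - P, which annihilates X

# Simulate many disturbance vectors; each row is one replication.
reps = 20000
eps = rng.normal(scale=np.sqrt(sigma2), size=(reps, T))

# Residual sum of squares eps'(I - P)eps for every replication.
rss = np.sum((eps @ M) * eps, axis=1)

biased = rss.mean() / T                      # average of T^{-1} sum e_t^2
unbiased = rss.mean() / (T - n_cols)         # average of the corrected estimate
```

With these settings the corrected average should lie close to 4, while the uncorrected one should lie close to $4 \times 9/12 = 3$; the trace of `M` equals $T$ minus the number of columns, which is the source of the correction.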
Statistical Properties of the OLS Estimator
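The expectation or mean vector of $\hat{\beta}$, and its dispersion matrix as well, may be found from the expression
$$\hat{\beta} = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon. \tag{53}$$
The expectation is
$$E(\hat{\beta}) = \beta + (X'X)^{-1}X'E(\varepsilon) = \beta. \tag{54}$$
Thus, $\hat{\beta}$ is an unbiased estimator. The deviation of $\hat{\beta}$ from its expected value is $\hat{\beta} - E(\hat{\beta}) = (X'X)^{-1}X'\varepsilon$. Therefore, the dispersion matrix, which contains the variances and covariances of the elements of $\hat{\beta}$, is
$$\begin{aligned} D(\hat{\beta}) &= E\big[\{\hat{\beta} - E(\hat{\beta})\}\{\hat{\beta} - E(\hat{\beta})\}'\big] \\ &= (X'X)^{-1}X'E(\varepsilon\varepsilon')X(X'X)^{-1} \\ &= \sigma^2(X'X)^{-1}. \end{aligned}$$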
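The mean vector and the dispersion matrix of the estimator can be approximated by simulation and compared with $\beta$ and $\sigma^2(X'X)^{-1}$. The sketch holds $X$ fixed across replications; the parameter values, the normal disturbances, the seed and the number of replications are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
T, sigma = 40, 0.5
X = np.column_stack([np.ones(T), np.linspace(-1.0, 1.0, T)])   # fixed design
beta = np.array([1.0, -2.0])                                   # assumed true values
XtX_inv = np.linalg.inv(X.T @ X)

# Each row of Y is one simulated sample y = X beta + eps.
reps = 20000
eps = rng.normal(scale=sigma, size=(reps, T))
Y = X @ beta + eps

# Row r holds the estimate for sample r: y_r' X (X'X)^{-1}.
B_hat = Y @ X @ XtX_inv

mean_beta = B_hat.mean(axis=0)            # approximates E(beta_hat)
cov_beta = np.cov(B_hat, rowvar=False)    # approximates D(beta_hat)
theory = sigma ** 2 * XtX_inv             # sigma^2 (X'X)^{-1}
```

The agreement is only approximate, since it is subject to simulation error that shrinks as the number of replications grows.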
The Gauss–Markov theorem asserts that $\hat{\beta}$ is the unbiased linear estimator of least dispersion. Thus,
(56) If $\hat{\beta}$ is the OLS estimator of $\beta$, and if $\beta^*$ is any other linear unbiased estimator of $\beta$, then $V(q'\beta^*) \geq V(q'\hat{\beta})$, where $q$ is a constant vector.
Proof. Since $\beta^* = Ay$ is an unbiased estimator, it follows that $E(\beta^*) = AE(y) = AX\beta = \beta$, which implies that $AX = I$. Now write $A = (X'X)^{-1}X' + G$. Then, $AX = I$ implies that $GX = 0$. It follows that
$$\begin{aligned} D(\beta^*) &= AD(y)A' \\ &= \sigma^2\{(X'X)^{-1}X' + G\}\{X(X'X)^{-1} + G'\} \\ &= \sigma^2(X'X)^{-1} + \sigma^2 GG' \\ &= D(\hat{\beta}) + \sigma^2 GG'. \end{aligned}$$
Therefore, for any constant vector $q$ of order $k$, there is
$$V(q'\beta^*) = q'D(\hat{\beta})q + \sigma^2 q'GG'q \geq q'D(\hat{\beta})q = V(q'\hat{\beta});$$
and thus the inequality $V(q'\beta^*) \geq V(q'\hat{\beta})$ is established.
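The steps of the proof can be traced numerically by constructing one alternative linear unbiased estimator. In the sketch below, the matrix `G` is obtained as `H @ M` with `M = I - P`, which is one convenient way (an assumption of this example, not the only one) of guaranteeing $GX = 0$; the design matrix, the random matrix `H`, the vector `q` and the value of $\sigma^2$ are likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
T, sigma2 = 10, 1.0
X = np.column_stack([np.ones(T), np.arange(T, dtype=float)])
n_cols = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
A_ols = XtX_inv @ X.T                    # OLS weights: beta_hat = A_ols y
M = np.eye(T) - X @ A_ols                # I - P, so that M X = 0

# Any G of the form H M satisfies G X = 0.
H = rng.normal(size=(n_cols, T))
G = H @ M
A = A_ols + G                            # alternative estimator beta* = A y

D_ols = sigma2 * XtX_inv                 # D(beta_hat) = sigma^2 (X'X)^{-1}
D_alt = sigma2 * A @ A.T                 # D(beta*) = sigma^2 A A'
excess = D_alt - D_ols                   # should equal sigma^2 G G'

q = np.array([1.0, -1.0])                # an arbitrary constant vector
var_alt = q @ D_alt @ q
var_ols = q @ D_ols @ q
```

Under these assumptions `A @ X` is the identity, so the alternative estimator is unbiased, and `excess` is a positive semi-definite matrix, which is why `var_alt` cannot be smaller than `var_ols` for any choice of `q`.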