














Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
The general linear model and the hypothesis testing of mean vectors lying in a given subspace. It covers the two-sample and k-sample cases, and explains the concept of least squares estimates (lses) and orthogonal decomposition. The document also mentions the relationship between lses and maximum likelihood estimates (mles), and provides a theorem on the distribution of lses.
Typology: Study notes
1 / 22
This page cannot be seen from the preview
Don't miss anything!















General Linear Hypothesis
We assume the data vector Y
1
N
′
consists of independent Y
i
∼ N (μ
i
, σ
2
random variables, for i = 1 ,... , N.
We also have a data model hypothesis, namely that the mean vector μ
μ μ can be any
point in a given s-dimensional linear subspace Π
Ω
N
, where s < N.
Many statistical problem can be formulated as follows:
we identify a linear subspace Π
ω
of Π
Ω
of dimension s − r, with 0 < r ≤ s, and
we test the hypothesis
0
: μμμ ∈ Π
ω
against the alternative H
1
: μμμ ∈ Π
Ω
ω
To distinguish Π
ω
from Π
Ω
we may want to call Π
ω
the test hypothesis.
Let Y
i, 1
i,n
i
∼ N (ξ
i
, σ
2
), i = 1 ,... , k be independent samples.
Here N = n
1
+... + n
k
and we have used the traditional double indexing on the
Y ’s, but we could equally well have used a more awkward single index i = 1 ,... , N.
The data model below specifies or reflects k samples with
common variance σ
2
and possibly different means ξ
1
,... , ξ
k
.
Ω
is k-dimensional (s = k) consisting of all vectors of the form
μ
μ μ = (μ
1 , 1
+... + μ
k,n
k
′
= ξ
1
n
1
1’s
(N−n
1
) 0 ’s
′
aaa
1
+... + ξ
k
(N−n
k
) 0’s
n
k
1 ’s
′
aaa
k
= ξ
1
a
a a
1
+... + ξ
k
a
a a
k
The orthogonal vectors a
a a
1
,... , a
a a
k
span Π
Ω
.
The vector aaa
i
has 1 ’s in positions (i, 1 ),... , (i, n
i
) and 0 ’s in the remaining positions.
Here we want to test the hypothesis H
0
: ξ
1
=... = ξ
k
, i.e.,
μμμ = (μ
1 , 1
+... + μ
k,n
k
′
= ξ
1
aaa
1
+... + ξ
k
aaa
k
= ξ
1
(aaa
1
+... + aaa
k
and Π
ω
is the (s − r) dimensional linear subspace spanned by
111 = aaa
1
+... + aaa
k
N 1’s
′
i.e., s − r = k − r = 1 , or r = k − 1.
General Linear Model Schematic Diagram
l
0
Y
R
N
ΠΠ
ΩΩ
ΠΠ
ωω
μμ
μμ
Y −− μμ == e
Comments on the Previous Diagram
μ
μ μ ∈ Π
Ω
is the orthogonal projection of the data vector Y
Y onto Π
Ω
,
i.e., it is the best explanation of Y
Y in terms of Π
Ω
(Least Squares Distance).
The orthogonal complement eee = YYY − μμμˆ is the residual error vector,
i.e., that part of Y
Y not explained by
μ
μ μ ∈ Π
Ω
μ
μ μ + (Y
μ
μ μ) =
μ
μ μ
⊥
e e.
μ
μ μ is the orthogonal projection of the data vector Y
Y onto Π
ω
,
i.e., it is the best explanation of YYY in terms of Π
ω
(Least Squares Distance).
μˆ
μ μ is also the orthogonal projection of μˆ
μ μ onto Π
ω
,
i.e., it is the best explanation of
μ
μ μ in terms of Π
ω
.
μ
μ μ =
μ
μ μ
⊥
μ
μ μ −
μ
μ μ).
The orthogonal complement μμμˆ −
μμμˆ is that part of μμμˆ that cannot be explained by Π
ω
.
It expresses the estimated discrepancy of the unknown μμμ from Π
ω
.
LSE’s = MLE’s (Maximum Likelihood Estimates
Under the normal distribution model for the Y
i
the LSE’s are also the
maximum likelihood estimates (MLE’s) of μμμ w.r.t. to the respective
model constraints μμμ ∈ Π
Ω
and μμμ ∈ Π
ω
.
This follows immediately from the likelihood function for the observed
YYY = yyy = (y
1
,... , y
N
′
L(μ
μ μ, σ) = f
μ
μ μ,σ
(y
1
,... , y
N
σ
2 π
N
exp
N
i= 1
(y
i
− μ
i
2
2 σ
2
which is maximized over Π
Ω
ω
) by minimizing ∑
N
i= 1
(y
i
− μ
i
2
w.r.t. μ
μ μ ∈ Π
Ω
ω
and by taking
σ
2
N
i= 1
(y
i
− μˆ
i
2
/N and
σ
2
N
i= 1
(y
i
μˆ
i
2
/N, respectively.
Theorem: Under the general linear model assumption we have:
μ
μ μ−
μ
μ μ|
2
N
∑
i= 1
μˆ
i
− μˆ
i
2
μ
μ μ|
2
μ
μ μ|
2
N
∑
i= 1
i
μˆ
i
2
N
∑
i= 1
i
− μˆ
i
2
∼ σ
2
χ
2
r,λ
is independent of SSE = |Y
Y − μˆ
μ μ|
2
N
∑
i= 1
i
− μˆ
i
2
∼ σ
2
χ
2
N−s
and thus F =
μμμˆ − μμμˆ|
2
/r
|YYY − μμμˆ|
2
/(N − s)
r, N−s,λ
The noncentrality parameter is
λ = |
μˆ
μ μ(μ
μ μ) − μˆ
μ μ(μ
μ μ)|
2
/σ
2
μˆ
μ μ(μ
μ μ) − μ
μ μ|
2
/σ
2
where
μˆ
μ μ(μ
μ μ) is to be viewed as Π
ω
-LSE when Y
Y = μ
μ μ and μˆ
μ μ(μ
μ μ) is to be viewed as
Ω
-LSE when YYY = μμμ. Of course, μμμˆ(μμμ) = μμμ since μμμ ∈ Π
Ω
.
To find the noncentrality parameter λ we replace Y
Y by μ
μ μ = (ξ,... , ξ, η,... , η)
′
in
μˆ
μ μ(Y
Y ) − μˆ
μ μ(Y
2
mn
1
2
2
μμμˆ(μμμ) − μμμˆ(μμμ)|
2
mn
(ξ − η)
2
(ξ − η)
1 /m + 1 /n
2
Thus the noncentrality parameter is
λ =
[ξ − η]/
1 /m + 1 /n
2
σ
2
= δ
2
where δ is the noncentrality parameter in the two-sample t-test, namely
δ =
ξ − η
σ
1 /m + 1 /n
Here the Π
Ω
-LSE of μ
μ μ minimizes
N
∑
i= 1
i
− μ
i
2
k
∑
i= 1
n
i
∑
j= 1
i, j
− ξ
i
2
μ
μ μ = (
ξ
1
ξ
1
ξ
k
ξ
k
′
ξ
1
a
a a
1
ξ
k
a
a a
k
with
ξ
i
i.
n
i
∑
j= 1
i, j
/n
i
and the Π
ω
-LSE of μ
μ μ minimizes
N
∑
i= 1
i
− μ
i
2
k
∑
i= 1
n
i
∑
j= 1
i, j
− μ)
2
μ
μ μ = (μˆ,... , μˆ)
′
μˆ 1
with
μˆ =
..
k
∑
i= 1
n
i
∑
j= 1
i, j
..
k
∑
i= 1
i.
n
i
/N ⇒ | μμμˆ−
μμμˆ|
2
k
∑
i= 1
n
i
∑
j= 1
i.
..
2
Thus F =
μˆ
μ μ − μˆ
μ μ|
2
/r
Y − μˆ
μ μ|
2
/(N − s)
k
i= 1
n
i
j= 1
i.
..
2
/(k − 1 )
k
i= 1
n
i
j= 1
i, j
..
2
/(N − k)
k− 1 , N−k, λ
The proof is essentially that given in Testing Statistical Hypotheses, 3rd Edition,
chapter 7, by E.L. Lehmann and J.P. Romano (2005)
There it is presented in the context of certain optimality properties and followed by
many explicit examples.
The proof is first given in a case of special linear subspaces
Ω
and
ω
Ω
,
where the statement of the theorem is immediate from the definitions of
χ
2
f
, χ
2
g,λ
and F
f ,g,λ
.
Then it is argued that the general case can always be orthonormally transformed
to the special case and that this does not change the meaning of LSE, i.e., LSE’s
in one framework rotate to LSE’s in the other framework.
Distances and independence don’t change under orthonormal tranforms.
General Linear Model Theorem: Special Case
i
∼ N (ν
i
, σ
2
), i = 1 ,... , N with model hypothesis ν
s+ 1
=... = ν
N
This describes an s-dimensional linear subspace
Ω
of R
N
.
Test H
0
: ν
1
=... = ν
r
= 0 (in addition to the model hypothesis).
This describes an (s − r)-dimensional subspace
ω
Ω
.
N
∑
i=s+ 1
2
i
/σ
2
∼ χ
2
N−s
and
r
∑
i= 1
2
i
/σ
2
∼ χ
2
r, λ
are independent and with noncentrality parameter λ =
∑
r
i= 1
ν
2
i
/σ
2
The natural test rejects H
0
when the corresponding F-statistic is too large, where
r
i= 1
2
i
/r
N
i=s+ 1
2
i
/(N − s)
r, N−s,λ
LSE View of the Noncentrality Parameter
Using U
U = ν
ν ν ∈
Ω
in the
ω
-LSE derivation of ν
ν ν
ω
we have
U − ν
ν ν
ω
2
= |ν
ν ν − ν
ν ν
ω
2
N
∑
i= 1
(ν
i
− ν
ω,i
2
r
∑
i= 1
ν
2
i
s
∑
i=r+ 1
(ν
i
− ν
ω,i
2
N
∑
i=s+ 1
ν
2
i
since ν
ν ν
ω
ω
Ω
r
∑
i= 1
ν
2
i
s
∑
i=r+ 1
(ν
i
− ν
ω,i
2
since ν
ν ν ∈
Ω
r
∑
i= 1
ν
2
i
after minimizing over
ω
, i.e., ν
ω,i
= ν
i
, i = r + 1 ,... , s.
=⇒ λ = |U
U − ν
ν ν
ω
2
/σ
2
r
∑
i= 1
ν
2
i
/σ
2
Orthonormal Transform to the Special Case
The general case can always be orthonormally transformed to the special case.
Let C be an orthonormal matrix, with the first s rows c
c c
′
i
, i = 1 ,... , s, spanning Π
Ω
and c
c c
′
i
, i = r + 1 ,... , s, spanning Π
ω
. Such orthonormal basis vectors can always
be constructed via the Gram-Schmidt process.
Transform U
Y with mean vector ν
ν ν = Cμ
μ μ
and U
1
N
are again independent with common variance σ
2
. Now note that
μ
μ μ ∈ Π
Ω
⇔ μ
μ μ ⊥ c
c c
′
i
or ν
i
= c
c c
′
i
μ
μ μ = 0 for i = s + 1 ,... , N ⇔ ν
ν ν ∈
Ω
and μμμ ∈ Π
ω
⇔ μμμ ⊥ ccc
′
i
or ν
i
= ccc
′
i
μμμ = 0 for i = 1 ,... , r, s+ 1 ,... , N ⇔ ννν ∈
ω