




























































































Material Type: Notes; Professor: Tebbs; Class: LINEAR STATISTICAL MODELS; Subject: Statistics; University: University of South Carolina - Columbia; Term: Fall 2008;





























































































STAT 714, J. TEBBS
1 Introduction, Linear Algebra Review, and Random Vectors
Complementary reading from Monahan: Chapter 1 and Appendix A.
INTRODUCTION : This course is about linear models. Linear models are models that
are linear in their parameters. The general form of a linear model is given by
y = Xb + e,
where y is an N × 1 vector of observed responses, X is an N × p (design) matrix of fixed
constants, b is a p × 1 vector of fixed but unknown parameters, and e is an N × 1 vector
of (unobserved) random errors. The model is called a linear model because the mean of
the response vector y is linear in the unknown parameter b.
SCOPE OF APPLICATION : Several models commonly used in statistics are examples of
the general linear model y = Xb + e. These include simple and multiple linear regression
models and analysis of variance (ANOVA) models. Regression models generally refer to
those for which X is full rank, while ANOVA models refer to those for which X consists
of zeros and ones. Other models of this form include analysis of covariance models, some
time series models, and others.
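Model I: Least squares model: y = Xb + e. This model makes no assumptions on e. The parameter space is $\Theta = \{\mathbf{b} : \mathbf{b} \in \mathbb{R}^p\}$.
Model II: Gauss-Markov model: y = Xb + e, where E(e) = 0 and $\operatorname{cov}(\mathbf{e}) = \sigma^2 \mathbf{I}$. The parameter space is $\Theta = \{(\mathbf{b}, \sigma^2) : (\mathbf{b}, \sigma^2) \in \mathbb{R}^p \times \mathbb{R}^+\}$.
Model III: Aitken model: y = Xb + e, where E(e) = 0 and $\operatorname{cov}(\mathbf{e}) = \sigma^2 \mathbf{V}$, V known. The parameter space is $\Theta = \{(\mathbf{b}, \sigma^2) : (\mathbf{b}, \sigma^2) \in \mathbb{R}^p \times \mathbb{R}^+\}$.
Model IV: General linear mixed model: y = Xb + e, where E(e) = 0 and $\operatorname{cov}(\mathbf{e}) = \boldsymbol{\Sigma} \equiv \boldsymbol{\Sigma}(\boldsymbol{\theta})$. The parameter space is $\Theta = \{(\mathbf{b}, \boldsymbol{\theta}) : (\mathbf{b}, \boldsymbol{\theta}) \in \mathbb{R}^p \times \Omega\}$, where $\Omega$ is the set of all values of $\boldsymbol{\theta}$ for which $\boldsymbol{\Sigma}(\boldsymbol{\theta})$ is positive definite.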
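for i = 1, 2, ..., N, where the $e_i$ are uncorrelated random variables with mean 0 and common variance $\sigma^2 > 0$. If $x_1, x_2, ..., x_N$ are fixed constants, measured without error, then this model is a special GM model y = Xb + e with
$$
\mathbf{y}_{N \times 1} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad
\mathbf{X}_{N \times 2} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \quad
\mathbf{b}_{2 \times 1} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad
\mathbf{e}_{N \times 1} = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}.
$$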
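Example 1.3. Multiple linear regression. Suppose that a response variable y is linearly related to several independent predictor variables, say, $x_1, x_2, ..., x_k$, via
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i,$$
for i = 1, 2, ..., N, where the $e_i$ are uncorrelated random variables with mean 0 and common variance $\sigma^2 > 0$. If the independent variables are fixed constants, measured without error, then this model is a special GM model y = Xb + e where
$$
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad
\mathbf{X}_{N \times p} = \begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{N1} & x_{N2} & \cdots & x_{Nk}
\end{pmatrix}, \quad
\mathbf{b}_{p \times 1} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}, \quad
\mathbf{e} = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix},
$$
and p = k + 1. Note that the simple linear regression model is a special case of the multiple linear regression model with k = 1. §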
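As a small illustration of how the design matrix in Example 1.3 is assembled, the sketch below prepends a column of ones to a table of predictor values and evaluates the mean vector Xb. The data values, the coefficient vector, and the helper name `design_matrix` are all hypothetical and chosen only for this example.

```python
def design_matrix(predictors):
    """Prepend an intercept column of ones to an N x k list of predictor rows."""
    return [[1.0] + list(row) for row in predictors]

# Hypothetical data: N = 4 observations on k = 2 predictors.
x = [[2.0, 5.0], [3.0, 1.0], [4.0, 4.0], [6.0, 2.0]]
X = design_matrix(x)

# Hypothetical parameter vector (beta_0, beta_1, beta_2).
b = [1.0, 0.5, -2.0]

# Mean of the response vector, E(y) = Xb, computed row by row.
mean_y = [sum(xij * bj for xij, bj in zip(row, b)) for row in X]

print(len(X), len(X[0]))  # N and p = k + 1
print(mean_y)
```

With k = 1 the same helper produces the two-column matrix of the simple linear regression model.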
Example 1.4. One-way ANOVA. Consider an experiment that is performed to compare a ≥ 2 treatments. For the ith treatment level, suppose that $n_i$ experimental units are selected at random and assigned to the ith treatment. Consider the model
$$y_{ij} = \mu + \alpha_i + e_{ij},$$
for i = 1, 2, ..., a and j = 1, 2, ..., $n_i$, where the random errors $e_{ij}$ are uncorrelated random variables with zero mean and common variance $\sigma^2 > 0$. If the a treatment effects $\alpha_1, \alpha_2, ..., \alpha_a$ are best regarded as fixed constants, then this model is a special case of the GM model y = Xb + e. To see this, note that, with $N = \sum_i n_i$,
$$
\mathbf{y}_{N \times 1} = \begin{pmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{a n_a} \end{pmatrix}, \quad
\mathbf{X}_{N \times p} = \begin{pmatrix}
\mathbf{1}_{n_1} & \mathbf{1}_{n_1} & \mathbf{0}_{n_1} & \cdots & \mathbf{0}_{n_1} \\
\mathbf{1}_{n_2} & \mathbf{0}_{n_2} & \mathbf{1}_{n_2} & \cdots & \mathbf{0}_{n_2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{1}_{n_a} & \mathbf{0}_{n_a} & \mathbf{0}_{n_a} & \cdots & \mathbf{1}_{n_a}
\end{pmatrix}, \quad
\mathbf{b}_{p \times 1} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_a \end{pmatrix},
$$
where p = a + 1, $\mathbf{e}_{N \times 1} = (e_{11}, e_{12}, ..., e_{a n_a})'$, $\mathbf{1}_{n_i}$ is an $n_i \times 1$ column vector of ones, and $\mathbf{0}_{n_i}$ is an $n_i \times 1$ column vector of zeros. Note that if a = 2, then this data structure is equivalent to the standard two-sample setup. §
NOTE : In Example 1.4, note that the first column of X is the sum of the last a columns;
i.e., there is a linear dependence in the columns of X. From results in linear algebra,
we know that X is not of full column rank. In fact, the rank of X is r = a, one less
than the number of columns p = a + 1. This is a common characteristic of ANOVA
models; namely, their X matrices are not of full column rank. On the other hand,
(linear) regression models are models of the form y = Xb + e, where X is of full column
rank. See Examples 1.2 and 1.3.
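The rank deficiency described in this note can be checked on a small case. The sketch below builds the one-way ANOVA design matrix for hypothetical group sizes (2, 3, 2) and computes its rank by exact Gaussian elimination in plain Python; the helper names `rank` and `one_way_design` are made up for this illustration, and a numerical linear algebra library would normally be used instead.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        # Look for a nonzero pivot in this column, at or below row r.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def one_way_design(ns):
    """Design matrix for y_ij = mu + alpha_i + e_ij with group sizes ns."""
    a = len(ns)
    X = []
    for i, n in enumerate(ns):
        for _ in range(n):
            X.append([1] + [1 if k == i else 0 for k in range(a)])
    return X

X = one_way_design([2, 3, 2])  # hypothetical sizes: a = 3 treatments, N = 7
p = len(X[0])
print(p, rank(X))  # 4 3, i.e. p = a + 1 and r = a

# The first column is the sum of the last a columns in every row.
print(all(row[0] == sum(row[1:]) for row in X))  # True
```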
Example 1.5. Two-way nested ANOVA. Consider an experiment with two factors,
where one of the factors, say, Factor B, is nested within Factor A. In other words, every
level of B appears with exactly one level of Factor A. A statistical model for this situation
is given by
$$y_{ijk} = \mu + \alpha_i + \beta_{ij} + e_{ijk},$$
for i = 1, 2, ..., a, j = 1, 2, ..., $b_i$, and k = 1, 2, ..., $n_{ij}$. In this model, $\mu$ denotes the overall mean, $\alpha_i$ represents the effect due to the ith level of A, and $\beta_{ij}$ represents the effect of the jth level of B, nested within the ith level of A. If all parameters are fixed, and the random errors $e_{ijk}$ are uncorrelated random variables with zero mean and constant unknown variance $\sigma^2 > 0$, then this is a special GM model y = Xb + e. For example, with a = 3, b = 2, and $n_{ij} = n = 4$, we have
every level of B. Consider the two-factor (crossed) ANOVA model given by
$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk},$$
for i = 1, 2, ..., a, j = 1, 2, ..., b, and k = 1, 2, ..., $n_{ij}$, where the random errors $e_{ijk}$ are uncorrelated random variables with zero mean and constant unknown variance $\sigma^2 > 0$. If all the parameters are fixed, this is a special GM model y = Xb + e. For example, with a = 3, b = 2, and $n_{ij} = n = 3$,
$$
\mathbf{y} = (y_{111}, y_{112}, y_{113}, y_{121}, y_{122}, y_{123}, y_{211}, y_{212}, y_{213}, y_{221}, y_{222}, y_{223}, y_{311}, y_{312}, y_{313}, y_{321}, y_{322}, y_{323})',
$$
$$
\mathbf{b} = (\mu, \alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2, \gamma_{11}, \gamma_{12}, \gamma_{21}, \gamma_{22}, \gamma_{31}, \gamma_{32})',
$$
and $\mathbf{e} = (e_{111}, e_{112}, ..., e_{323})'$. The X matrix is not of full column rank. The rank of X is r = 6 and there are p = 12 columns. §
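The design matrix for this example is not reproduced in the text here. Assuming the rows are ordered as in y (i slowest, k fastest) and the columns as in b, one compact way to write it uses Kronecker products. The symbols $\otimes$, $\mathbf{I}_m$ (the m × m identity), and $\mathbf{1}_m$ (an m × 1 vector of ones) are notation introduced for this sketch, not taken from the notes.

```latex
\mathbf{X}_{18 \times 12} =
\Big[\;
\underbrace{\mathbf{1}_{18}}_{\mu}
\;\Big|\;
\underbrace{\mathbf{I}_3 \otimes \mathbf{1}_6}_{\alpha_1,\,\alpha_2,\,\alpha_3}
\;\Big|\;
\underbrace{\mathbf{1}_3 \otimes \mathbf{I}_2 \otimes \mathbf{1}_3}_{\beta_1,\,\beta_2}
\;\Big|\;
\underbrace{\mathbf{I}_6 \otimes \mathbf{1}_3}_{\gamma_{11},\,\ldots,\,\gamma_{32}}
\;\Big]
```

Every column in the first three blocks is a sum of columns of the last block, and the six columns of the last block are linearly independent, which is consistent with r = 6.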
Example 1.7. Two-way crossed ANOVA without interaction. Consider an experiment with two factors (A and B), where Factor A has a levels and Factor B has b levels. The two-way crossed model without interaction is given by
$$y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk},$$
for i = 1, 2, ..., a, j = 1, 2, ..., b, and k = 1, 2, ..., $n_{ij}$, where the random errors $e_{ijk}$ are uncorrelated random variables with zero mean and common variance $\sigma^2 > 0$. Note that the no-interaction model is a special case of the interaction model in Example 1.6 when $H_0: \gamma_{11} = \gamma_{12} = \cdots = \gamma_{32} = 0$ is true. That is, the no-interaction model is a reduced version of the interaction model. With a = 3, b = 2, and $n_{ij} = n = 3$ as before, we have
$$
\mathbf{y} = (y_{111}, y_{112}, y_{113}, y_{121}, y_{122}, y_{123}, y_{211}, y_{212}, y_{213}, y_{221}, y_{222}, y_{223}, y_{311}, y_{312}, y_{313}, y_{321}, y_{322}, y_{323})',
$$
$$
\mathbf{b} = (\mu, \alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2)',
$$
and $\mathbf{e} = (e_{111}, e_{112}, ..., e_{323})'$. The X matrix is not of full column rank. The rank of X is r = 4 and there are p = 6 columns. Also note that the design matrix for the no-interaction model is the same as the design matrix for the interaction model, except that the last 6 columns are removed (these columns pertain to the 6 interaction terms). §
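The rank and column counts quoted in Examples 1.6 and 1.7 can be verified directly. The following sketch builds both design matrices for a = 3, b = 2, n = 3 in plain Python and computes their ranks by exact Gaussian elimination. The function names `rank` and `crossed_design` are hypothetical helpers written for this illustration, and the column ordering is assumed to match the ordering of b in the examples.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def crossed_design(a, b, n, interaction=True):
    """Design matrix for the two-way crossed layout with n replicates per cell."""
    X = []
    for i in range(a):
        for j in range(b):
            row = [1]                                        # mu
            row += [1 if k == i else 0 for k in range(a)]    # alpha_1..alpha_a
            row += [1 if l == j else 0 for l in range(b)]    # beta_1..beta_b
            if interaction:                                  # gamma_11..gamma_ab
                row += [1 if (k, l) == (i, j) else 0
                        for k in range(a) for l in range(b)]
            X.extend([list(row) for _ in range(n)])
    return X

X_full = crossed_design(3, 2, 3)                    # Example 1.6
X_red = crossed_design(3, 2, 3, interaction=False)  # Example 1.7

print(len(X_full), len(X_full[0]), rank(X_full))  # 18 12 6
print(len(X_red), len(X_red[0]), rank(X_red))     # 18 6 4

# The reduced design is the full design with the 6 interaction columns dropped.
print(X_red == [row[:6] for row in X_full])       # True
```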
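allows for different slopes. Assuming the same error structure, this reduced ANCOVA model is also a special GM model y = Xb + e. With a = 3 and $n_1 = n_2 = n_3 = 3$, as before, we have
$$
\mathbf{y} = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \\ y_{31} \\ y_{32} \\ y_{33} \end{pmatrix}, \quad
\mathbf{X} = \begin{pmatrix}
1 & 1 & 0 & 0 & x_{11} \\
1 & 1 & 0 & 0 & x_{12} \\
1 & 1 & 0 & 0 & x_{13} \\
1 & 0 & 1 & 0 & x_{21} \\
1 & 0 & 1 & 0 & x_{22} \\
1 & 0 & 1 & 0 & x_{23} \\
1 & 0 & 0 & 1 & x_{31} \\
1 & 0 & 0 & 1 & x_{32} \\
1 & 0 & 0 & 1 & x_{33}
\end{pmatrix}, \quad
\mathbf{b} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta \end{pmatrix}, \quad
\mathbf{e} = \begin{pmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{21} \\ e_{22} \\ e_{23} \\ e_{31} \\ e_{32} \\ e_{33} \end{pmatrix}.
$$
The rank of X is r = 4 and there are p = 5 columns. §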
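A short justification of the rank claim may help. The argument below is a sketch, not taken from the notes, and it assumes that the covariate is not constant within every treatment group.

```latex
% Upper bound: one explicit linear dependence among the five columns.
\mathbf{X}\mathbf{c} = \mathbf{0}
\quad \text{for} \quad
\mathbf{c} = (1, -1, -1, -1, 0)'
\quad \Longrightarrow \quad r(\mathbf{X}) \le 4.

% Lower bound: columns 2--4 are indicators of disjoint groups, hence linearly
% independent, and the covariate column lies outside their span unless
% x_{ij} is constant in j for every i. Under the stated assumption,
r(\mathbf{X}) = 4 .
```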
GOAL: We now provide examples of linear models of the form y = Xb + e that are not
GM models.
TERMINOLOGY : A factor of classification is said to be random if it has an infinitely
large number of levels and the levels included in the experiment can be viewed as a
random sample from the population of possible levels.
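Example 1.9. One-way random effects ANOVA. Consider the model
$$y_{ij} = \mu + \alpha_i + e_{ij},$$
for i = 1, 2, ..., a and j = 1, 2, ..., $n_i$, where the treatment effects $\alpha_1, \alpha_2, ..., \alpha_a$ are best regarded as random; e.g., the a levels of the factor of interest are drawn from a large population of possible levels. For concreteness, let a = 4 and $n_i = n = 3$. The model y = Xb + e looks like
$$
\begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \\ y_{31} \\ y_{32} \\ y_{33} \\ y_{41} \\ y_{42} \\ y_{43} \end{pmatrix}
= \mathbf{1}_{12}\,\mu +
\underbrace{\begin{pmatrix}
\mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\
\mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 & \mathbf{0}_3 \\
\mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3 & \mathbf{0}_3 \\
\mathbf{0}_3 & \mathbf{0}_3 & \mathbf{0}_3 & \mathbf{1}_3
\end{pmatrix}}_{=\,\mathbf{Z}_1}
\underbrace{\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{pmatrix}}_{=\,\mathbf{e}_1}
+ \underbrace{\begin{pmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{21} \\ e_{22} \\ e_{23} \\ e_{31} \\ e_{32} \\ e_{33} \\ e_{41} \\ e_{42} \\ e_{43} \end{pmatrix}}_{=\,\mathbf{e}_2},
$$
where we identify $\mathbf{X} = \mathbf{1}_{12}$, $\mathbf{b} = \mu$, and $\mathbf{e} = \mathbf{Z}_1\mathbf{e}_1 + \mathbf{e}_2$. This is not a GM model because $\operatorname{cov}(\mathbf{e}) \neq \sigma^2\mathbf{I}$. §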
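To see why the covariance matrix is not a multiple of the identity, it can be written out under some additional assumptions that are not stated in the example: the $\alpha_i$ are uncorrelated with mean 0 and variance $\sigma_\alpha^2$, the $e_{ij}$ are uncorrelated with mean 0 and variance $\sigma^2$, and the two sets are uncorrelated with each other. Here $\mathbf{Z}_1$ is the 12 × 4 matrix linking each observation to its level, $\mathbf{e}_1 = (\alpha_1, ..., \alpha_4)'$, $\mathbf{e}_2$ is the vector of the $e_{ij}$, and $\mathbf{J}_3 = \mathbf{1}_3\mathbf{1}_3'$ is notation introduced for this sketch.

```latex
\operatorname{cov}(\mathbf{e})
  = \operatorname{cov}(\mathbf{Z}_1\mathbf{e}_1 + \mathbf{e}_2)
  = \mathbf{Z}_1 \operatorname{cov}(\mathbf{e}_1)\, \mathbf{Z}_1' + \operatorname{cov}(\mathbf{e}_2)
  = \sigma_\alpha^2\, \mathbf{Z}_1\mathbf{Z}_1' + \sigma^2\, \mathbf{I}_{12}
  = \mathbf{I}_4 \otimes \left( \sigma_\alpha^2\, \mathbf{J}_3 + \sigma^2\, \mathbf{I}_3 \right).
```

Under these assumptions, two observations on the same level have covariance $\sigma_\alpha^2$, so the off-diagonal entries within each 3 × 3 block are nonzero whenever $\sigma_\alpha^2 > 0$.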
Example 1.10. Two-factor mixed model. Consider an experiment with two factors (A
and B), where Factor A is fixed and has a levels and Factor B is random with b levels.
A statistical model for this situation is given by
$$y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk},$$
for i = 1, 2, ..., a, j = 1, 2, ..., b, and k = 1, 2, ..., $n_{ij}$. Here, the $\alpha_i$'s are best regarded as fixed constants and the $\beta_j$'s are best regarded as random variables. This model assumes no interaction.
APPLICATION : In a randomized block experiment, b blocks may have been selected
randomly from a large collection of available blocks. If the goal is to make a statement
about the large population of blocks (and not those b blocks in the experiment), then
blocks may be considered as a random factor. The treatment effects α 1 , α 2 , ..., αa are
regarded as fixed constants if the a treatments are the only ones of interest.
Example 1.11. Time series models. When measurements are taken on the same experimental unit over time, the GM model may not be appropriate. This is true because observations on the same subject are likely correlated. A linear model of the form y = Xb + e, where E(e) = 0 and $\operatorname{cov}(\mathbf{e}) = \sigma^2\mathbf{V}$, V known, may be more appropriate. The general form of V is chosen to model the correlation of the observed responses. §
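The example does not say which form of V to use. One common choice for equally spaced measurements, shown here purely as an illustration, is a first-order autoregressive pattern in which the (i, j) entry of V is $\rho^{|i-j|}$. The function name `ar1_matrix` and the values t = 4 and ρ = 0.5 are assumptions made for this sketch.

```python
def ar1_matrix(t, rho):
    """t x t matrix whose (i, j) entry is rho**|i - j| (AR(1)-type correlation)."""
    return [[rho ** abs(i - j) for j in range(t)] for i in range(t)]

# Hypothetical setting: t = 4 time points, correlation parameter rho = 0.5.
V = ar1_matrix(4, 0.5)
for row in V:
    print(row)
```

Correlations decay as the measurements get further apart in time, and V reduces to the identity (the GM case) when ρ = 0.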
Example 1.12. Random coefficient models. Suppose that t measurements are taken
(over time) on n individuals and consider the model
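$$y_{ij} = \mathbf{x}_{ij}'\boldsymbol{\beta}_i + e_{ij},$$
for i = 1, 2, ..., n and j = 1, 2, ..., t; that is, the different p × 1 regression parameters $\boldsymbol{\beta}_i$ are “subject-specific.” If the individuals are considered to be a random sample, e.g., if $\boldsymbol{\beta}_1, \boldsymbol{\beta}_2, ..., \boldsymbol{\beta}_n$ are iid random vectors with mean $\boldsymbol{\beta}$ and covariance matrix $\boldsymbol{\Sigma}_{\beta\beta}$, we can write this model as
$$
y_{ij} = \mathbf{x}_{ij}'\boldsymbol{\beta}_i + e_{ij}
= \underbrace{\mathbf{x}_{ij}'\boldsymbol{\beta}}_{\text{fixed}}
+ \underbrace{\mathbf{x}_{ij}'(\boldsymbol{\beta}_i - \boldsymbol{\beta}) + e_{ij}}_{\text{random}}.
$$
If the $\boldsymbol{\beta}_i$'s are independent of the $e_{ij}$'s, note that $\operatorname{var}(y_{ij}) = \mathbf{x}_{ij}'\boldsymbol{\Sigma}_{\beta\beta}\mathbf{x}_{ij} + \sigma^2 \neq \sigma^2$, so that this is not a GM model. §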
Example 1.13. Measurement error models. Consider the statistical model
$$y_i = \beta_0 + \beta_1 X_i + e_i,$$
where the $e_i$ are iid $N(0, \sigma_e^2)$, and the $X_i$'s are not observed exactly, but are measured with non-negligible error so that
$$W_i = X_i + U_i,$$
where the $U_i$ are iid $N(0, \sigma_U^2)$. Here,
Observed data: $(y_i, W_i)$
Not observed: $(X_i, e_i, U_i)$
Unknown parameters: $(\beta_0, \beta_1, \sigma_e^2, \sigma_U^2)$.
The model above can be rewritten as
$$
y_i = \beta_0 + \beta_1 (W_i - U_i) + e_i
= \beta_0 + \beta_1 W_i + \underbrace{(e_i - \beta_1 U_i)}_{=\, e_i^*}.
$$
Because the $W_i$'s are not fixed in advance, we would need $E(e_i^* \mid W_i) = 0$ for this to be a linear model. However, note that
$$
E(e_i^* \mid W_i) = E(e_i - \beta_1 U_i \mid X_i + U_i)
= E(e_i \mid X_i + U_i) - \beta_1 E(U_i \mid X_i + U_i).
$$
The first term is zero if $e_i$ is independent of both $X_i$ and $U_i$. The second term generally is not zero (unless $\beta_1 = 0$, of course) because $U_i$ and $X_i + U_i$ are correlated. Thus, this is not a GM model. §
1.3.1 Basic definitions
TERMINOLOGY : A matrix A is a rectangular array of elements; e.g.,
The (i, j)th element of A is denoted by $a_{ij}$. The dimensions of A are m (the number of rows) by n (the number of columns). If m = n, A is square. If we want to emphasize the dimension of A, we can write $\mathbf{A}_{m \times n}$. In this course, we restrict attention to real matrices; i.e., matrices whose elements are real numbers.
TERMINOLOGY : A vector is a matrix consisting of one column or one row. A column vector is denoted by $\mathbf{a}_{n \times 1}$. A row vector is denoted by $\mathbf{a}_{1 \times n}$. By convention, we assume a vector is a column vector, unless otherwise noted; that is,
1.3.2 Inverse
TERMINOLOGY : If A is an n × n matrix, and there exists a matrix C such that
$$\mathbf{A}\mathbf{C} = \mathbf{C}\mathbf{A} = \mathbf{I}_n,$$
then A is nonsingular and C is called the inverse of A, henceforth denoted by $\mathbf{A}^{-1}$. If A is nonsingular, $\mathbf{A}^{-1}$ is unique. If A is a square matrix and is not nonsingular, A is singular.
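SPECIAL CASE : The inverse of the 2 × 2 matrix
$$
\mathbf{A} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\quad \text{is given by} \quad
\mathbf{A}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix},
$$
provided that $ad - bc \neq 0$.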
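The 2 × 2 formula is easy to check by direct multiplication. The sketch below implements it with exact fractions and confirms that the product of a matrix with its computed inverse is the identity, in both orders. The function names `inverse_2x2` and `matmul` and the example matrix are hypothetical choices for this illustration.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] via (1 / (ad - bc)) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = A
    det = Fraction(a) * d - Fraction(b) * c
    if det == 0:
        raise ValueError("matrix is singular (ad - bc = 0)")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]       # hypothetical example, ad - bc = -2
A_inv = inverse_2x2(A)
print(A_inv)
print(matmul(A, A_inv) == [[1, 0], [0, 1]])  # True
print(matmul(A_inv, A) == [[1, 0], [0, 1]])  # True
```

A matrix with ad − bc = 0, such as [[1, 2], [2, 4]], raises the error instead, matching the statement that a zero determinant means the matrix is singular.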
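SPECIAL CASE : The inverse of the n × n diagonal matrix
$$
\mathbf{A} = \begin{pmatrix}
a_{11} & 0 & \cdots & 0 \\
0 & a_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{pmatrix}
\quad \text{is given by} \quad
\mathbf{A}^{-1} = \begin{pmatrix}
a_{11}^{-1} & 0 & \cdots & 0 \\
0 & a_{22}^{-1} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}^{-1}
\end{pmatrix},
$$
provided that every diagonal element $a_{ii}$ is nonzero.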
Result M.2.
(a) A is nonsingular iff $|\mathbf{A}| \neq 0$.
(b) If A and B are nonsingular matrices, $(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$.
(c) If A is nonsingular, then $(\mathbf{A}')^{-1} = (\mathbf{A}^{-1})'$.
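Parts (b) and (c) follow directly from the definition of the inverse. The short verification below is a sketch added for completeness, using only that definition and the transpose rule $(\mathbf{C}\mathbf{D})' = \mathbf{D}'\mathbf{C}'$.

```latex
% (b): the candidate inverse works on both sides.
(\mathbf{A}\mathbf{B})(\mathbf{B}^{-1}\mathbf{A}^{-1})
  = \mathbf{A}(\mathbf{B}\mathbf{B}^{-1})\mathbf{A}^{-1}
  = \mathbf{A}\mathbf{I}\mathbf{A}^{-1} = \mathbf{I},
\qquad
(\mathbf{B}^{-1}\mathbf{A}^{-1})(\mathbf{A}\mathbf{B})
  = \mathbf{B}^{-1}(\mathbf{A}^{-1}\mathbf{A})\mathbf{B} = \mathbf{I}.

% (c): transpose the identity A^{-1} A = A A^{-1} = I.
\mathbf{A}'(\mathbf{A}^{-1})' = (\mathbf{A}^{-1}\mathbf{A})' = \mathbf{I}' = \mathbf{I},
\qquad
(\mathbf{A}^{-1})'\mathbf{A}' = (\mathbf{A}\mathbf{A}^{-1})' = \mathbf{I}.
```

Since the inverse is unique, the matrices exhibited in each case must be the inverses claimed.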
1.3.3 Linear independence and rank
TERMINOLOGY : The m × 1 vectors $\mathbf{a}_1, \mathbf{a}_2, ..., \mathbf{a}_n$ are said to be linearly dependent if and only if there exist scalars $c_1, c_2, ..., c_n$ such that
$$\sum_{i=1}^{n} c_i \mathbf{a}_i = \mathbf{0}$$
and at least one of the $c_i$'s is not zero; that is, it is possible to express at least one vector as a nontrivial linear combination of the others. If
$$\sum_{i=1}^{n} c_i \mathbf{a}_i = \mathbf{0} \implies c_1 = c_2 = \cdots = c_n = 0,$$
then $\mathbf{a}_1, \mathbf{a}_2, ..., \mathbf{a}_n$ are linearly independent. If m < n, then $\mathbf{a}_1, \mathbf{a}_2, ..., \mathbf{a}_n$ must be linearly dependent.
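NOTE : If $\mathbf{a}_1, \mathbf{a}_2, ..., \mathbf{a}_n$ denote the columns of an m × n matrix A; i.e.,
$$\mathbf{A} = \begin{pmatrix} \mathbf{a}_1 & \mathbf{a}_2 & \cdots & \mathbf{a}_n \end{pmatrix},$$
then the columns of A are linearly independent if and only if $\mathbf{A}\mathbf{c} = \mathbf{0} \Rightarrow \mathbf{c} = \mathbf{0}$, where $\mathbf{c} = (c_1, c_2, ..., c_n)'$. Thus, if you can find at least one nonzero c such that Ac = 0, the columns of A are linearly dependent.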
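As a concrete instance of this check, the sketch below takes a small matrix whose first column is the sum of the other two (the one-way layout with a = 2 groups of two observations each) and exhibits a nonzero c with Ac = 0. The matrix, the vector c, and the helper name `matvec` are chosen only for illustration.

```python
def matvec(A, c):
    """Matrix-vector product Ac, with A stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, c)) for row in A]

# Columns: intercept, group-1 indicator, group-2 indicator.
A = [[1, 1, 0],
     [1, 1, 0],
     [1, 0, 1],
     [1, 0, 1]]

c = [1, -1, -1]      # nonzero coefficient vector
print(matvec(A, c))  # [0, 0, 0, 0], so the columns are linearly dependent
```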
TERMINOLOGY : The rank of a matrix A is defined as
r(A) = number of linearly independent columns of A
= number of linearly independent rows of A.
The number of linearly independent rows of any matrix is always equal to the number of
linearly independent columns.
TERMINOLOGY : If A is n × p, then $r(\mathbf{A}) \leq \min\{n, p\}$. Consequently, for any rectangular (i.e., non-square) matrix, either the rows or columns (or both) must be linearly dependent.
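The two facts about rank stated above (row rank equals column rank, and the rank cannot exceed the smaller dimension) can be checked on small examples. The sketch below computes rank by exact Gaussian elimination in plain Python; the helper names `rank` and `transpose` and the two example matrices are made up for this illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def transpose(rows):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*rows)]

# Square example: the second row is twice the first, so the rank is 2.
A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]
print(rank(A), rank(transpose(A)))  # 2 2

# Rectangular 2 x 4 example: rank is at most min(2, 4) = 2,
# so the 4 columns must be linearly dependent.
B = [[1, 0, 2, 1],
     [0, 1, 1, 1]]
print(rank(B), min(len(B), len(B[0])))  # 2 2
```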