Linear Statistical Models - Study Guide | STAT 714, Study notes of Statistics

Material Type: Notes; Professor: Tebbs; Class: LINEAR STATISTICAL MODELS; Subject: Statistics; University: University of South Carolina - Columbia; Term: Fall 2008;

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

STAT 714
LINEAR STATISTICAL MODELS
Fall, 2008
Lecture Notes
Joshua M. Tebbs
Department of Statistics
The University of South Carolina

TABLE OF CONTENTS STAT 714, J. TEBBS


  • 1 Introduction, Linear Algebra Review, and Random Vectors
    • 1.1 Gauss Markov (GM) models
    • 1.2 Linear models that are not GM
    • 1.3 Linear algebra review
      • 1.3.1 Basic definitions
      • 1.3.2 Inverse
      • 1.3.3 Linear independence and rank
      • 1.3.4 Orthogonality
      • 1.3.5 Vector spaces
      • 1.3.6 Matrix subspaces
      • 1.3.7 Generalized inverses
      • 1.3.8 Projection matrices
      • 1.3.9 Trace and determinant functions
      • 1.3.10 Eigenvalues and eigenvectors
      • 1.3.11 Quadratic forms, definiteness, and factorizations
    • 1.4 Random vectors
      • 1.4.1 Means and variances
      • 1.4.2 Linear transformations
      • 1.4.3 Variance-covariance matrices
      • 1.4.4 Application: Linear prediction
  • 2 The Linear Least Squares Problem
    • 2.1 Least squares estimation
    • 2.2 Geometric considerations
    • 2.3 Reparameterization
    • 2.4 Gram-Schmidt orthonormalization
  • 3 Estimability and Least Squares Estimators
    • 3.1 Introduction
    • 3.2 Estimability
      • 3.2.1 One-way ANOVA
      • 3.2.2 Two-way crossed ANOVA with no interaction
      • 3.2.3 Two-way crossed ANOVA with interaction
    • 3.3 Reparameterization
    • 3.4 Unique least squares solutions via linear constraints
  • 4 The Gauss-Markov Model
    • 4.1 Introduction
    • 4.2 The Gauss-Markov Theorem - in the GM model
    • 4.4 The geometry of (linear) model misspecification
      • 4.4.1 Underfitting
      • 4.4.2 Overfitting
    • 4.5 The Aitken model and generalized least squares
  • 5 Distributional Theory
    • 5.1 Introduction
    • 5.2 Univariate normal distribution
    • 5.3 Multivariate normal distribution
    • 5.4 Moment generating functions
    • 5.5 Properties of the multivariate normal distribution
      • 5.5.1 Linear transformations
      • 5.5.2 Less-than-full-rank normal distributions
      • 5.5.3 Independence results
      • 5.5.4 Conditional distributions
    • 5.6 Noncentral χ² distribution
    • 5.7 Noncentral F distribution
    • 5.8 Distributions of quadratic forms
    • 5.9 Independence of quadratic forms
  • 6 Statistical Inference
    • 6.1 Introduction
    • 6.2 Estimation
    • 6.3 Testing models
    • 6.4 Testing linear parametric functions
    • 6.5 Testing models or testing linear parametric functions?
    • 6.6 Likelihood ratio tests
      • 6.6.1 Constrained estimation
      • 6.6.2 Testing procedure
    • 6.7 Confidence intervals and multiple comparisons
      • 6.7.1 Single intervals
      • 6.7.2 Multiple intervals

1 Introduction, Linear Algebra Review, and Random Vectors

Complementary reading from Monahan: Chapter 1 and Appendix A.

INTRODUCTION : This course is about linear models. Linear models are models that are linear in their parameters. The general form of a linear model is given by

y = Xb + e,

where y is an N × 1 vector of observed responses, X is an N × p (design) matrix of fixed constants, b is a p × 1 vector of fixed but unknown parameters, and e is an N × 1 vector of (unobserved) random errors. The model is called a linear model because the mean of the response vector y is linear in the unknown parameter b.

SCOPE OF APPLICATION : Several models commonly used in statistics are examples of the general linear model y = Xb + e. These include simple and multiple linear regression models and analysis of variance (ANOVA) models. Regression models generally refer to those for which X is full rank, while ANOVA models refer to those for which X consists of zeros and ones. Other models of this form include analysis of covariance models, some time series models, and others.

Model I: Least squares model : y = Xb + e. This model makes no assumptions on e. The parameter space is $\Theta = \{b : b \in \mathbb{R}^p\}$.

Model II: Gauss-Markov model : y = Xb + e, where E(e) = 0 and cov(e) = $\sigma^2 I$. The parameter space is $\Theta = \{(b, \sigma^2) : (b, \sigma^2) \in \mathbb{R}^p \times \mathbb{R}^+\}$.

Model III: Aitken model : y = Xb + e, where E(e) = 0 and cov(e) = $\sigma^2 V$, V known. The parameter space is $\Theta = \{(b, \sigma^2) : (b, \sigma^2) \in \mathbb{R}^p \times \mathbb{R}^+\}$.

Model IV: General linear mixed model : y = Xb + e, where E(e) = 0 and cov(e) = $\Sigma \equiv \Sigma(\theta)$. The parameter space is $\Theta = \{(b, \theta) : (b, \theta) \in \mathbb{R}^p \times \Omega\}$, where $\Omega$ is the set of all values of $\theta$ for which $\Sigma(\theta)$ is positive definite.

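for $i = 1, 2, \ldots, N$, where the $e_i$ are uncorrelated random variables with mean 0 and common variance $\sigma^2 > 0$. If $x_1, x_2, \ldots, x_N$ are fixed constants, measured without error, then this model is a special GM model y = Xb + e with

$$
y_{N\times 1} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad
X_{N\times 2} = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}, \quad
b_{2\times 1} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad
e_{N\times 1} = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix}.
$$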

Example 1.3. Multiple linear regression. Suppose that a response variable y is linearly related to several independent predictor variables, say, $x_1, x_2, \ldots, x_k$, via

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + e_i,$$

for $i = 1, 2, \ldots, N$, where the $e_i$ are uncorrelated random variables with mean 0 and common variance $\sigma^2 > 0$. If the independent variables are fixed constants, measured without error, then this model is a special GM model y = Xb + e where

$$
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}, \quad
X_{N\times p} = \begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{N1} & x_{N2} & \cdots & x_{Nk}
\end{pmatrix}, \quad
b_{p\times 1} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}, \quad
e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_N \end{pmatrix},
$$

and p = k + 1. Note that the simple linear regression model is a special case of the multiple linear regression model with k = 1. §
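Regression designs are claimed to have full column rank, and this can be checked numerically. The sketch below uses NumPy to build X = [1, x_1, ..., x_k] and compute its rank. The helper name `regression_design` and the predictor values are hypothetical and serve only as illustration. With real data, full column rank holds only when no predictor column is a linear combination of the others. For simple linear regression, this means the $x_i$ must not all be equal.

```python
import numpy as np

def regression_design(x):
    """Build the N x (k + 1) design matrix [1, x_1, ..., x_k].

    `x` holds fixed predictor values: an N-vector (simple linear regression,
    k = 1) or an N x k array (multiple linear regression).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, np.newaxis]           # treat a vector as a single predictor column
    ones = np.ones((x.shape[0], 1))    # intercept column, multiplies beta_0
    return np.hstack([ones, x])

# Hypothetical predictor values: N = 5 observations on k = 2 predictors.
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 5.0],
              [4.0, 3.0],
              [5.0, 4.0]])
X = regression_design(x)
p = X.shape[1]                          # p = k + 1
r = np.linalg.matrix_rank(X)
print(X.shape, p, r)                    # (5, 3) 3 3 -> full column rank

# Simple linear regression with all x_i equal: the x column is a multiple of
# the intercept column, so X is rank deficient.
print(np.linalg.matrix_rank(regression_design([2.0, 2.0, 2.0])))  # 1
```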

Example 1.4. One-way ANOVA. Consider an experiment that is performed to compare $a \geq 2$ treatments. For the ith treatment level, suppose that $n_i$ experimental units are selected at random and assigned to the ith treatment. Consider the model

$$y_{ij} = \mu + \alpha_i + e_{ij},$$

for $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, n_i$, where the random errors $e_{ij}$ are uncorrelated random variables with zero mean and common variance $\sigma^2 > 0$. If the a treatment effects $\alpha_1, \alpha_2, \ldots, \alpha_a$ are best regarded as fixed constants, then this model is a special case of the GM model y = Xb + e. To see this, note that, with $N = \sum_i n_i$,

$$
y_{N\times 1} = \begin{pmatrix} y_{11} \\ y_{12} \\ \vdots \\ y_{a n_a} \end{pmatrix}, \quad
X_{N\times p} = \begin{pmatrix}
1_{n_1} & 1_{n_1} & 0_{n_1} & \cdots & 0_{n_1} \\
1_{n_2} & 0_{n_2} & 1_{n_2} & \cdots & 0_{n_2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1_{n_a} & 0_{n_a} & 0_{n_a} & \cdots & 1_{n_a}
\end{pmatrix}, \quad
b_{p\times 1} = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_a \end{pmatrix},
$$

where p = a + 1 and $e_{N\times 1} = (e_{11}, e_{12}, \ldots, e_{a n_a})'$, where $1_{n_i}$ is an $n_i \times 1$ column vector of ones and $0_{n_i}$ is an $n_i \times 1$ column vector of zeros. Note that if a = 2, then this data structure is equivalent to the standard two-sample setup. §

NOTE : In Example 1.4, note that the first column of X is the sum of the last a columns; i.e., there is a linear dependence in the columns of X. From results in linear algebra, we know that X is not of full column rank. In fact, the rank of X is r = a, one less than the number of columns p = a + 1. This is a common characteristic of ANOVA models; namely, their X matrices are not of full column rank. On the other hand, (linear) regression models are models of the form y = Xb + e, where X is of full column rank. See Examples 1.2 and 1.3.
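The rank deficiency described above is easy to confirm numerically. The sketch below uses NumPy and a hypothetical helper `one_way_design`, with illustrative unbalanced group sizes. It builds the one-way ANOVA design matrix of Example 1.4 and checks three things: that the first column equals the sum of the last a columns, that r(X) = a, and that the dependence yields a nonzero c with Xc = 0.

```python
import numpy as np

def one_way_design(n):
    """Design matrix for y_ij = mu + alpha_i + e_ij, with rows ordered
    y_11, ..., y_1n_1, y_21, ..., y_an_a as in Example 1.4.

    `n` is the list of group sizes (n_1, ..., n_a).
    """
    a = len(n)
    labels = np.repeat(np.arange(a), n)                 # treatment index of each row
    indicators = (labels[:, np.newaxis] == np.arange(a)).astype(float)  # N x a
    intercept = np.ones((labels.size, 1))               # column for mu
    return np.hstack([intercept, indicators])

n = [3, 2, 4]                                  # hypothetical sizes: a = 3, N = 9
X = one_way_design(n)
a = len(n)
print(X.shape)                                  # (9, 4): p = a + 1 columns
print(np.allclose(X[:, 0], X[:, 1:].sum(axis=1)))  # True: column 1 = sum of the rest
print(np.linalg.matrix_rank(X))                 # 3: r = a, one less than p

# The dependence gives a nonzero c with Xc = 0, namely c = (1, -1, ..., -1)'.
c = np.concatenate([[1.0], -np.ones(a)])
print(np.allclose(X @ c, 0))                    # True
```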

Example 1.5. Two-way nested ANOVA. Consider an experiment with two factors, where one of the factors, say, Factor B, is nested within Factor A. In other words, every level of B appears with exactly one level of Factor A. A statistical model for this situation is given by

$$y_{ijk} = \mu + \alpha_i + \beta_{ij} + e_{ijk},$$

for $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b_i$, and $k = 1, 2, \ldots, n_{ij}$. In this model, $\mu$ denotes the overall mean, $\alpha_i$ represents the effect due to the ith level of A, and $\beta_{ij}$ represents the effect of the jth level of B, nested within the ith level of A. If all parameters are fixed, and the random errors $e_{ijk}$ are uncorrelated random variables with zero mean and constant unknown variance $\sigma^2 > 0$, then this is a special GM model y = Xb + e. For example, with a = 3, b = 2, and $n_{ij} = n = 4$, we have

every level of B. Consider the two-factor (crossed) ANOVA model given by

$$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + e_{ijk},$$

for $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, and $k = 1, 2, \ldots, n_{ij}$, where the random errors $e_{ijk}$ are uncorrelated random variables with zero mean and constant unknown variance $\sigma^2 > 0$. If all the parameters are fixed, this is a special GM model y = Xb + e. For example, with a = 3, b = 2, and $n_{ij} = n = 3$,

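$$
y = (y_{111}, y_{112}, y_{113}, y_{121}, y_{122}, y_{123}, y_{211}, y_{212}, y_{213}, y_{221}, y_{222}, y_{223}, y_{311}, y_{312}, y_{313}, y_{321}, y_{322}, y_{323})',
$$

$$
X = \left(\begin{array}{cccccccccccc}
1_3 & 1_3 & 0_3 & 0_3 & 1_3 & 0_3 & 1_3 & 0_3 & 0_3 & 0_3 & 0_3 & 0_3 \\
1_3 & 1_3 & 0_3 & 0_3 & 0_3 & 1_3 & 0_3 & 1_3 & 0_3 & 0_3 & 0_3 & 0_3 \\
1_3 & 0_3 & 1_3 & 0_3 & 1_3 & 0_3 & 0_3 & 0_3 & 1_3 & 0_3 & 0_3 & 0_3 \\
1_3 & 0_3 & 1_3 & 0_3 & 0_3 & 1_3 & 0_3 & 0_3 & 0_3 & 1_3 & 0_3 & 0_3 \\
1_3 & 0_3 & 0_3 & 1_3 & 1_3 & 0_3 & 0_3 & 0_3 & 0_3 & 0_3 & 1_3 & 0_3 \\
1_3 & 0_3 & 0_3 & 1_3 & 0_3 & 1_3 & 0_3 & 0_3 & 0_3 & 0_3 & 0_3 & 1_3
\end{array}\right), \quad
b = (\mu, \alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2, \gamma_{11}, \gamma_{12}, \gamma_{21}, \gamma_{22}, \gamma_{31}, \gamma_{32})',
$$

where $1_3$ and $0_3$ denote 3 × 1 column vectors of ones and zeros, and $e = (e_{111}, e_{112}, \ldots, e_{323})'$. The X matrix is not of full column rank. The rank of X is r = 6 and there are p = 12 columns. §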

Example 1.7. Two-way crossed ANOVA without interaction. Consider an experiment with two factors (A and B), where Factor A has a levels and Factor B has b levels. The two-way crossed model without interaction is given by

$$y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk},$$

for $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, and $k = 1, 2, \ldots, n_{ij}$, where the random errors $e_{ijk}$ are uncorrelated random variables with zero mean and common variance $\sigma^2 > 0$. Note that the no-interaction model is a special case of the interaction model in Example 1.6 when $H_0: \gamma_{11} = \gamma_{12} = \cdots = \gamma_{32} = 0$ is true. That is, the no-interaction model is a reduced version of the interaction model. With a = 3, b = 2, and $n_{ij} = n = 3$ as before, we have

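$$
y = (y_{111}, y_{112}, y_{113}, y_{121}, y_{122}, y_{123}, y_{211}, y_{212}, y_{213}, y_{221}, y_{222}, y_{223}, y_{311}, y_{312}, y_{313}, y_{321}, y_{322}, y_{323})',
$$

$$
X = \begin{pmatrix}
1_3 & 1_3 & 0_3 & 0_3 & 1_3 & 0_3 \\
1_3 & 1_3 & 0_3 & 0_3 & 0_3 & 1_3 \\
1_3 & 0_3 & 1_3 & 0_3 & 1_3 & 0_3 \\
1_3 & 0_3 & 1_3 & 0_3 & 0_3 & 1_3 \\
1_3 & 0_3 & 0_3 & 1_3 & 1_3 & 0_3 \\
1_3 & 0_3 & 0_3 & 1_3 & 0_3 & 1_3
\end{pmatrix}, \quad
b = (\mu, \alpha_1, \alpha_2, \alpha_3, \beta_1, \beta_2)',
$$

where $1_3$ and $0_3$ denote 3 × 1 column vectors of ones and zeros, and $e = (e_{111}, e_{112}, \ldots, e_{323})'$. The X matrix is not of full column rank. The rank of X is r = 4 and there are p = 6 columns. Also note that the design matrix for the no-interaction model is the same as the design matrix for the interaction model, except that the last 6 columns are removed (these columns pertain to the 6 interaction terms). §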
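The ranks quoted for the crossed designs (r = 6 with interaction, r = 4 without) can be checked with a short NumPy sketch. The helper `two_way_design` is hypothetical. It assumes a balanced design ($n_{ij} = n$), rows ordered $y_{111}, y_{112}, \ldots, y_{abn}$ with i varying slowest, and columns ordered as in b.

```python
import numpy as np

def two_way_design(a, b, n, interaction=True):
    """Design matrix for y_ijk = mu + alpha_i + beta_j (+ gamma_ij) + e_ijk.

    Balanced case n_ij = n; rows ordered with i slowest, then j, then k.
    Columns: mu, alpha_1..alpha_a, beta_1..beta_b, then (optionally)
    gamma_11, gamma_12, ..., gamma_ab.
    """
    i = np.repeat(np.arange(a), b * n)             # level of Factor A for each row
    j = np.tile(np.repeat(np.arange(b), n), a)     # level of Factor B for each row
    N = a * b * n
    A = (i[:, np.newaxis] == np.arange(a)).astype(float)
    B = (j[:, np.newaxis] == np.arange(b)).astype(float)
    cols = [np.ones((N, 1)), A, B]
    if interaction:
        cell = i * b + j                           # cell (i, j) -> 0, ..., ab - 1
        cols.append((cell[:, np.newaxis] == np.arange(a * b)).astype(float))
    return np.hstack(cols)

X_int = two_way_design(3, 2, 3, interaction=True)   # with interaction
X_add = two_way_design(3, 2, 3, interaction=False)  # without interaction
print(X_int.shape, np.linalg.matrix_rank(X_int))    # (18, 12) 6
print(X_add.shape, np.linalg.matrix_rank(X_add))    # (18, 6) 4
# Dropping the 6 interaction columns recovers the no-interaction design.
print(np.array_equal(X_add, X_int[:, :6]))          # True
```

In the additive case the rank is $1 + (a - 1) + (b - 1) = 4$. With interaction, every column lies in the span of the ab = 6 cell indicators, which are themselves columns of X. So the rank equals ab = 6.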

allows for different slopes. Assuming the same error structure, this reduced ANCOVA model is also a special GM model y = Xb + e. With a = 3 and $n_1 = n_2 = n_3 = 3$, as before, we have

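$$
y = \begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \\ y_{31} \\ y_{32} \\ y_{33} \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & 1 & 0 & 0 & x_{11} \\
1 & 1 & 0 & 0 & x_{12} \\
1 & 1 & 0 & 0 & x_{13} \\
1 & 0 & 1 & 0 & x_{21} \\
1 & 0 & 1 & 0 & x_{22} \\
1 & 0 & 1 & 0 & x_{23} \\
1 & 0 & 0 & 1 & x_{31} \\
1 & 0 & 0 & 1 & x_{32} \\
1 & 0 & 0 & 1 & x_{33}
\end{pmatrix}, \quad
b = \begin{pmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \beta \end{pmatrix}, \quad
e = \begin{pmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{21} \\ e_{22} \\ e_{23} \\ e_{31} \\ e_{32} \\ e_{33} \end{pmatrix}.
$$

The rank of X is r = 4 and there are p = 5 columns. §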
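Unlike the pure ANOVA designs, the rank of this ANCOVA design matrix depends on the covariate values. The NumPy sketch below uses hypothetical covariate values $x_{ij}$ that do not come from the notes. It confirms r = 4 when the covariate varies within groups. It then shows the rank dropping to 3 if the covariate happened to be constant within each group.

```python
import numpy as np

a, n = 3, 3
groups = np.repeat(np.arange(a), n)                        # rows ordered y_11, ..., y_33
D = (groups[:, np.newaxis] == np.arange(a)).astype(float)  # alpha_1..alpha_3 columns

def ancova_design(x):
    """Columns: mu, alpha_1, alpha_2, alpha_3, beta (common slope on covariate x)."""
    return np.column_stack([np.ones(a * n), D, x])

x = np.array([1.2, 2.0, 2.9, 0.8, 1.5, 2.2, 3.1, 3.3, 4.0])  # hypothetical x_ij
X = ancova_design(x)
print(X.shape, np.linalg.matrix_rank(X))               # (9, 5) 4

# If x were constant within each group, the beta column would be a linear
# combination of the alpha columns, so one more column would be redundant.
x_const = np.repeat([1.0, 2.0, 5.0], n)
print(np.linalg.matrix_rank(ancova_design(x_const)))   # 3
```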

1.2 Linear models that are not GM

GOAL: We now provide examples of linear models of the form y = Xb + e that are not GM models.

TERMINOLOGY : A factor of classification is said to be random if it has an infinitely large number of levels and the levels included in the experiment can be viewed as a random sample from the population of possible levels.

Example 1.9. One-way random effects ANOVA. Consider the model

$$y_{ij} = \mu + \alpha_i + e_{ij},$$

for $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, n_i$, where the treatment effects $\alpha_1, \alpha_2, \ldots, \alpha_a$ are best regarded as random; e.g., the a levels of the factor of interest are drawn from a large population of possible levels. For concreteness, let a = 4 and $n_i = n = 3$. The model y = Xb + e looks like

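$$
\begin{pmatrix} y_{11} \\ y_{12} \\ y_{13} \\ y_{21} \\ y_{22} \\ y_{23} \\ y_{31} \\ y_{32} \\ y_{33} \\ y_{41} \\ y_{42} \\ y_{43} \end{pmatrix}
= 1_{12}\,\mu
+ \underbrace{\begin{pmatrix}
1_3 & 0_3 & 0_3 & 0_3 \\
0_3 & 1_3 & 0_3 & 0_3 \\
0_3 & 0_3 & 1_3 & 0_3 \\
0_3 & 0_3 & 0_3 & 1_3
\end{pmatrix}}_{= Z_1}
\underbrace{\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{pmatrix}}_{= e_1}
+ \underbrace{\begin{pmatrix} e_{11} \\ e_{12} \\ e_{13} \\ e_{21} \\ e_{22} \\ e_{23} \\ e_{31} \\ e_{32} \\ e_{33} \\ e_{41} \\ e_{42} \\ e_{43} \end{pmatrix}}_{= e_2},
$$

where $1_{12}$ is a 12 × 1 vector of ones and $1_3$, $0_3$ are 3 × 1 vectors of ones and zeros. We identify $X = 1_{12}$, $b = \mu$, and $e = Z_1 e_1 + e_2$. This is not a GM model: observations in the same treatment share the random effect $\alpha_i$, so cov(e) $\neq \sigma^2 I$. §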
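To see concretely why cov(e) ≠ σ²I, add an assumption that is standard for random effects but is not spelled out above. Suppose the $\alpha_i$ have mean zero, are uncorrelated with common variance $\sigma_\alpha^2$, and are uncorrelated with the $e_{ij}$. Then cov(e) = $\sigma_\alpha^2 Z_1 Z_1' + \sigma^2 I$. The NumPy sketch below uses hypothetical variance values, builds $Z_1$ for a = 4 and n = 3, and displays the resulting block structure.

```python
import numpy as np

a, n = 4, 3
sigma2_alpha, sigma2 = 2.0, 1.0      # hypothetical variance components

# Z_1 is block diagonal with 1_3 blocks: row (i, j) picks out alpha_i.
Z1 = np.kron(np.eye(a), np.ones((n, 1)))             # 12 x 4
# cov(e) = cov(Z_1 e_1 + e_2) = sigma_alpha^2 Z_1 Z_1' + sigma^2 I
cov_e = sigma2_alpha * Z1 @ Z1.T + sigma2 * np.eye(a * n)

print(cov_e[:n, :n])      # same treatment: 3.0 on the diagonal, 2.0 off it
print(cov_e[0, n])        # different treatments: 0.0
print(np.allclose(cov_e, sigma2 * np.eye(a * n)))    # False: not sigma^2 I
```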

Example 1.10. Two-factor mixed model. Consider an experiment with two factors (A and B), where Factor A is fixed and has a levels and Factor B is random with b levels. A statistical model for this situation is given by

$$y_{ijk} = \mu + \alpha_i + \beta_j + e_{ijk},$$

for $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, and $k = 1, 2, \ldots, n_{ij}$. Here, the $\alpha_i$'s are best regarded as fixed constants and the $\beta_j$'s are best regarded as random variables. This model assumes no interaction.

APPLICATION : In a randomized block experiment, b blocks may have been selected randomly from a large collection of available blocks. If the goal is to make a statement about the large population of blocks (and not those b blocks in the experiment), then blocks may be considered as a random factor. The treatment effects $\alpha_1, \alpha_2, \ldots, \alpha_a$ are regarded as fixed constants if the a treatments are the only ones of interest.

Example 1.11. Time series models. When measurements are taken on the same experimental unit over time, the GM model may not be appropriate, because observations on the same subject are likely to be correlated. A linear model of the form y = Xb + e, where E(e) = 0 and cov(e) = $\sigma^2 V$, V known, may be more appropriate. The general form of V is chosen to model the correlation of the observed responses. §

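Example 1.12. Random coefficient models. Suppose that t measurements are taken (over time) on n individuals and consider the model

$$y_{ij} = x_{ij}'\beta_i + e_{ij},$$

for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, t$; that is, the different p × 1 regression parameters $\beta_i$ are "subject-specific." If the individuals are considered to be a random sample, e.g., if $\beta_1, \beta_2, \ldots, \beta_n$ are iid random vectors with mean $\beta$ and covariance matrix $\Sigma_{\beta\beta}$, we can write this model as

$$
\begin{aligned}
y_{ij} &= x_{ij}'\beta_i + e_{ij} \\
&= \underbrace{x_{ij}'\beta}_{\text{fixed}} + \underbrace{x_{ij}'(\beta_i - \beta) + e_{ij}}_{\text{random}}.
\end{aligned}
$$

If the $\beta_i$'s are independent of the $e_{ij}$'s, note that $\operatorname{var}(y_{ij}) = x_{ij}'\Sigma_{\beta\beta}x_{ij} + \sigma^2 \neq \sigma^2$ (here $\sigma^2 = \operatorname{var}(e_{ij})$, and the inequality holds whenever $x_{ij}'\Sigma_{\beta\beta}x_{ij} > 0$), so that this is not a GM model. §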
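The variance calculation above extends to the whole vector of one subject's responses. Stack the rows $x_{ij}'$ into a t × p matrix $X_i$, and assume the $e_{ij}$ are uncorrelated with variance $\sigma^2$. Then cov($y_i$) = $X_i \Sigma_{\beta\beta} X_i' + \sigma^2 I_t$. Its diagonal reproduces var($y_{ij}$), and its nonzero off-diagonal entries show that repeated measurements on the same subject are correlated. The sketch uses NumPy with hypothetical values for $X_i$, $\Sigma_{\beta\beta}$, and $\sigma^2$ (t = 3, p = 2).

```python
import numpy as np

# Hypothetical subject: t = 3 times, x_ij' = (1, time_j), so p = 2.
X_i = np.column_stack([np.ones(3), [0.0, 1.0, 2.0]])
Sigma_bb = np.array([[1.0, 0.3],
                     [0.3, 0.5]])       # covariance matrix of beta_i
sigma2 = 1.0                             # var(e_ij); errors assumed uncorrelated

# beta_i independent of the e_ij's:
cov_y = X_i @ Sigma_bb @ X_i.T + sigma2 * np.eye(3)

# Diagonal: var(y_ij) = x_ij' Sigma_bb x_ij + sigma^2, row by row.
var_formula = np.array([x @ Sigma_bb @ x for x in X_i]) + sigma2
print(np.diag(cov_y))          # approximately [2.0, 3.1, 5.2], never sigma^2 = 1
print(np.allclose(np.diag(cov_y), var_formula))   # True
print(cov_y[0, 1])             # x_i1' Sigma_bb x_i2 = 1.3: within-subject covariance
```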

Example 1.13. Measurement error models. Consider the statistical model

$$y_i = \beta_0 + \beta_1 X_i + e_i,$$

where the $e_i$ are iid $N(0, \sigma_e^2)$, and the $X_i$'s are not observed exactly, but are measured with non-negligible error so that

$$W_i = X_i + U_i,$$

where the $U_i$ are iid $N(0, \sigma_U^2)$. Here,

Observed data: $(y_i, W_i)$

Not observed: $(X_i, e_i, U_i)$

Unknown parameters: $(\beta_0, \beta_1, \sigma_e^2, \sigma_U^2)$.

The model above can be rewritten as

$$
\begin{aligned}
y_i &= \beta_0 + \beta_1 (W_i - U_i) + e_i \\
&= \beta_0 + \beta_1 W_i + \underbrace{(e_i - \beta_1 U_i)}_{= e_i^*}.
\end{aligned}
$$

Because the $W_i$'s are not fixed in advance, we would need $E(e_i^* \mid W_i) = 0$ for this to be a linear model. However, note that

$$
\begin{aligned}
E(e_i^* \mid W_i) &= E(e_i - \beta_1 U_i \mid X_i + U_i) \\
&= E(e_i \mid X_i + U_i) - \beta_1 E(U_i \mid X_i + U_i).
\end{aligned}
$$

The first term is zero if $e_i$ is independent of both $X_i$ and $U_i$. The second term generally is not zero (unless $\beta_1 = 0$, of course) because $U_i$ and $X_i + U_i$ are correlated. Thus, this is not a GM model. §
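A small simulation makes the problem visible. Add an assumption the notes do not make: the true $X_i$ are iid N(3, 1) and independent of the $e_i$ and $U_i$. Then a one-line calculation gives cov($W_i$, $e_i^*$) = cov($X_i + U_i$, $e_i - \beta_1 U_i$) = $-\beta_1 \sigma_U^2$, so the error in the regression of y on W is correlated with the regressor. The NumPy sketch below uses hypothetical parameter values and a fixed seed. It also shows the resulting attenuation of the least squares slope of y on W toward $\beta_1 \sigma_X^2/(\sigma_X^2 + \sigma_U^2)$.

```python
import numpy as np

rng = np.random.default_rng(714)       # fixed seed for reproducibility
N = 200_000
beta0, beta1 = 1.0, 2.0                # hypothetical true coefficients
sigma_e, sigma_U, sigma_X = 1.0, 0.5, 1.0

X = rng.normal(3.0, sigma_X, size=N)   # true covariate (assumed N(3, 1)), unobserved
e = rng.normal(0.0, sigma_e, size=N)
U = rng.normal(0.0, sigma_U, size=N)
W = X + U                              # observed, error-prone version of X
y = beta0 + beta1 * X + e

e_star = y - beta0 - beta1 * W         # equals e - beta1 * U
print(np.cov(W, e_star)[0, 1], -beta1 * sigma_U**2)   # both near -0.5

# The least squares slope of y on W is attenuated toward zero.
slope = np.polyfit(W, y, deg=1)[0]
print(slope, beta1 * sigma_X**2 / (sigma_X**2 + sigma_U**2))  # both near 1.6
```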

1.3 Linear algebra review

1.3.1 Basic definitions

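TERMINOLOGY : A matrix A is a rectangular array of elements; e.g.,

$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.
$$

The (i, j)th element of A is denoted by $a_{ij}$. The dimensions of A are m (the number of rows) by n (the number of columns). If m = n, A is square. If we want to emphasize the dimension of A, we can write $A_{m\times n}$. In this course, we restrict attention to real matrices; i.e., matrices whose elements are real numbers.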

TERMINOLOGY : A vector is a matrix consisting of one column or one row. A column vector is denoted by $a_{n\times 1}$. A row vector is denoted by $a_{1\times n}$. By convention, we assume a vector is a column vector, unless otherwise noted; that is,

$$a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}.$$

1.3.2 Inverse

TERMINOLOGY : If A is an n × n matrix, and there exists a matrix C such that

AC = CA = I,

then A is nonsingular and C is called the inverse of A; henceforth denoted by $A^{-1}$. If A is nonsingular, $A^{-1}$ is unique. If A is a square matrix and is not nonsingular, A is singular.

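SPECIAL CASE : The inverse of the 2 × 2 matrix

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

is given by

$$A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix},$$

provided that $ad - bc \neq 0$.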
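A direct translation of the 2 × 2 formula is sketched below and checked against NumPy's general-purpose `numpy.linalg.inv`. The function name `inverse_2x2` and the example matrix are illustrative.

```python
import numpy as np

def inverse_2x2(A):
    """Invert a 2 x 2 matrix using A^{-1} = (ad - bc)^{-1} [[d, -b], [-c, a]]."""
    (a, b), (c, d) = np.asarray(A, dtype=float)
    det = a * d - b * c
    if np.isclose(det, 0.0):
        raise ValueError("A is singular: ad - bc = 0")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])            # ad - bc = 24 - 14 = 10
A_inv = inverse_2x2(A)               # [[0.6, -0.7], [-0.2, 0.4]]
print(np.allclose(A @ A_inv, np.eye(2)))          # True
print(np.allclose(A_inv, np.linalg.inv(A)))       # True
```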

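SPECIAL CASE : The inverse of the n × n diagonal matrix

$$
A = \begin{pmatrix}
a_{11} & 0 & \cdots & 0 \\
0 & a_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{pmatrix}
$$

is given by

$$
A^{-1} = \begin{pmatrix}
a_{11}^{-1} & 0 & \cdots & 0 \\
0 & a_{22}^{-1} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}^{-1}
\end{pmatrix},
$$

provided that $a_{ii} \neq 0$ for all i.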

Result M.2.

(a) A is nonsingular iff $|A| \neq 0$.

(b) If A and B are nonsingular matrices, $(AB)^{-1} = B^{-1}A^{-1}$.

(c) If A is nonsingular, then $(A')^{-1} = (A^{-1})'$.
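Result M.2 can be spot-checked numerically. The sketch below uses NumPy. Parts (b) and (c) are checked with randomly generated 4 × 4 matrices, which are nonsingular with probability one. Part (a) is illustrated with a matrix whose rows are proportional.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))
inv = np.linalg.inv

# (a) Both determinants are nonzero, so both matrices are nonsingular.
print(np.linalg.det(A), np.linalg.det(B))
# (b) (AB)^{-1} = B^{-1} A^{-1}  (note the reversed order)
print(np.allclose(inv(A @ B), inv(B) @ inv(A)))   # True
# (c) (A')^{-1} = (A^{-1})'
print(np.allclose(inv(A.T), inv(A).T))            # True

# (a), other direction: proportional rows give |S| = 0, so S is singular.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(abs(np.linalg.det(S)) < 1e-12)              # True
```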

1.3.3 Linear independence and rank

TERMINOLOGY : The m × 1 vectors $a_1, a_2, \ldots, a_n$ are said to be linearly dependent if and only if there exist scalars $c_1, c_2, \ldots, c_n$ such that

$$\sum_{i=1}^{n} c_i a_i = 0$$

and at least one of the $c_i$'s is not zero; that is, it is possible to express at least one vector as a nontrivial linear combination of the others. If

$$\sum_{i=1}^{n} c_i a_i = 0 \implies c_1 = c_2 = \cdots = c_n = 0,$$

then $a_1, a_2, \ldots, a_n$ are linearly independent. If m < n, then $a_1, a_2, \ldots, a_n$ must be linearly dependent.

NOTE : If $a_1, a_2, \ldots, a_n$ denote the columns of an m × n matrix A; i.e.,

$$A = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix},$$

then the columns of A are linearly independent if and only if Ac = 0 ⇒ c = 0, where $c = (c_1, c_2, \ldots, c_n)'$. Thus, if you can find at least one nonzero c such that Ac = 0, the columns of A are linearly dependent.
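In floating-point practice, a nonzero c with Ac = 0 can be read off the singular value decomposition. The right singular vectors belonging to (numerically) zero singular values span the null space of A. Below is a minimal NumPy sketch with a hypothetical helper `null_vector` and an illustrative relative tolerance.

```python
import numpy as np

def null_vector(A, rtol=1e-10):
    """Return a unit vector c with Ac = 0, or None if the columns of A are
    linearly independent (up to the relative tolerance `rtol`)."""
    A = np.asarray(A, dtype=float)
    _, s, Vt = np.linalg.svd(A)                 # Vt is n x n
    r = int(np.sum(s > rtol * s[0]))            # numerical rank of A
    if r == A.shape[1]:
        return None                             # only c = 0 solves Ac = 0
    return Vt[r]                                # right singular vector outside the row space

# Columns a_1 = (1, 0, 1)', a_2 = (0, 1, 1)', and a_3 = a_1 + a_2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
c = null_vector(A)
print(c / c[0])                   # proportional to (1, 1, -1)
print(np.allclose(A @ c, 0))      # True: the columns are linearly dependent
print(null_vector(np.eye(3)))     # None: the columns are linearly independent
```

When m < n, `s` has only m entries, so the numerical rank is below n and a null vector is always returned. This matches the statement that n vectors in $\mathbb{R}^m$ with m < n must be linearly dependent.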

TERMINOLOGY : The rank of a matrix A is defined as

r(A) = number of linearly independent columns of A

= number of linearly independent rows of A.

The number of linearly independent rows of any matrix is always equal to the number of linearly independent columns.

TERMINOLOGY : If A is n × p, then r(A) ≤ min{n, p}.

  • If r(A) = min{n, p}, then A is said to be of full rank.
  • If r(A) = n, we say that A is of full row rank.
  • If r(A) = p, we say that A is of full column rank.
  • If r(A) < min{n, p}, we say that A is less than full rank or rank deficient.
  • Since the maximum possible rank of an n × p matrix is the minimum of n and p, for any rectangular (i.e., non-square) matrix, either the rows or columns (or both) must be linearly dependent.
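The rank vocabulary above can be expressed as a small classification function. This sketch assumes NumPy, whose `numpy.linalg.matrix_rank` computes a numerical rank from the SVD using a default tolerance. The helper name `rank_status` is hypothetical. The last line also checks that row rank equals column rank, i.e., r(A) = r(A').

```python
import numpy as np

def rank_status(A):
    """Return r(A) and the rank terms above that apply to the n x p matrix A."""
    A = np.asarray(A, dtype=float)
    n, p = A.shape
    r = int(np.linalg.matrix_rank(A))
    terms = []
    if r == min(n, p):
        terms.append("full rank")
    if r == n:
        terms.append("full row rank")
    if r == p:
        terms.append("full column rank")
    if r < min(n, p):
        terms.append("less than full rank")
    return r, terms

print(rank_status(np.eye(3)))   # (3, ['full rank', 'full row rank', 'full column rank'])
print(rank_status([[1, 0, 1],
                   [0, 1, 1]]))  # (2, ['full rank', 'full row rank'])
# One-way ANOVA design of Example 1.4 with a = 2 and n_1 = n_2 = 2:
X = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
print(rank_status(X))            # (2, ['less than full rank'])
print(np.linalg.matrix_rank(X) == np.linalg.matrix_rank(np.transpose(X)))  # True
```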