Applied Regression Analysis - Assignment Six Questions | STAT 333, Assignments of Statistics

Material Type: Assignment; Class: Applied Regression Analysis; Subject: STATISTICS; University: University of Wisconsin - Madison; Term: Fall 2003;


Uploaded on 09/02/2009

Statistics 333 Assignment 6 Due Nov. 17, 2003
1. The data shown below relate to a study on the reaction of formaldehyde with cotton cellulose, and were given by Devore (1991). The data consist of measurements on three predictor variables and the response $Y$ with $n = 30$ cases. The variables are:
$X_1$ = HCHO (formaldehyde) concentration,
$X_2$ = catalyst ratio,
$X_3$ = curing temperature,
$Y$ = durable press rating, a quantitative measure of wrinkle resistance.
X1  X2   X3    Y  |  X1  X2   X3    Y
 8   4  100  3.4  |   4  10  160  4.6
 2   4  180  3.2  |   4  13  100  4.3
 7   4  180  4.6  |  10  10  120  4.9
10   7  120  4.9  |   5   4  100  2.9
 7   4  180  4.6  |   8  13  140  4.6
 7   7  180  4.7  |  10   1  180  3.6
 7  13  140  4.6  |   2  13  140  3.1
 5   4  160  4.5  |   6  13  180  4.7
 4   7  140  4.8  |   7   1  120  3.4
 5   1  100  2.4  |   5  13  140  4.5
 8  10  140  4.7  |   8   1  160  3.1
 2   4  100  2.6  |   4   1  180  2.8
 4  10  180  4.5  |   6   1  160  2.5
 6   7  120  4.7  |   4   1  100  2.3
10  13  180  4.8  |   7  10  100  4.6
These data are stored in the file: /u/r/e/reinsel/stat333/press-rating.dat
i) Provide simple scatter plots of $Y$ versus $X_1$, $X_2$, and $X_3$, respectively, and make any comments that seem suitable.
ii) Fit a full second-order polynomial (response) model to the data, of the general form
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + \beta_{33} X_3^2 + \beta_{12} X_1 X_2 + \beta_{13} X_1 X_3 + \beta_{23} X_2 X_3 + \varepsilon$$
Follow this by performing a complete analysis that leads to the selection of a ‘reasonable’ final reduced model (one that satisfies the origin shift criterion; e.g., see pp. 267-268 in the textbook). A ‘complete’ analysis should provide details and include formal justifications (e.g., consideration of F-tests, $R^2$, adjusted-$R^2$, and $S^2$ values) for the selection of a final reduced model. It should also include the usual plots and an examination of (various types of) residuals from the final fitted model, for checking assumptions and the adequacy of the model.
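As a possible starting point for part (ii), here is a minimal pure-Python sketch (no external packages assumed). It fits the full second-order model to the 30 cases in the table above by least squares and reports $R^2$, adjusted-$R^2$, and $S^2$. The data values are transcribed from the table, and the helper names (`solve`, `fit`, `sse`) are hypothetical. The predictors are standardized before squaring. Shifting or rescaling each predictor changes the coefficients but not the fitted values, either for the full second-order model or for the model with only linear and pure quadratic terms. So SSE and the F statistic below are the same as with the raw variables, and $\mathbf{X}'\mathbf{X}$ is far better conditioned. The F statistic for dropping the three cross-product terms only illustrates the mechanics of an extra-SS comparison; it is not a recommended final model.

```python
from statistics import mean, stdev

# (X1, X2, X3, Y) for the n = 30 cases: left half of the table, then the right half.
DATA = [
    (8, 4, 100, 3.4), (2, 4, 180, 3.2), (7, 4, 180, 4.6), (10, 7, 120, 4.9),
    (7, 4, 180, 4.6), (7, 7, 180, 4.7), (7, 13, 140, 4.6), (5, 4, 160, 4.5),
    (4, 7, 140, 4.8), (5, 1, 100, 2.4), (8, 10, 140, 4.7), (2, 4, 100, 2.6),
    (4, 10, 180, 4.5), (6, 7, 120, 4.7), (10, 13, 180, 4.8),
    (4, 10, 160, 4.6), (4, 13, 100, 4.3), (10, 10, 120, 4.9), (5, 4, 100, 2.9),
    (8, 13, 140, 4.6), (10, 1, 180, 3.6), (2, 13, 140, 3.1), (6, 13, 180, 4.7),
    (7, 1, 120, 3.4), (5, 13, 140, 4.5), (8, 1, 160, 3.1), (4, 1, 180, 2.8),
    (6, 1, 160, 2.5), (4, 1, 100, 2.3), (7, 10, 100, 4.6),
]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit(X, y):
    """LS coefficients from the normal equations X'X b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def sse(X, y):
    """Residual sum of squares of the LS fit of y on the columns of X."""
    b = fit(X, y)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
               for row, yi in zip(X, y))

cols = list(zip(*DATA))
y = list(cols[3])
# Standardize X1, X2, X3, then build the 10 columns in the order of the model above.
z = [[(v - mean(c)) / stdev(c) for v in c] for c in cols[:3]]
X = [[1.0, z1, z2, z3, z1 * z1, z2 * z2, z3 * z3, z1 * z2, z1 * z3, z2 * z3]
     for z1, z2, z3 in zip(*z)]

n, p = len(y), len(X[0])
b = fit(X, y)
resid = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]
sse_full = sum(e * e for e in resid)
ybar = mean(y)
sst = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - sse_full / sst
adj_r2 = 1 - (sse_full / (n - p)) / (sst / (n - 1))
s2 = sse_full / (n - p)
print(f"full model: SSE={sse_full:.4f} R2={r2:.4f} adj-R2={adj_r2:.4f} S2={s2:.4f}")

# Illustration only: extra-SS F statistic for dropping the three cross-product
# columns (the last three); not a claim about which reduced model to choose.
X_red = [row[:7] for row in X]
f_stat = ((sse(X_red, y) - sse_full) / 3) / s2
print(f"F for dropping cross-products on (3, {n - p}) df: {f_stat:.3f}")
```

The F statistic printed last would be compared with an $F(3, 20)$ critical value. The same `sse` helper can be reused for any other candidate reduced model.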
iii) As an additional exercise for these data, for your final fitted model obtain and examine both the standardized and studentized residuals, and find the Cook statistic value corresponding to each data case. Check whether any of the cases seem ‘unusual’ in terms of outlier behavior or extreme influence, and discuss.
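For part (iii), the sketch below computes several per-case quantities for a generic least-squares fit:
- the leverage $h_{ii}$;
- the residual scaled by $s$;
- the internally studentized residual $r_i = e_i/(s\sqrt{1-h_{ii}})$;
- the externally studentized (deleted) residual;
- Cook's distance $D_i = r_i^2 h_{ii}/[p(1-h_{ii})]$.

Textbooks differ on which of these they call ‘standardized’ and which ‘studentized’, so match them against the definitions in your text. The function name `case_diagnostics` and the toy data are hypothetical. To use it for the assignment, pass the design matrix of your own final reduced model.

```python
import math

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def case_diagnostics(X, y):
    """Per case: (h_ii, e_i/s, internally studentized, externally studentized, Cook's D)."""
    n, p = len(X), len(X[0])
    xtx_inv = inverse([[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)])
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    b = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    e = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    s2 = sum(ei * ei for ei in e) / (n - p)
    out = []
    for r, ei in zip(X, e):
        h = sum(r[i] * xtx_inv[i][j] * r[j] for i in range(p) for j in range(p))
        stud = ei / math.sqrt(s2 * (1 - h))
        # s^2 with case i deleted: (n-p-1) s_(i)^2 = (n-p) s^2 - e_i^2 / (1 - h_ii)
        s2_del = ((n - p) * s2 - ei * ei / (1 - h)) / (n - p - 1)
        out.append((h, ei / math.sqrt(s2), stud,
                    ei / math.sqrt(s2_del * (1 - h)),
                    stud * stud * h / (p * (1 - h))))
    return out

# Toy illustration with made-up numbers: roughly a straight line, last case pulled up.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 11.0]
X = [[1.0, float(x)] for x in xs]
diag = case_diagnostics(X, ys)
for i, (h, std, stud, rstud, cook) in enumerate(diag, start=1):
    print(f"case {i}: h={h:.3f} e/s={std:.2f} r={stud:.2f} t={rstud:.2f} D={cook:.3f}")
```

A quick internal check: the $h_{ii}$ sum to $p$, the trace of the hat matrix.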
2. i) For the simple linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, \ldots, n$, show that the $i$th diagonal element of the ‘hat’ matrix $\mathbf{H}$, $h_{ii} \equiv \mathbf{x}_i'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_i$, takes the form $h_{ii} = (1/n) + (X_i - \bar{X})^2/S_{xx}$, where $S_{xx} = \sum_{i=1}^{n} (X_i - \bar{X})^2$.

(Hint: It is convenient to express the model in ‘centered’ form as $Y_i = \beta_0^* + \beta_1 (X_i - \bar{X}) + \varepsilon_i$, so that $\mathbf{x}_i = [1, X_i - \bar{X}]'$ and $\mathbf{X}'\mathbf{X} = \mathrm{Diag}\{n, S_{xx}\}$.)
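The sketch below is a quick exact-arithmetic check, not the requested proof. It confirms that $\mathbf{x}_i'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_i$, computed from the explicit $2 \times 2$ inverse, agrees with $1/n + (X_i - \bar{X})^2/S_{xx}$. The $X$-values are arbitrary and hypothetical. Python's `fractions.Fraction` keeps every quantity exact, so the two lists can be compared with `==` rather than a floating-point tolerance.

```python
from fractions import Fraction

def hat_diag_direct(xs):
    """h_ii = x_i'(X'X)^{-1} x_i for X = [1, x], via the explicit 2x2 inverse."""
    n, s1, s2 = len(xs), sum(xs), sum(x * x for x in xs)
    det = n * s2 - s1 * s1  # det(X'X); nonzero unless all X_i are equal
    # (X'X)^{-1} = [[s2, -s1], [-s1, n]] / det
    return [Fraction(s2 - 2 * s1 * x + n * x * x, det) for x in xs]

def hat_diag_formula(xs):
    """Closed form h_ii = 1/n + (X_i - Xbar)^2 / S_xx."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [Fraction(1, n) + (x - xbar) ** 2 / sxx for x in xs]

xs = [2, 4, 5, 6, 7, 8, 10]  # arbitrary illustrative X values
direct, formula = hat_diag_direct(xs), hat_diag_formula(xs)
print(direct == formula, sum(direct))  # True 2
```

The second printed number is $\sum_i h_{ii} = \mathrm{tr}(\mathbf{H}) = 2$, the number of regression parameters.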

ii) For the model $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$, $i = 1, \ldots, n$, determine a condition or circumstance under which the $h_{ii}$ values will take the explicit form $h_{ii} = (1/n) + (X_{i1} - \bar{X}_1)^2/S_{x_1x_1} + (X_{i2} - \bar{X}_2)^2/S_{x_2x_2}$, where $S_{x_2x_2} = \sum_{i=1}^{n} (X_{i2} - \bar{X}_2)^2$. Verify your result under the stated condition. (Again, consider the ‘centered’ form $Y_i = \beta_0^* + \beta_1 (X_{i1} - \bar{X}_1) + \beta_2 (X_{i2} - \bar{X}_2) + \varepsilon_i$, and consider circumstances involving orthogonality.)
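The hint points to orthogonality. The sketch below checks one candidate condition numerically on two made-up designs: a zero centered cross-product, $S_{x_1x_2} = \sum_i (X_{i1} - \bar{X}_1)(X_{i2} - \bar{X}_2) = 0$. The first design is a $2 \times 2$ factorial plus a centre point, so $S_{x_1x_2} = 0$; the second is a perturbed, correlated version. This is an illustration under those assumptions, not the verification the problem asks for. The helper names are hypothetical.

```python
from fractions import Fraction

def inverse(a):
    """Exact Gauss-Jordan inverse of a small nonsingular matrix."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]

def hat_diag(x1, x2):
    """h_ii = x_i'(X'X)^{-1} x_i for the design X = [1, X1, X2]."""
    X = [[1, a, b] for a, b in zip(x1, x2)]
    inv = inverse([[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)])
    return [sum(r[i] * inv[i][j] * r[j] for i in range(3) for j in range(3)) for r in X]

def additive_form(x1, x2):
    """1/n + (X_i1 - Xbar_1)^2 / S_x1x1 + (X_i2 - Xbar_2)^2 / S_x2x2."""
    n = len(x1)
    out = [Fraction(1, n)] * n
    for col in (x1, x2):
        m = Fraction(sum(col), n)
        s = sum((v - m) ** 2 for v in col)
        out = [o + (v - m) ** 2 / s for o, v in zip(out, col)]
    return out

def centered_cross(x1, x2):
    """S_x1x2 = sum (X_i1 - Xbar_1)(X_i2 - Xbar_2)."""
    n = len(x1)
    m1, m2 = Fraction(sum(x1), n), Fraction(sum(x2), n)
    return sum((a - m1) * (b - m2) for a, b in zip(x1, x2))

orth = ([1, 1, 3, 3, 2], [10, 20, 10, 20, 15])   # 2x2 factorial + centre point
corr = ([1, 1, 3, 3, 2], [10, 20, 20, 30, 15])   # X1 and X2 now correlated
for x1, x2 in (orth, corr):
    print(centered_cross(x1, x2), hat_diag(x1, x2) == additive_form(x1, x2))
```

It prints `0 True` for the orthogonal design and `20 False` for the correlated one.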

iii) When the special results for the model in (ii) do hold, also give simple expressions for the LSE $b_2$ of $\beta_2$ in the model, for $\mathrm{Var}(b_2)$, and for $\mathrm{SSR}(b_2 \mid b_0, b_1)$, involving $S_{x_2x_2}$.

3. Suppose a response variable $Y$ is fitted by LS using the straight line model $E(Y) = \beta_0 + \beta_1 X$, based on a sample of $n = 7$ observations with $X$-values equal to $-5, -3, -1, 0, 1, 3, 5$. However, it is feared that there may be some additional quadratic effect and that $Y$ may actually follow the quadratic model $Y = \beta_0 + \beta_1 X + \beta_2 X^2$.

i) Under the assumption that the quadratic model is actually the true model, determine the biases of the LS estimates $b_0$ and $b_1$ in estimating $\beta_0$ and $\beta_1$ when the straight line model is estimated. [Note: You need to calculate the ‘bias’ matrix (a column vector in this case), $\mathbf{A} = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{x}_2$, where $\mathbf{X}_1 = [\mathbf{1}, \mathbf{x}]$ with $\mathbf{x} = (-5, -3, -1, 0, 1, 3, 5)'$, and $\mathbf{x}_2 = (25, 9, 1, 0, 1, 9, 25)'$ is the column of $X^2$-values.]
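If the quadratic model is true, the straight-line estimates $\mathbf{b} = (b_0, b_1)'$ satisfy $E(\mathbf{b}) = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'(\mathbf{X}_1\boldsymbol{\beta} + \mathbf{x}_2\beta_2) = \boldsymbol{\beta} + \mathbf{A}\beta_2$. So the biases are the entries of $\mathbf{A}$ times $\beta_2$. The small exact-arithmetic sketch below computes $\mathbf{A}$ as defined in the note; the variable names are illustrative.

```python
from fractions import Fraction

x = [-5, -3, -1, 0, 1, 3, 5]
x2 = [v * v for v in x]            # the X^2 column: 25, 9, 1, 0, 1, 9, 25
X1 = [[1, v] for v in x]           # X_1 = [1, x]

g = [[sum(r[i] * r[j] for r in X1) for j in range(2)] for i in range(2)]   # X1'X1
c = [sum(r[i] * t for r, t in zip(X1, x2)) for i in range(2)]             # X1'x2

# A = (X1'X1)^{-1} X1'x2 using the explicit 2x2 inverse, in exact arithmetic
det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
A = [Fraction(g[1][1] * c[0] - g[0][1] * c[1], det),
     Fraction(g[0][0] * c[1] - g[1][0] * c[0], det)]
print("X1'X1 =", g, " X1'x2 =", c, " A =", [str(a) for a in A])
```

Because these $X$-values sum to zero, $\mathbf{X}_1'\mathbf{X}_1$ is diagonal, which keeps the arithmetic short.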

ii) Use the result from Problem 4(ii) to find the expected value of the MSE, $\mathrm{MSE}_2$, from fitting the ‘reduced’ straight line model $E(Y) = \beta_0 + \beta_1 X$, when the quadratic model is actually the true model. [Note: You need to find the values of $\tilde{\mathbf{x}}_2 = \mathbf{x}_2 - \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{x}_2$.]
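The sketch below covers the arithmetic in the note. It assumes the standard form of the Problem 4(ii) result, $E[\mathrm{MSE}_2] = \sigma^2 + \boldsymbol{\beta}_2'\tilde{\mathbf{X}}_2'\tilde{\mathbf{X}}_2\boldsymbol{\beta}_2/(n - p_1)$. With a single extra column, that reduces to $\sigma^2 + \beta_2^2\,\tilde{\mathbf{x}}_2'\tilde{\mathbf{x}}_2/(n - 2)$. Since $\sum X = 0$, the columns of $\mathbf{X}_1$ are orthogonal, and the projection of $\mathbf{x}_2$ onto them can be written directly.

```python
from fractions import Fraction

x = [-5, -3, -1, 0, 1, 3, 5]
x2 = [v * v for v in x]          # the X^2 column
n, p1 = len(x), 2                # X1 = [1, x] has p1 = 2 columns

# sum(x) = 0, so the LS fit of x2 on [1, x] has intercept mean(x2)
# and slope sum(x * x2) / sum(x^2).
a0 = Fraction(sum(x2), n)
a1 = Fraction(sum(v * t for v, t in zip(x, x2)), sum(v * v for v in x))
x2_tilde = [t - (a0 + a1 * v) for v, t in zip(x, x2)]   # (I - H1) x2
ss = sum(t * t for t in x2_tilde)                        # x2_tilde' x2_tilde

# Multiplier of beta_2^2 in E[MSE_2] = sigma^2 + beta_2^2 * ss / (n - p1)
coef = ss / (n - p1)
print([int(t) for t in x2_tilde], ss, coef)
```

As a check, $\tilde{\mathbf{x}}_2$ should be orthogonal to both columns of $\mathbf{X}_1$: it sums to zero, and so does its product with $\mathbf{x}$.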

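4. Consider the linear regression model $\mathbf{Y} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2 + \boldsymbol{\varepsilon}$, where $\mathbf{X} = [\mathbf{X}_1, \mathbf{X}_2]$ is $n \times p$ and $\mathbf{X}_1$ is $n \times p_1$, $p_1 < p$. The ‘extra’ regression sum of squares due to inclusion of the $\mathbf{X}_2$ terms, after the $\mathbf{X}_1$ terms, is
$$\mathrm{SSR}(\mathbf{b}_2 \mid \mathbf{b}_1) = S_1 - S_2 = \mathbf{b}_2'\tilde{\mathbf{X}}_2'\mathbf{Y} = \mathbf{Y}'\tilde{\mathbf{X}}_2(\tilde{\mathbf{X}}_2'\tilde{\mathbf{X}}_2)^{-1}\tilde{\mathbf{X}}_2'\mathbf{Y} \equiv \mathbf{Y}'\mathbf{H}_2\mathbf{Y},$$
say, where $\mathbf{b}_2 = (\tilde{\mathbf{X}}_2'\tilde{\mathbf{X}}_2)^{-1}\tilde{\mathbf{X}}_2'\mathbf{Y}$, $\mathbf{H}_2 = \tilde{\mathbf{X}}_2(\tilde{\mathbf{X}}_2'\tilde{\mathbf{X}}_2)^{-1}\tilde{\mathbf{X}}_2'$, and $\tilde{\mathbf{X}}_2 = \mathbf{X}_2 - \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{X}_2 = (\mathbf{I} - \mathbf{H}_1)\mathbf{X}_2$. $\mathrm{SSR}(\mathbf{b}_2 \mid \mathbf{b}_1)$ is also the ‘hypothesis’ sum of squares for testing $H_0\colon \boldsymbol{\beta}_2 = \mathbf{0}$.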

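Also recall the useful result that if $\mathbf{Y}$ is an $n \times 1$ random vector with mean vector $\boldsymbol{\mu} = E(\mathbf{Y})$ and covariance matrix $\boldsymbol{\Sigma} = \mathrm{Cov}(\mathbf{Y})$, and $\mathbf{A}$ is an $n \times n$ symmetric matrix of constants, then the random variable $Q = \mathbf{Y}'\mathbf{A}\mathbf{Y}$ (a quadratic form in $\mathbf{Y}$) has mean or expected value equal to
$$E(Q) = E(\mathbf{Y}'\mathbf{A}\mathbf{Y}) = \mathrm{tr}(\mathbf{A}\boldsymbol{\Sigma}) + \boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu},$$
where $\mathrm{tr}(\mathbf{B})$ denotes the trace of a matrix $\mathbf{B}$, the sum of its diagonal elements.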
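Below is a Monte Carlo sanity check of this expectation formula on a made-up two-dimensional example. Normality of $\mathbf{Y}$ is assumed only to generate the draws; the formula itself needs just the mean and covariance. Here $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}'$ is built from a hypothetical lower-triangular $\mathbf{L}$, and the seed and number of draws are arbitrary.

```python
import random

def quad_form(a, v):
    """v' a v for a square matrix a (list of rows) and a vector v."""
    return sum(v[i] * a[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))

# Made-up example: Y = mu + L z with z ~ N(0, I), so Cov(Y) = Sigma = L L'.
mu = [1.0, -2.0]
L = [[1.0, 0.0], [0.5, 1.0]]
Sigma = [[sum(L[i][k] * L[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
A = [[2.0, 1.0], [1.0, 3.0]]  # symmetric matrix of constants

# tr(A Sigma) + mu' A mu
exact = sum(A[i][j] * Sigma[j][i] for i in range(2) for j in range(2)) + quad_form(A, mu)

rng = random.Random(333)
draws = 200_000
total = 0.0
for _ in range(draws):
    z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    y = [mu[i] + sum(L[i][k] * z[k] for k in range(2)) for i in range(2)]
    total += quad_form(A, y)
print(exact, total / draws)
```

For this example, $\mathrm{tr}(\mathbf{A}\boldsymbol{\Sigma}) = 6.75$ and $\boldsymbol{\mu}'\mathbf{A}\boldsymbol{\mu} = 10$, so the exact value is 16.75. The simulated mean should agree to within Monte Carlo error, a few hundredths with this many draws.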

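i) Use the result above to determine the expected value of $\mathrm{SSR}(\mathbf{b}_2 \mid \mathbf{b}_1)$, i.e., determine $E[\mathrm{SSR}(\mathbf{b}_2 \mid \mathbf{b}_1)]$; hence also give the expected value of the mean square $\mathrm{MSR}(\mathbf{b}_2 \mid \mathbf{b}_1) = \mathrm{SSR}(\mathbf{b}_2 \mid \mathbf{b}_1)/(p - p_1)$. Express your results in simplest terms by noting that $\mathbf{X}_1'\tilde{\mathbf{X}}_2 = \mathbf{0}$ and also $\mathbf{X}_2'\tilde{\mathbf{X}}_2 = \mathbf{X}_2'(\mathbf{I} - \mathbf{H}_1)\mathbf{X}_2 = \mathbf{X}_2'(\mathbf{I} - \mathbf{H}_1)(\mathbf{I} - \mathbf{H}_1)\mathbf{X}_2 = \tilde{\mathbf{X}}_2'\tilde{\mathbf{X}}_2$, showing in particular that the expected values do not involve the parameters $\boldsymbol{\beta}_1$, only the parameters $\boldsymbol{\beta}_2$ and $\sigma^2$.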

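ii) When the ‘reduced’ model $E(\mathbf{Y}) = \mathbf{X}_1\boldsymbol{\beta}_1$ is estimated by LS, we have the LS estimate $\mathbf{b}_1 = (\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\mathbf{Y}$, and we know that the residual SS is
$$\mathrm{SSE}_2 = \mathbf{Y}'\left(\mathbf{I} - \mathbf{X}_1(\mathbf{X}_1'\mathbf{X}_1)^{-1}\mathbf{X}_1'\right)\mathbf{Y} = \mathbf{Y}'\mathbf{Y} - \mathbf{b}_1'\mathbf{X}_1'\mathbf{Y} = \mathrm{SSE}_1 + \mathrm{SSR}(\mathbf{b}_2 \mid \mathbf{b}_1),$$
where $\mathrm{SSE}_1$ is the residual SS from fitting the ‘full’ model. Using the known fact (e.g., see Problem 2 of Assignment 4) that $E[\mathrm{SSE}_1] = (n - p)\sigma^2$ and the results from (i), determine $E[\mathrm{SSE}_2]$ and hence also $E[\mathrm{MSE}_2]$, where $\mathrm{MSE}_2 = \mathrm{SSE}_2/(n - p_1)$, under the assumption that the ‘full’ model is the true model, i.e., $\boldsymbol{\beta}_2$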