Assignment I Questions - Applied Regression Analysis | STAT 333, Assignments of Statistics

Material Type: Assignment; Class: Applied Regression Analysis; Subject: STATISTICS; University: University of Wisconsin - Madison; Term: Unknown 2003;


Statistics 333 Assignment 1 Due Sept. 19, 2003

1. For the simple linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, \ldots, n$, consider the least squares fits $\hat{Y}_i = b_0 + b_1 X_i$ and residuals given by $e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_i)$, $i = 1, \ldots, n$. Since $b_0 = \bar{Y} - b_1 \bar{X}$, it is easy to see that $(1/n)\sum_{i=1}^{n} \hat{Y}_i = \bar{Y}$, and hence also that $\sum_{i=1}^{n} (Y_i - \hat{Y}_i) = 0$.

Similarly, show that the residuals $e_i$ satisfy the additional 'orthogonality' constraint $\sum_{i=1}^{n} X_i (Y_i - \hat{Y}_i) = 0$.
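Both residual constraints hold for any data set, by construction of the least squares fit. A quick numerical check (a sketch using hypothetical simulated data, not part of the required proof):

```python
import numpy as np

# Hypothetical data purely for illustration; the two identities hold
# for any data by construction of the least squares fit.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=25)
Y = 2.0 + 1.5 * X + rng.normal(0, 1, size=25)

# Least squares estimates: b1 = Sxy / Sxx, b0 = Ybar - b1 * Xbar
Sxx = np.sum((X - X.mean()) ** 2)
Sxy = np.sum((X - X.mean()) * Y)
b1 = Sxy / Sxx
b0 = Y.mean() - b1 * X.mean()

e = Y - (b0 + b1 * X)      # residuals e_i = Y_i - Yhat_i

sum_e = e.sum()            # sum of residuals: ~0 up to rounding
sum_Xe = (X * e).sum()     # 'orthogonality' constraint: also ~0
```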

2. For the simple linear regression model in Exercise 1, under the standard model assumptions for the random errors $\varepsilon_i$, show that the least squares estimator $b_1 = S_{xy}/S_{xx}$ and $\bar{Y} = (1/n)\sum_{i=1}^{n} Y_i$ have zero covariance, i.e., $\mathrm{Cov}(b_1, \bar{Y}) = 0$, where $S_{xy} = \sum_{i=1}^{n} (X_i - \bar{X})\, Y_i$.
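The result can be sanity-checked by simulation before attempting the algebraic proof. The sketch below (design points and parameter values are arbitrary choices for illustration) estimates $\mathrm{Cov}(b_1, \bar{Y})$ over many simulated data sets and should give a value near zero:

```python
import numpy as np

# Monte Carlo check that Cov(b1, Ybar) = 0 under the standard model.
# The fixed design X and the values of beta0, beta1, sigma below are
# arbitrary choices for illustration only.
rng = np.random.default_rng(1)
X = np.linspace(0, 1, 10)
beta0, beta1, sigma = 1.0, 2.0, 0.5
Sxx = np.sum((X - X.mean()) ** 2)

b1s, ybars = [], []
for _ in range(20000):
    Y = beta0 + beta1 * X + rng.normal(0, sigma, size=X.size)
    b1s.append(np.sum((X - X.mean()) * Y) / Sxx)   # b1 = Sxy / Sxx
    ybars.append(Y.mean())

# Sample covariance across simulations; should be close to 0.
cov_b1_ybar = np.cov(b1s, ybars)[0, 1]
```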

3. For the simple linear regression model, the estimator of the error variance $\sigma^2 = \mathrm{Var}(\varepsilon_i)$ is given by
$$S^2 = \frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \frac{1}{n-2}\,\mathrm{SSE}.$$
Show that this estimator $S^2$ is unbiased for $\sigma^2$, i.e., prove that $E(S^2) = \sigma^2$.

Note: To prove this result, use the following approach. First, verify the identity
$$Y_i - (\beta_0 + \beta_1 X_i) = [\,Y_i - (b_0 + b_1 X_i)\,] + [\,\bar{Y} - (\beta_0 + \beta_1 \bar{X})\,] + [\,(b_1 - \beta_1)(X_i - \bar{X})\,].$$
Then verify that the sum of squares of the elements on the left-hand side, $\sum_{i=1}^{n} [\,Y_i - (\beta_0 + \beta_1 X_i)\,]^2$, is equal to the sum of the 3 sums of squares of the individual elements on the right-hand side, due to 'orthogonality' in the cross-terms (Exer. 1). Hence, verify that
$$\sum_{i=1}^{n} [\,Y_i - (\beta_0 + \beta_1 X_i)\,]^2 = \mathrm{SSE} + n[\,\bar{Y} - (\beta_0 + \beta_1 \bar{X})\,]^2 + S_{xx}\,(b_1 - \beta_1)^2.$$
Finally, equate the expected values of both sides of the sums-of-squares relation to get
$$n\sigma^2 = E[\mathrm{SSE}] + n\,E[\,\bar{Y} - (\beta_0 + \beta_1 \bar{X})\,]^2 + S_{xx}\,E[(b_1 - \beta_1)^2],$$
'evaluate' the other expected-value terms, and solve for $E[\mathrm{SSE}]$. Also use definitions and known results for $\mathrm{Var}(b_1)$, etc.; for instance, $E\{[\,Y_i - (\beta_0 + \beta_1 X_i)\,]^2\} = \mathrm{Var}(Y_i) \equiv \sigma^2$ by the definition of variance.
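As a check on the final step of your own derivation: using the standard results $\mathrm{Var}(\bar{Y}) = \sigma^2/n$ and $\mathrm{Var}(b_1) = \sigma^2/S_{xx}$, the expected-value relation above collapses to

```latex
n\sigma^2 = E[\mathrm{SSE}] + n \cdot \frac{\sigma^2}{n} + S_{xx} \cdot \frac{\sigma^2}{S_{xx}}
          = E[\mathrm{SSE}] + 2\sigma^2
\quad\Longrightarrow\quad
E[\mathrm{SSE}] = (n-2)\sigma^2,
\qquad
E(S^2) = \frac{(n-2)\sigma^2}{n-2} = \sigma^2 .
```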

4. Consider the zero, or no-intercept, model given by $Y_i = \beta_1 X_i + \varepsilon_i$, $i = 1, \ldots, n$, with the errors $\varepsilon_i$ being independent, normal r.v.'s with mean 0 and variance $\sigma^2$.

i) Derive the least squares estimator $b_1$ of $\beta_1$, and also derive the variance of this estimator.

ii) For an arbitrary fixed value $X_0$ of $X$, establish that a $100(1-\alpha)\%$ confidence interval for the mean response value $E(Y \mid X_0) = \beta_1 X_0$ is given by
$$b_1 X_0 \;\pm\; t_{n-1}^{(\alpha/2)}\, S \sqrt{X_0^2 \Big/ \textstyle\sum_{i=1}^{n} X_i^2}\,,$$
where $S^2 = \sum_{i=1}^{n} (Y_i - b_1 X_i)^2 / (n-1)$ provides an unbiased estimator of $\sigma^2$ with $n-1$ df.

Note: You can conclude that $(b_1 - \beta_1)/\mathrm{se}(b_1)$ has the $t_{n-1}$ distribution.
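Once part (i) gives $b_1 = \sum X_i Y_i / \sum X_i^2$, the interval in part (ii) is straightforward to evaluate numerically. A sketch on hypothetical data (the data, $X_0$, and the tabled $t$ value are illustrative assumptions, not part of the assignment):

```python
import numpy as np

# Sketch of part (ii) for the no-intercept model, on hypothetical data.
rng = np.random.default_rng(2)
X = rng.uniform(1, 5, size=15)
Y = 2.0 * X + rng.normal(0, 1, size=X.size)
n = X.size

# Least squares slope for Y_i = beta1 * X_i + eps_i: minimizing
# sum (Y_i - b1 X_i)^2 gives b1 = sum(X_i Y_i) / sum(X_i^2).
b1 = np.sum(X * Y) / np.sum(X ** 2)

# Unbiased variance estimate with n - 1 df (only one parameter fitted).
S2 = np.sum((Y - b1 * X) ** 2) / (n - 1)

# 95% CI for E(Y | X0) = beta1 * X0 at an arbitrary fixed X0.
X0 = 3.0
t_crit = 2.145  # t_{n-1}^{(0.025)} for n - 1 = 14 df, from a t table
half_width = t_crit * np.sqrt(S2) * np.sqrt(X0 ** 2 / np.sum(X ** 2))
ci = (b1 * X0 - half_width, b1 * X0 + half_width)
```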

5. An experiment was conducted to study the mass of a tracer material exchanged between the main flow of an open channel and the "dead zone" caused by a sudden open-channel expansion. Researchers need this information to improve the water-quality modeling capability of a river. It is important to determine the exchange constant $K$ for varying flow conditions; the value of $K$ describes the exchange process when a dead zone appears. In the study, values of the Froude number ($N_F$), a function of upstream channel velocity and water depth, were used to predict $K$. The data collected were as follows, with the negative sign of the $K$ values indicating "flushing", the direction of mass transfer out of the dead zone:

Obs   N_F        K
 1    0.012500   -0.
 2    0.023750   -0.
 3    0.025625   -0.
 4    0.030000   -0.
 5    0.033125   -0.
 6    0.038125   -0.
 7    0.038125   -0.
 8    0.038125   -0.
 9    0.041250   -0.
10    0.043125   -0.
11    0.045000   -0.
12    0.046875   -0.
13    0.047500   -0.
14    0.050000   -0.
15    0.051250   -0.
16    0.056250   -0.
17    0.062500   -0.
18    0.068750   -0.
19    0.077500   -0.
20    0.044000   -0.

These data are stored in the file /u/r/e/reinsel/stat333/exchange.dat; they are also on the webpage. Perform and show the calculations for (ii)-(v) below by 'direct calculations' on a calculator or computer; then use regression in Minitab or other software to confirm the calculations.

i) Construct a scatter plot of $K$ versus $N_F$, and provide some relevant comments.

ii) Use the least squares method to fit the model $K_i = \beta_0 + \beta_1 (N_F)_i + \varepsilon_i$.

iii) Compute $S^2$ and $R^2$, and obtain the basic analysis of variance (ANOVA) table. Provide some brief interpretation of these results, e.g., in terms of the amount of variation of the $K$ values explained by the fitted regression (i.e., by the variable $N_F$).

iv) Obtain the standard errors for the least squares estimates $b_0$ and $b_1$, and under the usual normal-theory model assumptions, give a 95% confidence interval for $\beta_1$.

v) Determine the explicit form of the 95% confidence interval for a mean response $E(K \mid N_F) = \beta_0 + \beta_1 N_F$ at any fixed value of $N_F$, and evaluate it for three selected, reasonable values of $N_F$.

vi) Obtain the residuals $e_i = K_i - \hat{K}_i$, and plot these versus the fitted values $\hat{K}_i$ as well as against the predictor variable $(N_F)_i$. Does this plot tend to confirm the basic assumptions of the linear regression model, or does it suggest violation of any assumption? Explain.
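The mechanics of parts (ii) and (vi) can be sketched as below. Since the $K$ column is incomplete in this copy, the $(N_F, K)$ pairs here are hypothetical placeholders chosen only to show the computation; replace them with the values read from exchange.dat, and plot the residuals with your preferred software.

```python
import numpy as np

# Hypothetical (N_F, K) pairs -- placeholders for illustration only;
# substitute the actual data from exchange.dat.
NF = np.array([0.0125, 0.0238, 0.0300, 0.0381, 0.0450,
               0.0563, 0.0688, 0.0775])
K = np.array([-0.045, -0.041, -0.038, -0.034, -0.030,
              -0.026, -0.021, -0.017])

# Part (ii): least squares fit of K_i = b0 + b1 * (N_F)_i
Sxx = np.sum((NF - NF.mean()) ** 2)
b1 = np.sum((NF - NF.mean()) * K) / Sxx
b0 = K.mean() - b1 * NF.mean()

# Part (vi): residuals and fitted values; these would then be plotted
# against K_hat and against NF to check the model assumptions.
K_hat = b0 + b1 * NF
e = K - K_hat
```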