Assignment I Questions - Applied Regression Analysis | STAT 333, Assignments of Statistics

Material Type: Assignment; Class: Applied Regression Analysis; Subject: STATISTICS; University: University of Wisconsin - Madison; Term: Unknown 2003;

Uploaded on 09/02/2009

Statistics 333 Assignment 1 Due Sept. 19, 2003

1. For the simple linear regression model $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1, \ldots, n$, consider the least squares fits $\hat{Y}_i = b_0 + b_1 X_i$ and residuals given by $e_i = Y_i - \hat{Y}_i = Y_i - (b_0 + b_1 X_i)$, $i = 1, \ldots, n$. Since $b_0 = \bar{Y} - b_1 \bar{X}$, it is easy to see $(1/n) \sum_{i=1}^{n} \hat{Y}_i = \bar{Y}$ and hence also that

$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i) = 0.$$

Similarly, also show that the residuals $e_i$ satisfy the additional 'orthogonality' constraint

$$\sum_{i=1}^{n} X_i (Y_i - \hat{Y}_i) = 0.$$
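A quick numerical check can make the two constraints in Exercise 1 concrete before proving them. The sketch below is only an illustration: the data values and the helper name `fit_simple_ols` are made up for this example and are not part of the assignment.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) for the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up illustrative data (NOT from the assignment).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sum_e = sum(residuals)                                  # sum of e_i
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))   # sum of X_i * e_i

# Both sums should be zero up to floating-point rounding.
print(sum_e, sum_xe)
```

A numerical check like this does not replace the algebraic argument the exercise asks for; it only confirms what the proof should deliver.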

2. For the simple linear regression model in Exercise 1, under the standard model assumptions for the random errors $\varepsilon_i$, show that the least squares estimator $b_1 = S_{xy}/S_{xx}$ and $\bar{Y} = (1/n) \sum_{i=1}^{n} Y_i$ have zero covariance, i.e., $\mathrm{Cov}(b_1, \bar{Y}) = 0$, where $S_{xy} = \sum_{i=1}^{n} (X_i - \bar{X}) Y_i$.
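The zero-covariance claim in Exercise 2 can be explored by simulation before it is proved. The following is a minimal Monte Carlo sketch under stated assumptions: the design points, the parameter values, the seed, and the function name `simulate_cov_b1_ybar` are all hypothetical choices for illustration, and $S_{xx}$ is taken to be $\sum_{i=1}^{n} (X_i - \bar{X})^2$.

```python
import random


def simulate_cov_b1_ybar(x, beta0, beta1, sigma, reps, seed=0):
    """Estimate Cov(b1, Y-bar) by repeatedly simulating the regression model."""
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1_draws, ybar_draws = [], []
    for _ in range(reps):
        # Independent N(0, sigma^2) errors, as in the standard assumptions.
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        b1_draws.append(sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / s_xx)
        ybar_draws.append(sum(y) / n)
    m1 = sum(b1_draws) / reps
    m2 = sum(ybar_draws) / reps
    # Sample covariance of the simulated (b1, Y-bar) pairs.
    return sum((a - m1) * (b - m2)
               for a, b in zip(b1_draws, ybar_draws)) / (reps - 1)


# Hypothetical design and parameters (NOT from the assignment).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
cov_hat = simulate_cov_b1_ybar(x, beta0=1.0, beta1=2.0, sigma=1.0, reps=20000)
print(cov_hat)  # should be close to 0, up to simulation noise
```

The simulated covariance is only approximately zero; the exact result follows from writing both $b_1$ and $\bar{Y}$ as linear combinations of the $Y_i$.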

3. For the simple linear regression model, the estimator of the error variance $\sigma^2 = \mathrm{Var}(\varepsilon_i)$ is given by

$$S^2 = \frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \equiv \frac{1}{n-2}\, SSE.$$

Show that this estimator $S^2$ is unbiased for $\sigma^2$, i.e., prove that $E(S^2) = \sigma^2$.

Note: To prove this result, use the following approach. First, verify the identity

$$Y_i - (\beta_0 + \beta_1 X_i) = [\, Y_i - (b_0 + b_1 X_i) \,] + [\, \bar{Y} - (\beta_0 + \beta_1 \bar{X}) \,] + [\, (b_1 - \beta_1)(X_i - \bar{X}) \,].$$

Then verify that the sum of squares of elements on the left-hand side, $\sum_{i=1}^{n} [\, Y_i - (\beta_0 + \beta_1 X_i) \,]^2$, is equal to the sum of the 3 sums of squares of individual elements on the right-hand side, due to 'orthogonality' in cross-terms (Exer. 1). Hence, verify that

$$\sum_{i=1}^{n} [\, Y_i - (\beta_0 + \beta_1 X_i) \,]^2 = SSE + n\, [\, \bar{Y} - (\beta_0 + \beta_1 \bar{X}) \,]^2 + S_{xx} (b_1 - \beta_1)^2.$$

Finally, equate the expected values of both sides of the sums of squares relation, to get

$$n \sigma^2 = E[SSE] + n\, E\{ [\, \bar{Y} - (\beta_0 + \beta_1 \bar{X}) \,]^2 \} + S_{xx}\, E[ (b_1 - \beta_1)^2 ],$$

'evaluate' the other expected value terms, and solve for $E[SSE]$. Also use definitions and known results for $\mathrm{Var}(b_1)$ etc., for instance, $E\{ [\, Y_i - (\beta_0 + \beta_1 X_i) \,]^2 \} = \mathrm{Var}(Y_i) \equiv \sigma^2$ by definition of variance.
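The sum-of-squares relation in the note to Exercise 3 holds exactly for every sample, and the unbiasedness of $S^2$ shows up as an average over many samples. The sketch below checks both numerically. It is an illustration only: the design points, the "true" parameter values, the seed, and the helper name `ols` are assumptions made for this example.

```python
import random


def ols(x, y):
    """Least squares fit; returns (b0, b1, sse, x_bar, y_bar, s_xx)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse, x_bar, y_bar, s_xx


# Hypothetical design and true parameters (NOT from the assignment).
rng = random.Random(1)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
beta0, beta1, sigma = 1.0, 2.0, 1.0
n = len(x)


def draw_sample():
    """Simulate one response vector from the assumed model."""
    return [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]


# Part 1: the decomposition holds exactly for a single sample.
y = draw_sample()
b0, b1, sse, x_bar, y_bar, s_xx = ols(x, y)
lhs = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
rhs = sse + n * (y_bar - (beta0 + beta1 * x_bar)) ** 2 + s_xx * (b1 - beta1) ** 2
gap = abs(lhs - rhs)

# Part 2: the average of S^2 = SSE / (n - 2) over many samples approaches sigma^2.
reps = 20000
mean_s2 = sum(ols(x, draw_sample())[2] / (n - 2) for _ in range(reps)) / reps

print(gap)                  # zero up to rounding
print(mean_s2, sigma ** 2)  # close to each other, up to simulation noise
```

Dividing by $n - 2$ rather than $n$ is what makes the simulated average settle near $\sigma^2$; replacing `n - 2` with `n` in Part 2 shows the downward bias directly.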

4. Consider the zero or no intercept model given by $Y_i = \beta_1 X_i + \varepsilon_i$, $i = 1, \ldots, n$, with the errors $\varepsilon_i$ being independent, normal r.v.'s with mean 0 and variance $\sigma^2$.

i) Derive the least squares estimator $b_1$ of $\beta_1$, and also derive the variance of this estimator.

ii) For an arbitrary fixed value $X_0$ of $X$, establish that a $100(1 - \alpha)\%$ confidence interval for the mean response value $E(Y \mid X_0) = \beta_1 X_0$ is given by

$$b_1 X_0 \pm t^{(\alpha/2)}_{n-1}\, S \sqrt{ X_0^2 \Big/ \sum_{i=1}^{n} X_i^2 },$$

where $S^2 = \sum_{i=1}^{n} (Y_i - b_1 X_i)^2 / (n - 1)$ provides an unbiased estimator of $\sigma^2$ with $n - 1$ df.

Note: You can conclude that $(b_1 - \beta_1)/\mathrm{se}(b_1)$ has the $t_{n-1}$ distribution.
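The interval in part (ii) is straightforward to evaluate once $b_1$ and $S$ are available. The sketch below assumes the standard no-intercept estimator $b_1 = \sum X_i Y_i / \sum X_i^2$ (the result part (i) asks you to derive) and takes the $t$ critical value as an input read from a table. The data, the function name `no_intercept_ci`, and the choice $X_0 = 3$ are made up for illustration.

```python
import math


def no_intercept_ci(x, y, x0, t_crit):
    """CI for the mean response beta1 * x0 in the model Y = beta1 * X + error.

    t_crit is the upper alpha/2 point of the t distribution with n - 1 df,
    supplied by the caller (for example, read from a t table).
    """
    n = len(x)
    sum_x2 = sum(xi * xi for xi in x)
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum_x2
    # S^2 uses n - 1 df because only one parameter is estimated.
    s2 = sum((yi - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    half_width = t_crit * math.sqrt(s2) * math.sqrt(x0 ** 2 / sum_x2)
    center = b1 * x0
    return b1, s2, (center - half_width, center + half_width)


# Made-up illustrative data (NOT from the assignment).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# For n = 5 the interval uses 4 df; the tabled 97.5% point is about 2.776.
b1, s2, (lower, upper) = no_intercept_ci(x, y, x0=3.0, t_crit=2.776)
print(b1, s2, lower, upper)
```

Because the fitted line is forced through the origin, the half-width is proportional to $|X_0|$ and shrinks to zero at $X_0 = 0$.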

5. An experiment was conducted to study the mass of a tracer material exchanged between the main flow of an open channel and the "dead zone" caused by a sudden open channel expansion. Researchers need this information to improve the water quality modeling capability of a river. It is important to determine the exchange constant $K$ for varying flow conditions. The value of $K$ describes the exchange process when a dead zone appears. In a study, values of the Froude Numbers ($N_F$) were used to predict $K$. These numbers are functions of upstream channel velocity and water depth. The data collected were as follows, with the negative sign of the $K$ values indicating "flushing", the direction of mass transfer out of the dead zone:

Obs    $N_F$    $K$

These data are stored in the file: /u/r/e/reinsel/stat333/exchange.dat ; also on webpage.

Perform and show calculations for (ii)-(v) below by 'direct calculations' on a calculator or computer; then use regression in Minitab or other software to confirm calculations.

i) Construct a scatter plot of $K$ versus $N_F$, and provide some relevant comments.

ii) Use the least squares method to fit the model $K_i = \beta_0 + \beta_1 (N_F)_i + \varepsilon_i$.

iii) Compute $S^2$, $R^2$, and obtain the basic analysis of variance (ANOVA) table. Provide some brief interpretation for these results, e.g., in terms of the amount of variation of the $K$ values explained by the fitted regression (i.e., the variable $N_F$).

iv) Obtain the standard errors for the least squares estimates $b_0$ and $b_1$, and under the usual normal theory model assumptions, give a 95% confidence interval for $\beta_1$.

v) Determine the explicit form for the 95% confidence interval of a mean response $E(K \mid N_F) = \beta_0 + \beta_1 (N_F)$ for any fixed value $N_F$, and evaluate for three selected and reasonable values of $N_F$.

vi) Obtain the residuals $e_i = K_i - \hat{K}_i$, and plot these versus the fitted values $\hat{K}_i$ as well as against the predictor variable $(N_F)_i$. Does this plot tend to confirm the basic assumptions of the linear regression model, or does it suggest violation of any assumption? Explain.
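The 'direct calculations' requested in parts (ii)-(v) can be organized as in the sketch below. It is an illustration under stated assumptions, not a solution: the `nf` and `k` values are hypothetical placeholders standing in for the contents of exchange.dat (whose values are not reproduced here), the function names are invented for this example, and the $t$ critical value is passed in as read from a table. The standard-error formulas used are the usual ones for simple linear regression.

```python
import math


def simple_regression_summary(x, y):
    """Direct least squares calculations for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual SS
    ssr = b1 * s_xy                                                # regression SS
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total SS
    s2 = sse / (n - 2)
    s = math.sqrt(s2)
    return {
        "n": n, "x_bar": x_bar, "s_xx": s_xx, "b0": b0, "b1": b1,
        "sse": sse, "ssr": ssr, "sst": sst, "s2": s2, "r2": ssr / sst,
        "se_b1": s / math.sqrt(s_xx),
        "se_b0": s * math.sqrt(1.0 / n + x_bar ** 2 / s_xx),
    }


def mean_response_ci(fit, x0, t_crit):
    """CI for E(Y | x0); t_crit is the tabled t point with n - 2 df."""
    center = fit["b0"] + fit["b1"] * x0
    se = math.sqrt(fit["s2"] * (1.0 / fit["n"]
                                + (x0 - fit["x_bar"]) ** 2 / fit["s_xx"]))
    return center - t_crit * se, center + t_crit * se


# HYPOTHETICAL placeholder values, NOT the data in exchange.dat.
nf = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35]
k = [-0.010, -0.014, -0.021, -0.024, -0.031, -0.034]

fit = simple_regression_summary(nf, k)
t_crit = 2.776  # tabled 97.5% point of t with n - 2 = 4 df for these 6 points

ci_b1 = (fit["b1"] - t_crit * fit["se_b1"], fit["b1"] + t_crit * fit["se_b1"])
ci_mean = mean_response_ci(fit, x0=0.20, t_crit=t_crit)

print(fit["b0"], fit["b1"], fit["s2"], fit["r2"])
print(ci_b1, ci_mean)
```

With the real data, replace `nf`, `k`, and `t_crit` (the df changes with the number of observations), then compare the printed quantities with the Minitab output as the assignment requests.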