Examples with Answers - Methods of Applied Statistics | STAT 420, Study notes of Data Analysis & Statistical Methods

09_25ans Material Type: Notes; Professor: Stepanov; Class: Methods of Applied Statistics; Subject: Statistics; University: University of Illinois - Urbana-Champaign; Term: Fall 2008;

Typology: Study notes

Pre 2010

Uploaded on 12/09/2010

cmoy21
STAT 420 Examples for 09/25/2008 Fall 2008

The (normal) simple linear regression model:

$$Y_i = \alpha + \beta x_i + \varepsilon_i,$$

where the $\varepsilon_i$'s are independent Normal$(0, \sigma^2)$ (iid Normal$(0, \sigma^2)$).

$\alpha$, $\beta$, and $\sigma^2$ are unknown model parameters.

Suppose the $x_i$'s are fixed (not random).

Then the $Y_i$'s are independent Normal$(\alpha + \beta x_i, \sigma^2)$ random variables.

$$\hat\beta = \frac{\sum (x_i - \bar x)\, Y_i}{\sum (x_i - \bar x)^2} \sim N\left(\beta,\ \frac{\sigma^2}{\sum (x_i - \bar x)^2}\right)$$

$$\hat\alpha = \bar Y - \hat\beta\, \bar x \sim N\left(\alpha,\ \frac{\sigma^2 \sum x_i^2}{n \sum (x_i - \bar x)^2}\right) = N\left(\alpha,\ \sigma^2\left[\frac{1}{n} + \frac{\bar x^2}{\sum (x_i - \bar x)^2}\right]\right)$$

$$S_e^2 = \frac{1}{n-2} \sum \left(y_i - \hat\alpha - \hat\beta x_i\right)^2 = \frac{1}{n-2} \sum \left(y_i - \hat y_i\right)^2$$

$$\frac{(n-2)\, S_e^2}{\sigma^2} \sim \chi^2(n-2)$$

1. The owner of Momma Leona’s Pizza restaurant chain believes that if a restaurant is located near a college campus, then there is a linear relationship between sales and the size of the student population. Suppose data were collected from a sample of 10 Momma Leona’s Pizza restaurants located near college campuses. For the i-th restaurant in the sample, $x_i$ is the size of the student population (in thousands) and $y_i$ is the quarterly sales (in thousands of dollars). The values of $x_i$ and $y_i$ for the 10 restaurants in the sample are summarized in the following table:

Restaurant    Student Population (1000s)    Quarterly Sales ($1000s)
    i                    x_i                          y_i
    1                      2                           58
    2                      6                          105
    3                      8                           88
    4                      8                          118
    5                     12                          117
    6                     16                          137
    7                     20                          157
    8                     20                          169
    9                     22                          149
   10                     26                          202

$\bar x = 14$, $\bar y = 130$

SXX = 568

SXY = 2,840

SYY = 15,730

$\hat\beta = 5$, $\hat\alpha = 60$

$\hat y = 60 + 5 \cdot x$

SSE = 1,530

$R^2 = 0.9027$

$s_e^2 = 191.25$
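The summary quantities for this data set can be recomputed directly from their definitions. The following is a minimal R sketch that uses the data vectors from the notes' R session; the variable names (`SXX`, `beta_hat`, `se2`, and so on) are chosen here for illustration only.

```r
# Data as entered in the notes' R session.
x <- c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26)
y <- c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
n <- length(x)

# Corrected sums of squares and cross products.
SXX <- sum((x - mean(x))^2)
SXY <- sum((x - mean(x)) * (y - mean(y)))
SYY <- sum((y - mean(y))^2)

# Least-squares estimates.
beta_hat  <- SXY / SXX                      # slope
alpha_hat <- mean(y) - beta_hat * mean(x)   # intercept

# Residual sum of squares, s_e^2 and R^2.
SSE <- sum((y - alpha_hat - beta_hat * x)^2)
se2 <- SSE / (n - 2)
R2  <- 1 - SSE / SYY

print(c(SXX = SXX, SXY = SXY, SYY = SYY, beta_hat = beta_hat,
        alpha_hat = alpha_hat, SSE = SSE, se2 = se2, R2 = R2))
```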

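Confidence interval for β:

$$\hat\beta \pm t_{\alpha/2} \cdot \frac{s_e}{\sqrt{\sum (x_i - \bar x)^2}} = \hat\beta \pm t_{\alpha/2} \cdot \frac{s_e}{\sqrt{SXX}}$$

where $t_{\alpha/2}$ is the appropriate value of the t-distribution with n – 2 degrees of freedom.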
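As a rough illustration of the interval for β, the sketch below evaluates the formula for the pizza data and compares it with R's built-in `confint()`. The 90% level is an assumption made here to match the `confint(fit, level=0.90)` call in the notes' R session; the variable names are illustrative.

```r
# Data as entered in the notes' R session.
x <- c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26)
y <- c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
n <- length(x)

fit      <- lm(y ~ x)
beta_hat <- coef(fit)[["x"]]      # slope estimate
s_e      <- summary(fit)$sigma    # residual standard error s_e
SXX      <- sum((x - mean(x))^2)

level  <- 0.90                                  # assumed confidence level
t_crit <- qt(1 - (1 - level) / 2, df = n - 2)   # t_{alpha/2} with n - 2 df

# beta_hat +/- t_{alpha/2} * s_e / sqrt(SXX)
ci_beta <- beta_hat + c(-1, 1) * t_crit * s_e / sqrt(SXX)

print(ci_beta)
print(confint(fit, "x", level = level))  # built-in interval, for comparison
```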

Rejection Region:

Reject $H_0$ if $T > t_{0.05}$ ( 8 df ).

$t_{0.05}$ ( 8 df ) = 1.860.

Do NOT Reject $H_0$ ( 0.05 < p-value < 0.10 ).

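Confidence interval for α:

$$\hat\alpha \pm t_{\alpha/2} \cdot s_e \cdot \sqrt{\frac{1}{n} + \frac{\bar x^2}{SXX}}$$

where $t_{\alpha/2}$ is the appropriate value of the t-distribution with n – 2 degrees of freedom.

Test statistic for $H_0: \alpha = \alpha_0$:

$$T = \frac{\hat\alpha - \alpha_0}{s_e \sqrt{\dfrac{1}{n} + \dfrac{\bar x^2}{SXX}}}$$

( n – 2 degrees of freedom )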

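d) Construct a 90% confidence interval for α.

$$\hat\alpha \pm t_{\alpha/2} \cdot s_e \cdot \sqrt{\frac{1}{n} + \frac{\bar x^2}{SXX}}$$

10 – 2 = 8 degrees of freedom, $t_{0.05}$ ( 8 df ) = 1.860.

$$60 \pm 1.860 \cdot \sqrt{191.25} \cdot \sqrt{\frac{1}{10} + \frac{14^2}{568}} \approx 60 \pm 17.16, \quad \text{that is, } ( 42.84 ,\ 77.16 )$$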
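The 90% interval for α can be checked numerically. This is a small R sketch under the same model assumptions, with illustrative variable names; it uses `qt()` rather than the rounded table value 1.860, so the last digits may differ slightly from a hand calculation.

```r
# Data as entered in the notes' R session.
x <- c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26)
y <- c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
n <- length(x)

fit       <- lm(y ~ x)
alpha_hat <- coef(fit)[["(Intercept)"]]  # intercept estimate
s_e       <- summary(fit)$sigma          # residual standard error s_e
SXX       <- sum((x - mean(x))^2)

# Standard error of alpha_hat: s_e * sqrt(1/n + xbar^2 / SXX)
se_alpha <- s_e * sqrt(1 / n + mean(x)^2 / SXX)

t_crit   <- qt(0.95, df = n - 2)         # t_{0.05} with 8 df
ci_alpha <- alpha_hat + c(-1, 1) * t_crit * se_alpha

print(se_alpha)
print(ci_alpha)
```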

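e) Test $H_0: \alpha = 75$ vs. $H_1: \alpha < 75$. Use a 5% level of significance.

Test Statistic:

$$T = \frac{\hat\alpha - \alpha_0}{s_e \sqrt{\dfrac{1}{n} + \dfrac{\bar x^2}{SXX}}} = \frac{60 - 75}{\sqrt{191.25} \cdot \sqrt{\dfrac{1}{10} + \dfrac{14^2}{568}}} \approx -1.626$$

Rejection Region:

Reject $H_0$ if $T < -t_{0.05}$ ( 8 df ).

$-t_{0.05}$ ( 8 df ) = – 1.860.

Do NOT Reject $H_0$, since – 1.626 > – 1.860.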
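The left-tailed test of α = 75 can be reproduced as follows. This is only a sketch with illustrative names; the lower-tail p-value from `pt()` is an addition not shown in the notes.

```r
# Data as entered in the notes' R session.
x <- c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26)
y <- c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
n <- length(x)

fit       <- lm(y ~ x)
alpha_hat <- coef(fit)[["(Intercept)"]]
s_e       <- summary(fit)$sigma
SXX       <- sum((x - mean(x))^2)
se_alpha  <- s_e * sqrt(1 / n + mean(x)^2 / SXX)

alpha_0 <- 75                                # hypothesized intercept
T_stat  <- (alpha_hat - alpha_0) / se_alpha  # test statistic, n - 2 df

t_crit  <- qt(0.05, df = n - 2)     # lower 5% point, i.e. -t_{0.05}(8 df)
reject  <- T_stat < t_crit          # left-tailed rejection rule
p_value <- pt(T_stat, df = n - 2)   # P(T <= observed) under H0

print(c(T = T_stat, critical = t_crit, p_value = p_value))
print(reject)
```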

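Confidence interval for $\sigma^2$:

$$\left( \frac{(n-2)\, s_e^2}{\chi^2_{\alpha/2}} ,\ \frac{(n-2)\, s_e^2}{\chi^2_{1-\alpha/2}} \right) = \left( \frac{n\, \hat\sigma^2}{\chi^2_{\alpha/2}} ,\ \frac{n\, \hat\sigma^2}{\chi^2_{1-\alpha/2}} \right)$$

where $\hat\sigma^2 = \frac{1}{n} \sum (y_i - \hat y_i)^2$, so that $n\, \hat\sigma^2 = (n-2)\, s_e^2$, and $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are the appropriate values of the $\chi^2$ distribution with n – 2 degrees of freedom.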

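f) Construct a 95% confidence interval for $\sigma^2$.

$\chi^2_{0.025}$ ( 8 df ) = 17.54, $\chi^2_{0.975}$ ( 8 df ) = 2.180.

$$\left( \frac{8 \cdot 191.25}{17.54} ,\ \frac{8 \cdot 191.25}{2.180} \right) = ( 87.229 ,\ 701.835 )$$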
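The interval for σ² can be sketched in R with `qchisq()`. Because `qchisq()` returns unrounded quantiles instead of the table values 17.54 and 2.180, the endpoints come out slightly different from (87.229, 701.835) in the last digits. Variable names are illustrative.

```r
# Data as entered in the notes' R session.
x <- c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26)
y <- c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
n <- length(x)

fit <- lm(y ~ x)
SSE <- sum(resid(fit)^2)   # equals (n - 2) * s_e^2

# The notes' chi^2_{alpha/2} is an upper-tail point, so
# chi^2_{0.025} = qchisq(0.975, 8) and chi^2_{0.975} = qchisq(0.025, 8).
chi_upper <- qchisq(0.975, df = n - 2)
chi_lower <- qchisq(0.025, df = n - 2)

ci_sigma2 <- c(lower = SSE / chi_upper, upper = SSE / chi_lower)
print(ci_sigma2)
```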


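Mean response ( y ) for a fixed value of x: $\mu(x) = \mu_{y|x} = \alpha + \beta x$.

To estimate $\mu(x)$, use $\hat y = \hat\alpha + \hat\beta x$.

$$E(\hat Y) = \mu(x) = \alpha + \beta x, \qquad \mathrm{Var}(\hat Y) = \sigma^2 \left[ \frac{1}{n} + \frac{(x - \bar x)^2}{SXX} \right]$$

Note that $\alpha = \mu(x = 0)$, and $\hat\alpha = \hat y$ if x = 0.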
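To make the variance formula concrete, the sketch below builds a 95% confidence interval for the mean response at x = 10, the value used in the notes' R session, and compares it with `predict()`. The names `x0`, `se_mean` and `ci_mean` are illustrative assumptions.

```r
# Data as entered in the notes' R session.
x <- c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26)
y <- c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
n <- length(x)

fit <- lm(y ~ x)
s_e <- summary(fit)$sigma
SXX <- sum((x - mean(x))^2)

x0    <- 10
y_hat <- sum(coef(fit) * c(1, x0))   # alpha_hat + beta_hat * x0

# Estimated standard deviation of y_hat: s_e * sqrt(1/n + (x0 - xbar)^2 / SXX)
se_mean <- s_e * sqrt(1 / n + (x0 - mean(x))^2 / SXX)
ci_mean <- y_hat + c(-1, 1) * qt(0.975, df = n - 2) * se_mean

print(ci_mean)
print(predict(fit, data.frame(x = x0), interval = "confidence", level = 0.95))
```

The interval is narrowest at x equal to the sample mean of the x values and widens as x moves away from it, because of the $(x - \bar x)^2$ term.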

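i) Construct a 95% prediction interval for a future value of y corresponding to x = 38.

( Construct 95% limits of prediction if x = 38. )

x = 38, $\hat y = 60 + 5 \cdot x = 60 + 5 \cdot 38 = 250$.

$$\hat y \pm t_{\alpha/2} \cdot s_e \cdot \sqrt{1 + \frac{1}{n} + \frac{(x - \bar x)^2}{SXX}}$$

10 – 2 = 8 degrees of freedom, $t_{0.025}$ ( 8 df ) = 2.306.

$$250 \pm 2.306 \cdot \sqrt{191.25} \cdot \sqrt{1 + \frac{1}{10} + \frac{(38 - 14)^2}{568}} \approx 250 \pm 46.37, \quad \text{that is, } ( 203.63 ,\ 296.37 )$$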
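The prediction limits at x = 38 can be checked by evaluating the formula and comparing with `predict()` using `interval = "prediction"`. This is a sketch with illustrative names, not the notes' own code.

```r
# Data as entered in the notes' R session.
x <- c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26)
y <- c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
n <- length(x)

fit <- lm(y ~ x)
s_e <- summary(fit)$sigma
SXX <- sum((x - mean(x))^2)

x0    <- 38
y_hat <- sum(coef(fit) * c(1, x0))   # point prediction

# The extra "1 +" accounts for the variance of the new observation itself.
se_pred <- s_e * sqrt(1 + 1 / n + (x0 - mean(x))^2 / SXX)
pi_hand <- y_hat + c(-1, 1) * qt(0.975, df = n - 2) * se_pred

pi_builtin <- predict(fit, data.frame(x = x0),
                      interval = "prediction", level = 0.95)
print(pi_hand)
print(pi_builtin)
```

Note that x = 38 lies outside the observed range of student populations (2 to 26 thousand), so the interval relies on the linear model continuing to hold there.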

j) The University of Illinois at Urbana-Champaign has 38 thousand students. The owner of Momma Leona’s Pizza restaurant chain would agree to open a restaurant near the UIUC campus, but only if there is enough evidence that the average quarterly sales would be over $225,000. Test $H_0: \mu(x = 38) = 225$ vs. $H_1: \mu(x = 38) > 225$. Use a 5% level of significance.

x = 38, $\hat y = 60 + 5 \cdot x = 60 + 5 \cdot 38 = 250$.

Test Statistic:

$$T = \frac{\hat y - \mu_0}{s_e \sqrt{\dfrac{1}{n} + \dfrac{(x - \bar x)^2}{SXX}}} = \frac{250 - 225}{\sqrt{191.25} \cdot \sqrt{\dfrac{1}{10} + \dfrac{(38 - 14)^2}{568}}} \approx 1.713$$

Rejection Region:

Reject $H_0$ if $T > t_{0.05}$ ( 8 df ).

$t_{0.05}$ ( 8 df ) = 1.860.

Accept $H_0$ (do NOT reject $H_0$), since 1.713 < 1.860.
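A short R sketch of this right-tailed test of the mean response follows. The names and the upper-tail p-value computed with `pt()` are additions for illustration.

```r
# Data as entered in the notes' R session.
x <- c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26)
y <- c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
n <- length(x)

fit <- lm(y ~ x)
s_e <- summary(fit)$sigma
SXX <- sum((x - mean(x))^2)

x0    <- 38
mu_0  <- 225                         # hypothesized mean response
y_hat <- sum(coef(fit) * c(1, x0))   # estimated mean response

se_mean <- s_e * sqrt(1 / n + (x0 - mean(x))^2 / SXX)
T_stat  <- (y_hat - mu_0) / se_mean

t_crit  <- qt(0.95, df = n - 2)                        # t_{0.05}(8 df)
reject  <- T_stat > t_crit                             # right-tailed rule
p_value <- pt(T_stat, df = n - 2, lower.tail = FALSE)  # upper-tail p-value

print(c(T = T_stat, critical = t_crit, p_value = p_value))
print(reject)
```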

b) Test the assumption that the student population does not affect the sales. That is, test $H_0: \beta = 0$ vs. $H_1: \beta \neq 0$ ( the significance of regression test ).

Use a 1% level of significance.

Here $V_0 = \{ a \mathbf{1},\ a \in \mathbb{R} \}$, dim ( $V_0$ ) = 1, $\hat{\mathbf{Y}}_0 = [\, \bar Y, \bar Y, \ldots, \bar Y \,]^T$,

$V = \{ a_0 \mathbf{1} + a_1 \mathbf{x},\ a_0, a_1 \in \mathbb{R} \}$, dim ( V ) = 2.
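One way to see the two subspaces at work is to fit both models and compare their residual sums of squares. The sketch below assumes the usual F statistic for nested linear models; `fit0` and `fit1` are illustrative names for the projections onto $V_0$ and $V$.

```r
# Data as entered in the notes' R session.
x <- c(2, 6, 8, 8, 12, 16, 20, 20, 22, 26)
y <- c(58, 105, 88, 118, 117, 137, 157, 169, 149, 202)
n <- length(x)

fit0 <- lm(y ~ 1)   # projection onto V0: every fitted value is mean(y)
fit1 <- lm(y ~ x)   # projection onto V

SSE0 <- sum(resid(fit0)^2)   # residual SS under H0 (equals SYY)
SSE1 <- sum(resid(fit1)^2)   # residual SS under the full model (SSE)

# F = [ (SSE0 - SSE1) / (dim V - dim V0) ] / [ SSE1 / (n - dim V) ]
F_stat  <- ((SSE0 - SSE1) / (2 - 1)) / (SSE1 / (n - 2))
p_value <- pf(F_stat, df1 = 1, df2 = n - 2, lower.tail = FALSE)
reject  <- p_value < 0.01    # 1% level of significance

print(c(F = F_stat, p_value = p_value))
print(reject)
print(anova(fit0, fit1))     # same comparison via anova()
```

For simple linear regression this F statistic is the square of the t statistic for the slope, so the F test and the two-sided t test of β = 0 agree.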

> x <- c(2,6,8,8,12,16,20,20,22,26)
> y <- c(58,105,88,118,117,137,157,169,149,202)
> fit <- lm(y ~ x)
> summary(fit)

Call:
lm(formula = y ~ x)

Residuals:
   Min     1Q Median     3Q    Max
-21.00  -9.75  -3.00  11.25  18.00

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  60.0000     9.2260   6.503 0.000187 ***
x             5.0000     0.5803   8.617 2.55e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 13.83 on 8 degrees of freedom
Multiple R-Squared: 0.9027,     Adjusted R-squared: 0.8906
F-statistic: 74.25 on 1 and 8 DF,  p-value: 2.549e-05

> anova(fit)
Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value    Pr(>F)
x          1 14200.0 14200.0  74.248 2.549e-05 ***
Residuals  8  1530.0   191.
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> confint(fit, level=0.90)
                  5 %      95 %
(Intercept) 42.843745 77.156255
x            3.920969  6.079031

> new <- data.frame(x=10)
> predict.lm(fit,new,interval=c("confidence"),level=0.95)
     fit    lwr     upr
[1,] 110 98.583 121.417

> new <- data.frame(x=38)
> predict.lm(fit,new,interval=c("confidence"),level=0.95)
     fit      lwr      upr
[1,] 250 216.3396 283.6604

> predict.lm(fit,new,interval=c("prediction"),level=0.95)
     fit      lwr      upr
[1,] 250 203.6316 296.3684