Regression Analysis: Understanding Parameter Estimates and Confidence Intervals, Study notes of Statistics

These notes cover regression analysis, focusing on parameter estimates and the construction of confidence intervals. Topics include regression equations, histograms, error in prediction, and statistical inference, with examples and formulas for calculating parameter estimates and confidence intervals.

Uploaded on 09/02/2009

koofers-user-mlx

From last time . . .

[Figure: scatterplot of father's height (inches) against father's span (inches); corr = 0.78]
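
The equations

Regression of y on x (for predicting y from x)

Slope $= r\,\dfrac{SD(y)}{SD(x)}$. Goes through the point $(\bar{x}, \bar{y})$.

$$\hat{y} - \bar{y} = r\,\frac{SD(y)}{SD(x)}\,(x - \bar{x})$$

$$\hat{y} = \hat\beta_0 + \hat\beta_1 x \quad\text{where}\quad \hat\beta_1 = r\,\frac{SD(y)}{SD(x)} \quad\text{and}\quad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$$

Regression of x on y (for predicting x from y)

Slope $= r\,\dfrac{SD(x)}{SD(y)}$. Goes through the point $(\bar{y}, \bar{x})$.

$$\hat{x} - \bar{x} = r\,\frac{SD(x)}{SD(y)}\,(y - \bar{y})$$

$$\hat{x} = \hat\beta_0^\star + \hat\beta_1^\star y \quad\text{where}\quad \hat\beta_1^\star = r\,\frac{SD(x)}{SD(y)} \quad\text{and}\quad \hat\beta_0^\star = \bar{x} - \hat\beta_1^\star \bar{y}$$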
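
As a rough illustration of the two regression lines, the sketch below computes both from the correlation and the SDs. The function name `regression_line` and the five data points are made up for this example; they are not the father's span and height data from the slides.

```python
import math

def regression_line(x, y):
    """Return (intercept, slope) for the regression of y on x.

    Hypothetical helper: slope = r * SD(y) / SD(x), and the line passes
    through the point of means (x_bar, y_bar).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    # SD(y)/SD(x) = sqrt(SYY/SXX); the divisor (n or n - 1) cancels in the ratio.
    slope = r * math.sqrt(syy / sxx)
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up illustrative data (NOT the data behind the scatterplot).
span = [64.0, 66.0, 68.0, 70.0, 72.0]
height = [65.0, 66.0, 68.0, 68.0, 70.0]

b0, b1 = regression_line(span, height)   # predicting height from span
c0, c1 = regression_line(height, span)   # predicting span from height

print("height on span:", b0, b1)
print("span on height:", c0, c1)
# The product of the two slopes is r^2, so the two lines coincide only if |r| = 1.
print("product of slopes (r^2):", b1 * c1)
```

The two fitted lines are different unless the correlation is exactly plus or minus 1, which is why the direction of prediction matters.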

Histograms

[Figure: histogram of spans, x-axis "span (inches)"; mean = 68.…, SD = 3.…]

[Figure: histogram of heights, x-axis "height (inches)"; mean = 67.…, SD = 2.…]

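Error in prediction

Having no information about x:

Predict y as $\bar{y}$.

Typical prediction error: $SD(y)$.

For predicting height, $SD(y) \approx$ 2.… inches.

Having been told about x:

Predict y using the regression line: $\hat{y} = \hat\beta_0 + \hat\beta_1 x$.

Typical prediction error: $SD(y)\sqrt{1 - r^2}$.

For predicting height from span, the typical error is $SD(y)\sqrt{1 - r^2}$; with $r = 0.78$, the factor $\sqrt{1 - r^2}$ is about 0.63.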
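
The claim that the typical error shrinks by the factor $\sqrt{1 - r^2}$ can be checked numerically. This is a minimal sketch on made-up data (not the span and height data), using SDs with divisor n and the root-mean-square of the residuals as the "typical error".

```python
import math

# Made-up illustrative data (NOT the father's span/height data).
x = [64.0, 66.0, 68.0, 70.0, 72.0]
y = [65.0, 66.0, 68.0, 68.0, 70.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)
b1 = sxy / sxx              # algebraically equal to r * SD(y) / SD(x)
b0 = y_bar - b1 * x_bar

# Without x: predict y_bar for everyone; the RMS error is SD(y) (divisor n).
sd_y = math.sqrt(syy / n)

# With x: predict from the regression line; RMS error of the residuals.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
rms_resid = math.sqrt(sum(e * e for e in residuals) / n)

print("SD(y):", sd_y)
print("RMS residual:", rms_resid)
print("SD(y) * sqrt(1 - r^2):", sd_y * math.sqrt(1 - r ** 2))
```

The last two printed values agree up to floating-point rounding, which is the identity stated on the slide.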

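Parameter estimates (2)

One can show that

$$E(\hat\beta_0) = \beta_0 \qquad\qquad E(\hat\beta_1) = \beta_1$$

$$\operatorname{Var}(\hat\beta_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{SXX}\right) \qquad\qquad \operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{SXX}$$

$$\operatorname{Cov}(\hat\beta_0, \hat\beta_1) = -\sigma^2\,\frac{\bar{x}}{SXX} \qquad\qquad \operatorname{Cor}(\hat\beta_0, \hat\beta_1) = \frac{-\bar{x}}{\sqrt{\bar{x}^2 + SXX/n}}$$

Note: We're thinking of the x's as fixed.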
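
A small sketch of these formulas in code may help keep the pieces straight. The function name `estimate_moments`, the x values, and the value of sigma squared are all hypothetical choices for illustration; in practice sigma squared is unknown and has to be estimated.

```python
import math

def estimate_moments(x, sigma2):
    """Variances, covariance and correlation of (beta0_hat, beta1_hat).

    Treats the x's as fixed and sigma2 as known (an assumption made
    only for this illustration).
    """
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    var_b0 = sigma2 * (1.0 / n + x_bar ** 2 / sxx)
    var_b1 = sigma2 / sxx
    cov_b0_b1 = -sigma2 * x_bar / sxx
    cor_b0_b1 = -x_bar / math.sqrt(x_bar ** 2 + sxx / n)
    return var_b0, var_b1, cov_b0_b1, cor_b0_b1

# Hypothetical design points and error variance.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
var_b0, var_b1, cov01, cor01 = estimate_moments(x, sigma2=2.0)

print(var_b0, var_b1, cov01, cor01)
# The correlation formula is just Cov / sqrt(Var * Var); sigma2 cancels.
print(cov01 / math.sqrt(var_b0 * var_b1))
```

Note that the correlation does not depend on sigma squared, and that it is negative whenever the mean of the x's is positive: overestimating the slope then tends to go with underestimating the intercept.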

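Parameter estimates (3)

One can even show that the distribution of $\hat\beta_0$ and $\hat\beta_1$ is a bivariate normal distribution!

$$\hat\beta \sim N(\beta, \Sigma)$$

where

$$\hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} \qquad\text{and}\qquad \Sigma = \sigma^2 \begin{pmatrix} \dfrac{1}{n} + \dfrac{\bar{x}^2}{SXX} & \dfrac{-\bar{x}}{SXX} \\[8pt] \dfrac{-\bar{x}}{SXX} & \dfrac{1}{SXX} \end{pmatrix}$$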
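
For readers who want to see where this matrix comes from, here is a short derivation sketch. It assumes the usual matrix form of the model; the design matrix $X$ (a column of ones next to a column of the $x_i$) is notation introduced here and does not appear on the slides. It uses $\sum_i x_i = n\bar{x}$ and $\sum_i x_i^2 = SXX + n\bar{x}^2$.

```latex
\begin{gather*}
X^{\mathsf T} X = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix},
\qquad
\det\!\left(X^{\mathsf T} X\right) = n \sum_i x_i^2 - \Big(\sum_i x_i\Big)^2 = n\,SXX
\\[6pt]
\Sigma = \sigma^2 \left(X^{\mathsf T} X\right)^{-1}
= \frac{\sigma^2}{n\,SXX}
\begin{pmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{pmatrix}
= \sigma^2
\begin{pmatrix}
\dfrac{1}{n} + \dfrac{\bar{x}^2}{SXX} & \dfrac{-\bar{x}}{SXX} \\[8pt]
\dfrac{-\bar{x}}{SXX} & \dfrac{1}{SXX}
\end{pmatrix}
\end{gather*}
```

The diagonal entries reproduce the variances of the two estimates and the off-diagonal entry reproduces their covariance.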

[Figure: two scatterplot panels; axis labels "H O" and "OD"]

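Statistical inference

We want to test: $H_0: \beta_1 = \beta_1^\star$ versus $H_a: \beta_1 \neq \beta_1^\star$

Generally, $\beta_1^\star$ is 0.

We use

$$t = \frac{\hat\beta_1 - \beta_1^\star}{se(\hat\beta_1)} \sim t_{n-2} \qquad\text{where}\qquad se(\hat\beta_1) = \sqrt{\frac{\hat\sigma^2}{SXX}}$$

Also,

$$\left[\; \hat\beta_1 - t_{(1-\frac{\alpha}{2}),\,n-2} \times se(\hat\beta_1) \;,\;\; \hat\beta_1 + t_{(1-\frac{\alpha}{2}),\,n-2} \times se(\hat\beta_1) \;\right]$$

is a $(1 - \alpha) \times 100\%$ confidence interval for $\beta_1$.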
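
The test statistic and interval can be sketched in a few lines. Everything named here is hypothetical: the function `slope_inference`, the made-up data, and the choice to pass the t quantile in as `t_crit` rather than compute it (3.182 is the tabulated 0.975 quantile of the t distribution with 3 degrees of freedom). The error variance is estimated by RSS/(n - 2).

```python
import math

def slope_inference(x, y, t_crit, beta1_star=0.0):
    """t statistic and confidence interval for the slope.

    t_crit must be the (1 - alpha/2) quantile of t with n - 2 degrees of
    freedom, supplied by the caller (for example from a table).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)               # estimate of sigma^2
    se_b1 = math.sqrt(sigma2_hat / sxx)
    t_stat = (b1 - beta1_star) / se_b1
    ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
    return b1, se_b1, t_stat, ci

# Made-up illustrative data (NOT the pf3d7 data); n = 5, so n - 2 = 3.
x = [64.0, 66.0, 68.0, 70.0, 72.0]
y = [65.0, 66.0, 68.0, 68.0, 70.0]

b1, se_b1, t_stat, ci = slope_inference(x, y, t_crit=3.182)
print("slope:", b1, "se:", se_b1)
print("t statistic for H0: beta1 = 0:", t_stat)
print("95% confidence interval:", ci)
```

Comparing the absolute value of `t_stat` with `t_crit` gives the two-sided test at level alpha; the interval contains exactly those values of $\beta_1^\star$ that the test would not reject.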

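Results

The calculations in the test $H_0: \beta_0 = \beta_0^\star$ versus $H_a: \beta_0 \neq \beta_0^\star$ are analogous, except that we have to use

$$se(\hat\beta_0) = \sqrt{\hat\sigma^2 \times \left(\frac{1}{n} + \frac{\bar{x}^2}{SXX}\right)}$$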

For the pf3d7 data we get the 95% confidence intervals

(0.342, 0.364) for the intercept

(-0.0043, -0.0035) for the slope

Testing whether the intercept (slope) is equal to zero, we obtain 70.7 (-22.0) as test statistic. This corresponds to a p-value of 7.8 × 10^(…) (8.4 × 10^(…)).

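Now how about that

Testing for the slope being equal to zero, we use

$$t = \frac{\hat\beta_1}{se(\hat\beta_1)}$$

For the squared test statistic we get

$$t^2 = \frac{\hat\beta_1^2}{se(\hat\beta_1)^2} = \frac{\hat\beta_1^2}{\hat\sigma^2 / SXX} = \frac{\hat\beta_1^2 \times SXX}{\hat\sigma^2} = \frac{(SYY - RSS)/1}{RSS/(n-2)} = \frac{MS_{reg}}{MSE} = F$$

The squared t statistic is the same as the F statistic from the ANOVA!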
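
The identity is easy to confirm numerically. The sketch below computes the t statistic and the ANOVA F statistic by separate routes on made-up data (not the pf3d7 data) and compares them; all variable names are illustrative.

```python
import math

# Made-up illustrative data (NOT the pf3d7 data).
x = [64.0, 66.0, 68.0, 70.0, 72.0]
y = [65.0, 66.0, 68.0, 68.0, 70.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Route 1: t statistic for H0: beta1 = 0.
sigma2_hat = rss / (n - 2)
t_stat = b1 / math.sqrt(sigma2_hat / sxx)

# Route 2: ANOVA F statistic, MS_reg / MSE.
ms_reg = (syy - rss) / 1
mse = rss / (n - 2)
f_stat = ms_reg / mse

print("t^2:", t_stat ** 2)
print("F:  ", f_stat)
```

Both printed values agree up to rounding, because $SYY - RSS = \hat\beta_1^2 \times SXX$.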

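Joint confidence region

A 95% joint confidence region for the two parameters is the set of all values $(\beta_0, \beta_1)$ that fulfill

$$\frac{\begin{pmatrix} \Delta\beta_0 \\ \Delta\beta_1 \end{pmatrix}^{T} \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix} \begin{pmatrix} \Delta\beta_0 \\ \Delta\beta_1 \end{pmatrix}}{2\,\hat\sigma^2} \;\leq\; F_{(0.95),\,2,\,n-2}$$

where $\Delta\beta_0 = \beta_0 - \hat\beta_0$ and $\Delta\beta_1 = \beta_1 - \hat\beta_1$.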
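
A membership check for the joint region could look like the sketch below. The function name `in_joint_region`, the data, and the convention of passing the F quantile in as `f_crit` are assumptions made for illustration (9.55 is the tabulated 0.95 quantile of the F distribution with 2 and 3 degrees of freedom, matching n = 5 here).

```python
def in_joint_region(beta0, beta1, x, y, f_crit):
    """Return True if (beta0, beta1) lies in the joint confidence region.

    f_crit must be the F quantile with 2 and n - 2 degrees of freedom at
    the desired confidence level, supplied by the caller.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)

    d0, d1 = beta0 - b0, beta1 - b1
    sum_x = sum(x)
    sum_x2 = sum(xi * xi for xi in x)
    # Quadratic form (d0, d1) [[n, sum_x], [sum_x, sum_x2]] (d0, d1)^T
    quad = n * d0 * d0 + 2 * sum_x * d0 * d1 + sum_x2 * d1 * d1
    return quad / (2 * sigma2_hat) <= f_crit

# Made-up illustrative data (NOT the pf3d7 data).
x = [64.0, 66.0, 68.0, 70.0, 72.0]
y = [65.0, 66.0, 68.0, 68.0, 70.0]

print(in_joint_region(26.6, 0.6, x, y, f_crit=9.55))  # at the estimates
print(in_joint_region(0.0, 0.0, x, y, f_crit=9.55))   # far from them
```

The region is an ellipse centred at the estimates. Because the two estimates are correlated, it is tilted, so it is not the same as the rectangle formed by the two separate confidence intervals.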

Coefficient of determination

In the previous lecture we wrote

$$SS_{reg} = SYY - RSS = \frac{(SXY)^2}{SXX}$$

Define

$$R^2 = \frac{SS_{reg}}{SYY} = 1 - \frac{RSS}{SYY}$$

$R^2$ is often called the coefficient of determination. Notice that

$$R^2 = \frac{SS_{reg}}{SYY} = \frac{(SXY)^2}{SXX \times SYY} = r_{XY}^2$$
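
As a quick numerical check of this identity, the sketch below computes $R^2$ from the residual sum of squares and compares it with the squared correlation. The data are made up for illustration.

```python
import math

# Made-up illustrative data.
x = [64.0, 66.0, 68.0, 70.0, 72.0]
y = [65.0, 66.0, 68.0, 68.0, 70.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# R^2 via the residuals of the fitted line.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - rss / syy

# Squared sample correlation between x and y.
r_xy = sxy / math.sqrt(sxx * syy)

print("R^2:    ", r_squared)
print("r_XY^2: ", r_xy ** 2)
```

The two values agree up to rounding, so in simple linear regression $R^2$ carries no information beyond the correlation.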

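Back to the heme data

The scientist was actually interested in the slopes when one re-scales the y-axis so that the y-intercept is at 1.

$y = \beta_0 + \beta_1 x + \epsilon$ becomes $y/\beta_0 = 1 + (\beta_1/\beta_0)\,x + \epsilon'$, where $\epsilon' = \epsilon/\beta_0$.

So we're really interested in $\beta_1/\beta_0$.

We'd estimate that by $\hat\beta_1/\hat\beta_0$, but what about its standard error?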

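First-order Taylor expansion

Consider $f(x, y) = x/y$.

A first-order Taylor expansion to approximate the function around the point $(x_0, y_0)$ would be

$$f(x, y) \approx f(x_0, y_0) + (x - x_0)\,\left.\frac{\partial f}{\partial x}\right|_{(x_0, y_0)} + (y - y_0)\,\left.\frac{\partial f}{\partial y}\right|_{(x_0, y_0)}$$

Since $\partial f/\partial x = 1/y$ and $\partial f/\partial y = -x/y^2$, we obtain the following:

$$x/y \approx x_0/y_0 + (x - x_0)/y_0 - (y - y_0)\,x_0/y_0^2 = (x_0/y_0)\,[\,1 + (x - x_0)/x_0 - (y - y_0)/y_0\,]$$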
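
To see how good the linear approximation is near the expansion point, the following sketch evaluates the exact ratio and both forms of the first-order approximation. The function names and the numbers are arbitrary illustrative choices.

```python
def ratio_taylor(x, y, x0, y0):
    """First-order Taylor approximation of x/y around (x0, y0)."""
    return x0 / y0 + (x - x0) / y0 - (y - y0) * x0 / y0 ** 2

def ratio_taylor_factored(x, y, x0, y0):
    """The same approximation in the factored form (x0/y0)[1 + ... - ...]."""
    return (x0 / y0) * (1 + (x - x0) / x0 - (y - y0) / y0)

x0, y0 = 2.0, 4.0   # expansion point (arbitrary)
x, y = 2.1, 4.1     # a nearby point

exact = x / y
approx = ratio_taylor(x, y, x0, y0)
approx_factored = ratio_taylor_factored(x, y, x0, y0)

print("exact:            ", exact)
print("first-order:      ", approx)
print("factored form:    ", approx_factored)
print("approximation err:", abs(approx - exact))
```

The relative deviation in the denominator enters with a minus sign: increasing $y$ makes the ratio smaller.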

How do we use this?

We use the first-order Taylor expansion of $\hat\beta_1/\hat\beta_0$ around $\beta_1$ and $\beta_0$.

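Variance of a ratio

Remember that $\beta_0$ and $\beta_1$ are fixed, while $\hat\beta_0$ and $\hat\beta_1$ are random.

Add the fact that var(X + Y) = var(X) + var(Y) + 2 cov(X, Y), and hence var(X - Y) = var(X) + var(Y) - 2 cov(X, Y).

$$\operatorname{var}\left\{\frac{\hat\beta_1}{\hat\beta_0}\right\} \approx \operatorname{var}\left\{\left(\frac{\beta_1}{\beta_0}\right)\left[\,1 + \frac{\hat\beta_1 - \beta_1}{\beta_1} - \frac{\hat\beta_0 - \beta_0}{\beta_0}\,\right]\right\} = \left(\frac{\beta_1}{\beta_0}\right)^2 \left\{\frac{\operatorname{var}(\hat\beta_1)}{\beta_1^2} + \frac{\operatorname{var}(\hat\beta_0)}{\beta_0^2} - \frac{2\operatorname{cov}(\hat\beta_0, \hat\beta_1)}{\beta_0\,\beta_1}\right\}$$

We then replace $\beta_0$ and $\beta_1$ in this formula with our estimates of them, $\hat\beta_0$ and $\hat\beta_1$.

Further, we replace the variances and covariance with our estimates.

$$\widehat{\operatorname{var}}\left\{\frac{\hat\beta_1}{\hat\beta_0}\right\} \approx \left(\frac{\hat\beta_1}{\hat\beta_0}\right)^2 \left\{\frac{\widehat{\operatorname{var}}(\hat\beta_1)}{\hat\beta_1^2} + \frac{\widehat{\operatorname{var}}(\hat\beta_0)}{\hat\beta_0^2} - \frac{2\,\widehat{\operatorname{cov}}(\hat\beta_0, \hat\beta_1)}{\hat\beta_0\,\hat\beta_1}\right\}$$

The estimated SE is then

$$\widehat{SE}\left\{\frac{\hat\beta_1}{\hat\beta_0}\right\} \approx \left|\frac{\hat\beta_1}{\hat\beta_0}\right| \sqrt{\left[\frac{\widehat{SE}(\hat\beta_1)}{\hat\beta_1}\right]^2 + \left[\frac{\widehat{SE}(\hat\beta_0)}{\hat\beta_0}\right]^2 - \frac{2\,\widehat{\operatorname{cov}}(\hat\beta_0, \hat\beta_1)}{\hat\beta_0\,\hat\beta_1}}$$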
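
The approximate standard error of the ratio can be wrapped in a small function. This is a sketch under the first-order approximation only; the name `se_ratio` and all input numbers are hypothetical (they are not the pf3d7 estimates, whose standard errors and covariance are not given here). As a cross-check, the same quantity is computed from the gradient of $f(\beta_0, \beta_1) = \beta_1/\beta_0$.

```python
import math

def se_ratio(b0, b1, se_b0, se_b1, cov_b0_b1):
    """Approximate SE of b1/b0 from a first-order Taylor expansion.

    b0, b1: estimated intercept and slope (both must be nonzero).
    se_b0, se_b1: their estimated standard errors.
    cov_b0_b1: estimated covariance between b0 and b1.
    """
    rel_var = ((se_b1 / b1) ** 2
               + (se_b0 / b0) ** 2
               - 2 * cov_b0_b1 / (b0 * b1))
    return abs(b1 / b0) * math.sqrt(rel_var)

# Hypothetical estimates, for illustration only.
b0, b1 = 2.0, 4.0
se_b0, se_b1, cov01 = 0.3, 0.5, 0.05

se = se_ratio(b0, b1, se_b0, se_b1, cov01)

# Cross-check: g^T Sigma g with gradient g = (-b1/b0^2, 1/b0).
g0, g1 = -b1 / b0 ** 2, 1 / b0
var_grad = g0 ** 2 * se_b0 ** 2 + g1 ** 2 * se_b1 ** 2 + 2 * g0 * g1 * cov01

print("SE of ratio:         ", se)
print("via gradient formula:", math.sqrt(var_grad))
```

The approximation relies on the estimates being close to the true values relative to their size, so it can be poor when the intercept estimate is near zero.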