Lecture Notes on Multiple Regression Model I | STAT 231B, Study notes of Statistics

Material Type: Notes; Professor: Cui; Class: STATISTICS FOR BIOLOGICAL SCIENCES; Subject: Statistics; University: University of California-Riverside; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 03/28/2010

koofers-user-b8l
MULTIPLE REGRESSION-I
Regression analysis examines the relation between a single dependent variable $Y$ and one
or more independent variables $X_1, \dots, X_p$.
SIMPLE LINEAR REGRESSION MODELS
First Order Model with One Predictor Variable
$Y_i = \beta_0 + \beta_1 X_{i1} + \varepsilon_i$, $i = 1, 2, \dots, n$
$\beta_0$ is the intercept of the line, and $\beta_1$ is the slope of
the line. A one-unit increase in $X$ gives a $\beta_1$-unit
increase in the mean of $Y$.
$\varepsilon_i$ is called the statistical random error for the $i$th
observation $Y_i$. It accounts for the fact that the
statistical model does not give an exact fit to the
data.
$\varepsilon_i$ cannot be observed. We assume:
o $E(\varepsilon_i) = 0$
o $Var(\varepsilon_i) = \sigma^2$ for all $i = 1, \dots, n$
o $Cov(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$


Response/Regression function and an example:

$E\{Y_i\} = \beta_0 + \beta_1 X_{i1}$

$E\{Y_i\} = 9.5 + 2.1 X_{i1}$ (example)

The response $Y_i$, given the level of $X$ in the $i$th trial, $X_i$, comes from a probability distribution whose mean is $9.5 + 2.1 X_i$. Therefore, the response/regression function relates the means of the probability distributions of $Y$ (for given $X$) to the level of $X$.
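To make the idea of "a distribution of Y at each level of X" concrete, here is a minimal Python sketch. It uses the example coefficients 9.5 and 2.1 from the notes; the normal error distribution, the value of `sigma`, the level `x_level`, and the function names are assumptions chosen only for illustration.

```python
import random
import statistics

def mean_response(x, beta0=9.5, beta1=2.1):
    """Regression function E{Y} = beta0 + beta1 * x (coefficients from the example)."""
    return beta0 + beta1 * x

def simulate_responses(x, n_draws, sigma=1.0, seed=0):
    """Draw Y = E{Y} + error at a fixed level x; normal errors are an assumption."""
    rng = random.Random(seed)
    return [mean_response(x) + rng.gauss(0.0, sigma) for _ in range(n_draws)]

x_level = 4.0                               # hypothetical level of X
draws = simulate_responses(x_level, n_draws=10_000)
print(mean_response(x_level))               # mean of the distribution of Y at this X
print(statistics.mean(draws))               # sample mean of simulated Y, close to the line above
```

The individual draws scatter around the regression function, but their average settles near the value the function gives at that level of X.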

o We want to find the pair $(b_0, b_1)$ that minimizes

$SSE = \sum e_i^2 = \sum (Y_i - b_0 - b_1 X_i)^2$

o We set the partial derivatives of SSE with respect to $b_0$, $b_1$ equal to zero:

Normal equation (1): $\partial SSE / \partial b_0 = \sum (-1)(2)(Y_i - b_0 - b_1 X_i) = 0 \Rightarrow \sum (Y_i - b_0 - b_1 X_i) = 0$

Normal equation (2): $\partial SSE / \partial b_1 = \sum (-X_i)(2)(Y_i - b_0 - b_1 X_i) = 0 \Rightarrow \sum X_i (Y_i - b_0 - b_1 X_i) = 0$

Then the solution is (derivation on board):

$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$

$b_0 = \bar{Y} - b_1 \bar{X}$
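The closed-form solution can be sketched directly in code. The following Python function is a minimal illustration of the two formulas for the slope and intercept; the function name and the five data points are made up for the example and do not come from the notes.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) minimizing SSE = sum (y_i - b0 - b1 * x_i)^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        # All X values identical: the slope formula divides by zero.
        raise ValueError("all X values are identical; b1 is not defined")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Illustrative data (made up for this sketch).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_ols(x, y)
print(b0, b1)   # intercept near 0.05, slope near 1.99
```

Working with deviations from the means, rather than raw sums of products, keeps the code aligned with the formulas as written.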

Properties of the residuals

o $\sum e_i = 0$, since the regression line goes through the point $(\bar{X}, \bar{Y})$.

o $\sum X_i e_i = 0$ and $\sum \hat{Y}_i e_i = 0$. The residuals are uncorrelated with the independent variable $X_i$ and with the fitted values $\hat{Y}_i$ (prove it on board).

Why are $\sum e_i = 0$, $\sum X_i e_i = 0$ and $\sum \hat{Y}_i e_i = 0$?

In fact, normal equations (1) and (2) immediately give $\sum e_i = 0$ and $\sum X_i e_i = 0$, because $e_i = Y_i - b_0 - b_1 X_i$. Since

$\sum \hat{Y}_i e_i = \sum (b_0 + b_1 X_i) e_i = b_0 \sum e_i + b_1 \sum X_i e_i$,

it follows that $\sum \hat{Y}_i e_i = 0$.

o Least squares estimates are uniquely defined as long as the values of the independent variable are not all identical. If they were all identical, the denominator of $b_1$ would be $\sum (X_i - \bar{X})^2 = 0$ and the slope would be undefined (draw figure).
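These three identities are easy to check numerically. The sketch below fits the least squares line to a small made-up data set (an assumption for illustration only) and computes the three sums, which should vanish up to floating-point rounding.

```python
# Illustrative data (made up for this sketch).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]               # fitted values Y_hat_i
resid = [yi - fi for yi, fi in zip(y, fitted)]    # residuals e_i

sum_e = sum(resid)                                      # sum of e_i
sum_xe = sum(xi * ei for xi, ei in zip(x, resid))       # sum of X_i * e_i
sum_fe = sum(fi * ei for fi, ei in zip(fitted, resid))  # sum of Y_hat_i * e_i
print(sum_e, sum_xe, sum_fe)   # each is zero up to rounding error
```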

Point Estimation of Error Terms Variance $\sigma^2$

o Single Population: Unbiased sample variance estimator of the population variance

$s^2 = \frac{\sum (Y_i - \bar{Y})^2}{n - 1}$, with $E\{s^2\} = \sigma^2$
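As a small illustration of the single-population estimator, the sketch below computes the sample variance with the $n - 1$ divisor and compares it with `statistics.variance` from the Python standard library, which uses the same divisor. The data values and the function name are made up for the example.

```python
import statistics

def sample_variance(values):
    """Unbiased sample variance: sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # illustrative values
s2 = sample_variance(data)
print(s2, statistics.variance(data))   # the two values agree
```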

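Estimating the Mean Value at Xh:

$E\{Y_h\} = \beta_0 + \beta_1 X_h$

Estimator: $\hat{Y}_h = b_0 + b_1 X_h$

Mean: $E\{\hat{Y}_h\} = E\{Y_h\}$

Variance: $\sigma^2\{\hat{Y}_h\} = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{SS_{XX}} \right]$

Estimated variance: $s^2\{\hat{Y}_h\} = MSE \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{SS_{XX}} \right]$

where $SS_{XX} = \sum (X_i - \bar{X})^2$.

Analysis of Variance Approach to Regression Analysis

$Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$

$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$

$SS_{TOT} = SSR + SSE$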
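The estimated variance of the fitted mean at a level $X_h$ can be sketched as follows. This is a minimal illustration that takes $MSE = SSE/(n - 2)$ for the simple regression model; the function name, the data, and the two levels of $X_h$ are assumptions made up for the example.

```python
def mean_response_estimate(x, y, x_h):
    """Return (Y_hat_h, s2_Y_hat_h) for the fitted simple regression at level x_h."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                       # error mean square
    y_hat_h = b0 + b1 * x_h                   # point estimate of E{Y_h}
    s2_y_hat_h = mse * (1.0 / n + (x_h - x_bar) ** 2 / ss_xx)
    return y_hat_h, s2_y_hat_h

# Illustrative data (made up for this sketch).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
at_mean = mean_response_estimate(x, y, x_h=3.0)   # X_h equal to the mean of X
at_edge = mean_response_estimate(x, y, x_h=5.0)   # X_h far from the mean of X
print(at_mean, at_edge)
```

The estimated variance is smallest when $X_h$ equals $\bar{X}$ and grows with the squared distance $(X_h - \bar{X})^2$.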

 

 

Basic Table

Source of Variation   SS                                      df      MS                  E{MS}
Regression            $SSR = \sum (\hat{Y}_i - \bar{Y})^2$    1       $MSR = SSR/1$       $\sigma^2 + \beta_1^2 \sum (X_i - \bar{X})^2$
Error                 $SSE = \sum (Y_i - \hat{Y}_i)^2$        n - 2   $MSE = SSE/(n-2)$   $\sigma^2$
Total                 $SS_{TOT} = \sum (Y_i - \bar{Y})^2$     n - 1
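The entries of the table can be computed in a few lines. The sketch below is one possible way to do it in Python for the simple regression model; the function name, dictionary keys, and data are made up for illustration.

```python
def anova_simple_regression(x, y):
    """Return the sums of squares, degrees of freedom and mean squares as a dict."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)             # regression sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))    # error sum of squares
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    return {
        "SSR": ssr, "SSE": sse, "SSTOT": ss_tot,
        "df_R": 1, "df_E": n - 2, "df_TOT": n - 1,
        "MSR": ssr / 1, "MSE": sse / (n - 2),
    }

# Illustrative data (made up for this sketch).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
table = anova_simple_regression(x, y)
for name, value in table.items():
    print(name, value)
```

On any data set the computed SSR and SSE should add up to SSTOT, up to rounding, which is a convenient check on the arithmetic.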

o Test Statistic: $F^* = MSR / MSE$

o F Distribution: under $H_0$, $F^* \sim F(1, n-2)$

o Numerator Degrees of Freedom: $df_R = 1$

o Denominator Degrees of Freedom: $df_E = n - 2$

o Hypothesis: $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$

o Decision Rule: for level of significance $\alpha$,

If $F^* \le F(1 - \alpha; 1, n-2)$, conclude $H_0$

If $F^* > F(1 - \alpha; 1, n-2)$, conclude $H_1$

o Tail Probability: $P\{F(1, n-2) > F^*\}$

Fitting a regression in SAS

data Toluca;
  infile 'C:\stat231B06\ch01ta01.txt';
  input lotsize workhrs;
run;

proc reg data=Toluca; /* least squares estimation of the regression coefficients */
  model workhrs = lotsize;
  output out=results p=yhat r=residual; /* yhat holds the fitted values, residual holds the residuals */
run;
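The F test for the slope can be sketched in Python as well. This is a minimal illustration, not a replacement for the SAS output: the function name and data are made up, and the critical value is passed in by the caller. The value 10.13 used below is $F(0.95; 1, 3)$ read from a standard F table for this five-point example.

```python
def f_test_slope(x, y, f_critical):
    """Return (F*, conclusion) for H0: beta1 = 0 versus H1: beta1 != 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    msr = ssr / 1            # numerator degrees of freedom = 1
    mse = sse / (n - 2)      # denominator degrees of freedom = n - 2
    f_star = msr / mse
    conclusion = "H1" if f_star > f_critical else "H0"
    return f_star, conclusion

# Illustrative data (made up for this sketch); n = 5, so the error df is 3.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
F_CRIT_95_1_3 = 10.13        # F(0.95; 1, 3), taken from a standard F table
f_star, conclusion = f_test_slope(x, y, F_CRIT_95_1_3)
print(f_star, conclusion)
```

If SciPy is available, the critical value could instead be obtained with `scipy.stats.f.ppf(0.95, 1, n - 2)` and the tail probability with `scipy.stats.f.sf(f_star, 1, n - 2)`; the sketch avoids that dependency.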

Extension to multiple regression model


Degrees of freedom: the number of independent components that are needed to calculate the respective sum of squares.

(1) $SS_{TOT} = \sum (y_i - \bar{y})^2$ is the sum of n squared components. However, since $\sum (y_i - \bar{y}) = 0$, only n-1 components are needed for its calculation. The remaining one can always be calculated from $(y_n - \bar{y}) = -\sum_{i=1}^{n-1} (y_i - \bar{y})$. Hence, $SS_{TOT}$ has n-1 degrees of freedom.

(2) $SSE = \sum e_i^2$ is the sum of n squared components. However, there are p restrictions among the residuals, coming from the p normal equations $X'Y - X'Xb = X'(Y - Xb) = X'e = 0$, where $X'Y$ is $p \times 1$, $X'X$ is $p \times p$ and $b$ is $p \times 1$. Hence SSE has n-p degrees of freedom.

(3) $SSR = \sum (\hat{y}_i - \bar{y})^2$ is a sum of n squared components, but only p of them are free, since all fitted values $\hat{y}_i$ are calculated from the same estimated regression line with p coefficients. One degree of freedom is lost because of the one restriction $\sum (\hat{y}_i - \bar{y}) = \sum (\hat{y}_i - y_i + y_i - \bar{y}) = \sum (\hat{y}_i - y_i) + \sum (y_i - \bar{y}) = 0$. Hence SSR has p-1 degrees of freedom.
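The three counts can be collected in one place. The short summary below assumes, as in the argument above, that p counts all regression coefficients including the intercept.

```latex
\begin{aligned}
df(SS_{TOT}) &= n - 1, \\
df(SSR)      &= p - 1, \\
df(SSE)      &= n - p, \\
n - 1        &= (p - 1) + (n - p).
\end{aligned}
```

With p = 2 (intercept and one slope) these reduce to 1 and n - 2, the values used for the simple regression model.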
