Simple Linear Least-Squares Regression - Lecture Notes | STAT 431, Study notes of Statistics

Material Type: Notes; Class: STATISTICAL INFERENCE; Subject: Statistics; University: University of Pennsylvania; Term: Unknown 1989;


Uploaded on 03/28/2010

Statistics 431: Statistical Inference
Lecture 15: Simple linear least-squares regression

Regression: motivating example

  • Heart catheterization is performed on children with congenital heart defects. A Teflon tube (catheter) is passed into a major vein or artery and pushed into the heart to obtain information about its functional ability.
  • It is not easy to determine exactly what part of the heart the tip of the catheter has reached.
  • We would like to predict how much catheter should be inserted in order to reach the heart (the length of the catheter, $Y$) based on something easily observed about the child, such as height ($X$).
  • In a study of 12 children (Weindling, 1977), the exact catheter length required was determined by using fluoroscopy to check that the tip of the catheter had reached the pulmonary artery.
  • This resulted in a data set of twelve (height, catheter length) pairs: $(X_1, Y_1), \ldots, (X_{12}, Y_{12})$.

Predicting Y from X

  • We want to study the relationship between catheter length and child’s height. Making inferences about one variable $Y$ based on observations of a second variable $X$ is called regression analysis.
  • For each possible height $x$, there is a population of children with the height $X = x$. Catheter length is not exactly the same for all children of height $x$: it has a population distribution.
  • In other words, there are many population distributions for catheter length $Y$: one for each $x$.
  • Recall that in one-way ANOVA, we also had several distributions for $Y$: one for each level of the grouping factor $X$.
  • We can think of the regression variable $X$ as a generalization of an ANOVA factor:
    • It has an infinite number of “levels” (one for each $x > 0$), instead of just $I$ levels.
    • These “levels” are ordered, because the positive real numbers have an ordering.

Terminology and notation

  • In regression analysis, $Y$ can be called
    • the response variable (as in ANOVA) ($Y$ “responds” to the setting of $X$)
    • the dependent variable ($Y$ “depends” on $X$)
  • and $X$ can be called
    • the predictor variable ($X$ “predicts” $Y$)
    • the explanatory variable ($X$ “explains” $Y$)
    • the covariate (indeed, $X$ covaries with $Y$)
    • the independent variable
  • “Independent variable” is a terrible name for $X$: if $X$ were independent of $Y$, then knowing $X$ would tell us nothing about $Y$. But you’ll see it a lot.

  • The notation $Y \mid X = x$ means “the random variable $Y$, given that we observe $X = x$.”
  • If we have to predict $Y \mid X = 48$, what’s a good guess? A reasonable answer is the mean of the catheter length population distribution associated with the height 48 inches: $E[Y \mid X = 48]$. This is called the conditional mean of $Y$ given $X = 48$.
  • The function $\mu(x) = E[Y \mid X = x]$ is called the regression function of $Y$ on $X$. (Devore calls this $\mu_{Y \cdot x}$; that’s bizarre notation.) We need to make assumptions about the regression function.
  • The notation indicates what we are trying to do: estimate a large set of mean parameters $\mu(x)$, namely one for each $X = x$.
  • We can reduce this problem, which has an infinite number of parameters to estimate, to a problem with only two parameters to estimate.
  • We accomplish this by assuming $\mu(x)$ is a linear function of $x$:

$$\mu(x) = E[Y \mid X = x] = \beta_0 + \beta_1 x.$$

    This is a strong assumption! When we make it, we are doing (simple) linear regression, and $\mu(x)$ is called the regression line.
  • The general assumption is that neighboring values of $X$ have conditional means that are close together. The specific realization of that assumption is a straight-line relationship between $x$ and $\mu(x)$.
  • In the example: if $\mu(x) = 12 + 0.6x$, then every one-inch increase in child’s height leads to a 0.6-centimeter increase in the expected catheter length.
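
The slope interpretation in the last bullet follows from a one-line calculation. As a worked step, using only the linearity assumption stated above:

$$\mu(x + 1) - \mu(x) = \big(\beta_0 + \beta_1 (x + 1)\big) - \big(\beta_0 + \beta_1 x\big) = \beta_1.$$

So with $\beta_1 = 0.6$ cm per inch, two children whose heights differ by one inch have expected catheter lengths that differ by 0.6 cm, whatever the starting height $x$.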

JMP example: cardiac catheter

[Figure: Bivariate Fit of Catheter length required By Height. Scatterplot of catheter length required (vertical axis) against height (horizontal axis), with the linear fit overlaid.]

Linear Fit: Catheter length = 12.124045 + 0.5967612 Height

  • We measure the quality of estimated coefficients using the “size” of the corresponding residuals $e_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)$, i.e. their sum of squares:

$$S(\hat{\beta}_0, \hat{\beta}_1) = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \big(y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\big)^2$$

  • Now we find the pair $(\hat{\beta}_0^{LS}, \hat{\beta}_1^{LS})$ which minimizes $S(\hat{\beta}_0, \hat{\beta}_1)$: these are the least-squares estimates of the linear regression coefficients.
  • Minimizing the residual sum of squares is an optimization problem whose solution always exists and is unique, as long as the $x_i$ are not all equal. In fact, the solution is closed form:

$$\hat{\beta}_1^{LS} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{\widehat{\mathrm{Cov}}(X, Y)}{\widehat{\mathrm{Var}}(X)}, \qquad \hat{\beta}_0^{LS} = \bar{y} - \hat{\beta}_1^{LS} \bar{x}$$

    (From now on, we drop the “LS” superscript.)
  • To get this solution: take first partial derivatives of $S$, set to zero, solve for $(\hat{\beta}_0, \hat{\beta}_1)$. Then prove $S$ is a strictly convex function of $(\hat{\beta}_0, \hat{\beta}_1)$; this tells you the estimates are the unique global minimizers of $S$.
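
The closed-form solution is short enough to write out directly. Below is a minimal sketch in plain Python; the function names and the five (x, y) pairs are made up for illustration and are not the catheter data. The final loop is a numerical spot check of the "unique global minimizer" claim: nudging either coefficient away from the fit should raise the residual sum of squares.

```python
# Closed-form simple linear least squares, written out with plain Python lists.
# The (x, y) pairs below are made-up illustrative numbers, NOT the catheter data.

def least_squares_fit(xs, ys):
    """Return (b0, b1) minimizing sum((y - (b0 + b1 * x)) ** 2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # All x values equal: the minimizer is not unique.
        raise ValueError("all x values are equal; the slope is not identified")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residual_sum_of_squares(xs, ys, b0, b1):
    """S(b0, b1): the sum of squared residuals for a candidate line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares_fit(xs, ys)
s_min = residual_sum_of_squares(xs, ys, b0, b1)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, S = {s_min:.4f}")

# Spot check: small perturbations of either coefficient increase S.
for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert residual_sum_of_squares(xs, ys, b0 + d0, b1 + d1) > s_min
```

A handful of perturbations is of course not a proof; the strict convexity argument in the last bullet is what guarantees the result.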

JMP example: cardiac catheter

Bivariate Fit of Catheter length required By Height

Linear Fit

Catheter length = 12.124045 + 0.5967612 Height

Summary of Fit

RSquare                      0.
RSquare Adj                  0.
Root Mean Square Error       4.
Mean of Response             36.
Observations (or Sum Wgts)   12

Parameter Estimates

Term        Estimate     Std Error   t Ratio   Prob>|t|
Intercept   12.124045    4.247174    2.85      0.
Height      0.5967612    0.101256    5.89      0.
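
The t Ratio column is each estimate divided by its standard error. A quick check using only the numbers printed in the Parameter Estimates table (the dictionary names are just for illustration):

```python
# Recompute the "t Ratio" column from the Estimate and Std Error columns
# of the JMP Parameter Estimates table.
estimates = {"Intercept": 12.124045, "Height": 0.5967612}
std_errors = {"Intercept": 4.247174, "Height": 0.101256}

t_ratios = {term: estimates[term] / std_errors[term] for term in estimates}

for term, t in t_ratios.items():
    print(f"{term:10s} t = {t:.2f}")
```

Rounded to two decimals, the ratios match the 2.85 and 5.89 reported by JMP.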

The ideal simple linear regression model

  • So far, we haven’t given a theoretical justification for using least-squares estimates of the regression coefficients. We just said that minimizing the residual sum of squares seems reasonable.
  • If we’re willing to add more assumptions, we can prove that least-squares estimates are, in a certain sense, optimal.
  • The extra assumptions will also allow us to figure out the distribution of $(\hat{\beta}_0, \hat{\beta}_1)$, and thus make statistical inferences about them.
  • The assumptions are called the ideal simple linear model: for each possible $x$,

$$(Y \mid X = x) = \beta_0 + \beta_1 x + \varepsilon, \quad \text{where } \varepsilon \sim N(0, \sigma^2).$$

    We now assume $n$ independent draws under this model, each using one of the $n$ values $x_1, \ldots, x_n$. This gives the following model for the data:

$$(Y_i \mid X_i = x_i) = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \text{with the } \varepsilon_i\text{’s IID } N(0, \sigma^2).$$

  • The ideal linear model says
    • At each value $X = x$, there is an associated normal population distribution for the response $Y$.
    • All of these normal distributions have exactly the same variance $\sigma^2$.
    • The means of these normal distributions are a linear function of $x$.
  • If the ideal linear model is true, we can construct an unbiased estimate of $\sigma^2$. It is based on the sum of squared residuals, which we denote SSR:

$$\hat{\sigma}^2 = \frac{\mathrm{SSR}}{n - 2} = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2},$$

    where $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ is the $i$th fitted value. Notice that there are $n - 2$ degrees of freedom left over to estimate the variance, because we used two df for $(\hat{\beta}_0, \hat{\beta}_1)$.
  • The natural estimate of $\sigma$ is then $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$.
  • We can also show that, as random variables, $(\hat{\beta}_0, \hat{\beta}_1)$ have smaller variance than any other linear unbiased estimates of $(\beta_0, \beta_1)$; linear here means estimates which are linear functions of $y_1, \ldots, y_n$. They are BLUE: the “best linear unbiased estimates.”
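
The unbiasedness claims can be checked by simulation. The sketch below draws many data sets from the ideal linear model, refits each one, and averages the estimates. The "true" parameter values, the twelve design points, and the seed are all hypothetical choices made for illustration; they are not taken from the catheter study.

```python
import random


def fit_and_sigma2(xs, ys):
    """Least-squares fit plus the variance estimate SSR / (n - 2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, ssr / (n - 2)


# Hypothetical "true" parameters and design points, chosen only for illustration.
BETA0, BETA1, SIGMA = 12.0, 0.6, 4.0
xs = [22.0, 26.0, 30.0, 34.0, 38.0, 42.0, 46.0, 50.0, 54.0, 58.0, 62.0, 66.0]

rng = random.Random(431)  # fixed seed so the run is repeatable
n_sims = 20000
sum_b0 = 0.0
sum_b1 = 0.0
sum_s2 = 0.0
for _ in range(n_sims):
    # One data set from the ideal model: y_i = beta0 + beta1 * x_i + eps_i.
    ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]
    b0, b1, s2 = fit_and_sigma2(xs, ys)
    sum_b0 += b0
    sum_b1 += b1
    sum_s2 += s2

mean_b0 = sum_b0 / n_sims
mean_b1 = sum_b1 / n_sims
mean_s2 = sum_s2 / n_sims
# The averages should land near 12, 0.6, and SIGMA ** 2 = 16 respectively.
print(mean_b0, mean_b1, mean_s2)
```

Dividing SSR by $n$ instead of $n - 2$ in `fit_and_sigma2` would make the averaged variance estimate come out noticeably below 16, which is one way to see why the two degrees of freedom matter.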

Interpolation and extrapolation

  • Simple linear regression makes it possible to draw inferences about any mean response $\mu(x)$, using the estimate $\hat{\mu}(x) = \hat{\beta}_0 + \hat{\beta}_1 x$.
  • Interpolation means drawing inferences about the mean response within the range of observed $x$’s.
  • Using the regression model for interpolation is often reasonable (e.g., we predict mean catheter length required for a child who is 42.0 inches tall, a height not observed in the sample).
  • Extrapolation means drawing inferences about the mean response outside the range of observed $x$’s. This is seldom reasonable. A straight-line model may hold approximately over the region of observed $x$’s, but not for all $X$.
  • Extrapolation in catheter data: $\hat{\mu}(6) = \hat{E}[Y \mid X = 6 \text{ in}] = 15.7 \text{ cm} = 6.18 \text{ in}$. (Ouch.)
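
Both predictions can be reproduced from the fitted line in the JMP output. This is a small sketch; the function and constant names are illustrative, and the only inputs are the two fitted coefficients and the inch-to-centimeter conversion.

```python
# Fitted line from the JMP output: catheter length (cm) as a function of height (in).
B0_HAT = 12.124045
B1_HAT = 0.5967612
CM_PER_INCH = 2.54


def mu_hat(height_in):
    """Estimated mean catheter length in cm for a child of the given height in inches."""
    return B0_HAT + B1_HAT * height_in


interp_cm = mu_hat(42.0)  # interpolation example from the notes
extrap_cm = mu_hat(6.0)   # extrapolation example from the notes
extrap_in = extrap_cm / CM_PER_INCH

print(f"mu_hat(42) = {interp_cm:.1f} cm")
print(f"mu_hat(6)  = {extrap_cm:.1f} cm = {extrap_in:.2f} in")
```

At 6 inches the fitted line calls for a catheter slightly longer than the child is tall, which is presumably the point of the “Ouch.”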