Linear Regression Lecture Notes: Stat 312, Lecture 19 by Moo K. Chung, Study notes of Mathematical Statistics

An introduction to linear regression, a statistical model used to understand the relationship between a dependent variable and one or more independent variables. The notes cover the stochastic model, least squares estimation, and the normal equations, and include an example of finding the least squares regression line using the R statistical software.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

Stat 312: Lecture 19
Linear Regression
Moo K. Chung
Nov 30, 2004
Concepts
1. Let $x$ be the speed of a car and $y$ be the distance the car traveled in an hour. Then we have the model
$$y = \beta_0 + \beta_1 x.$$
Suppose we have $n$ paired measurements $(x_i, y_i)$, $i = 1, \dots, n$. Since all measurements are assumed to be noisy, we introduce a noise term $\epsilon$ in the above equation. Our modified stochastic model is
$$y = \beta_0 + \beta_1 x + \epsilon,$$
where $\epsilon \sim N(0, \sigma^2)$. Since $\epsilon$ is a random variable, we use $Y$ instead of $y$ for convenience:
$$Y = \beta_0 + \beta_1 x + \epsilon.$$
Note that $EY = \beta_0 + \beta_1 x$ and $VY = \sigma^2$.
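The mean and variance statements above can be checked by simulation. The following is a minimal sketch in R; the parameter values `beta0`, `beta1`, `sigma` and the fixed speed `x0` are hypothetical and chosen only for illustration, they do not come from the lecture.

```r
# Hypothetical parameter values, chosen only for illustration.
beta0 <- 2      # intercept
beta1 <- 0.9    # slope
sigma <- 3      # noise standard deviation
x0    <- 60     # a fixed speed at which Y is observed repeatedly

set.seed(1)                                   # reproducible draws
eps <- rnorm(100000, mean = 0, sd = sigma)    # epsilon ~ N(0, sigma^2)
Y   <- beta0 + beta1 * x0 + eps               # stochastic model at x = x0

mean(Y)   # should be close to beta0 + beta1 * x0 = 56
var(Y)    # should be close to sigma^2 = 9
```

Because $x$ is held fixed, all the randomness in $Y$ comes from the noise term, which is why the variance of $Y$ does not depend on $x$.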
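2. Equivalently, we can write the above linear model for each paired measurement $(x_j, y_j)$:
$$Y_j = \beta_0 + \beta_1 x_j + \epsilon_j,$$
where $y_j$ is the observed value of the random variable $Y_j$ and $\epsilon_j \sim \epsilon$, i.e. each $\epsilon_j$ has the same $N(0, \sigma^2)$ distribution as $\epsilon$. Note that $EY_j = \beta_0 + \beta_1 x_j$. Let $\hat\beta_0, \hat\beta_1$ be estimators of $\beta_0, \beta_1$. Then the predicted values, or fitted values, are given by
$$\hat{y}_j = \hat\beta_0 + \hat\beta_1 x_j.$$
The differences between the observations $y_j$ and the predicted values $\hat{y}_j$ are called the residuals (errors), i.e.
$$r_j = y_j - \hat{y}_j = y_j - \hat\beta_0 - \hat\beta_1 x_j.$$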
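As a small illustration of fitted values and residuals, the sketch below evaluates them for one candidate line. The data and the candidate estimates `b0_hat` and `b1_hat` are hypothetical; they are not the least squares estimates and are not taken from the lecture.

```r
# Hypothetical data and candidate estimates, for illustration only.
x <- c(1, 2, 3, 4)
y <- c(2.1, 3.9, 6.2, 7.8)
b0_hat <- 0    # candidate estimate of beta0
b1_hat <- 2    # candidate estimate of beta1

y_hat <- b0_hat + b1_hat * x   # fitted values: 2, 4, 6, 8
r     <- y - y_hat             # residuals: 0.1, -0.1, 0.2, -0.2
```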
3. Least squares estimation. Least squares estimation is a method of estimating the parameters $\beta_0$ and $\beta_1$ by minimizing the sum of the squared errors (SSE):
$$\mathrm{SSE} = \sum_{j=1}^{n} r_j^2 = \sum_{j=1}^{n} (y_j - \hat{y}_j)^2 = \sum_{j=1}^{n} (y_j - \hat\beta_0 - \hat\beta_1 x_j)^2.$$
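To make the minimization concrete, the sketch below writes SSE as an R function of a candidate coefficient pair and minimizes it numerically. The data and the helper name `sse` are hypothetical, and the general-purpose optimizer `optim` is used here only to illustrate the idea of minimizing SSE; it is not the method the notes go on to use.

```r
# Hypothetical data, for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2.2, 3.8, 6.1, 8.3, 9.6)

# SSE as a function of a candidate coefficient pair b = c(b0, b1).
sse <- function(b, x, y) sum((y - b[1] - b[2] * x)^2)

sse(c(0, 2), x, y)   # SSE of the candidate line y = 2x      (0.34)
sse(c(1, 2), x, y)   # SSE of the candidate line y = 1 + 2x  (5.34)

# Generic numerical minimization (Nelder-Mead, the default method of optim).
fit <- optim(c(0, 0), sse, x = x, y = y)
fit$par     # approximate minimizer (b0, b1)
fit$value   # approximate minimum SSE
```

Any other candidate line gives an SSE at least as large as `fit$value`, up to the numerical tolerance of the optimizer.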
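Then the regression line is given by $y = \hat\beta_0 + \hat\beta_1 x$. By differentiating SSE with respect to $\beta_0$ and $\beta_1$ and setting the derivatives to zero, we get the normal equations:
$$\beta_0 + \bar{x}\,\beta_1 = \bar{y},$$
$$\bar{x}\,\beta_0 + \overline{x^2}\,\beta_1 = \overline{xy},$$
where a bar denotes a sample average, e.g. $\overline{xy} = \frac{1}{n}\sum_{j=1}^{n} x_j y_j$. Solving these equations, we get
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar{y} - \bar{x}\,\frac{S_{xy}}{S_{xx}},$$
where $S_{xy} = n(\overline{xy} - \bar{x}\bar{y})$ is, up to a constant factor, the sample covariance, and $S_{xx} = n(\overline{x^2} - \bar{x}^2)$ is defined analogously.
[Figure: scatter plot of y against x.]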
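The differentiation step can be spelled out as follows. This is a sketch in the notation of the notes, with a bar denoting a sample average (for example $\overline{xy} = \frac{1}{n}\sum_j x_j y_j$); dividing each derivative condition by $-2n$ gives the corresponding normal equation.

```latex
\begin{align*}
\frac{\partial\,\mathrm{SSE}}{\partial \beta_0}
  &= -2\sum_{j=1}^{n} (y_j - \beta_0 - \beta_1 x_j) = 0
  &&\Longrightarrow\quad \beta_0 + \bar{x}\,\beta_1 = \bar{y}, \\
\frac{\partial\,\mathrm{SSE}}{\partial \beta_1}
  &= -2\sum_{j=1}^{n} x_j\,(y_j - \beta_0 - \beta_1 x_j) = 0
  &&\Longrightarrow\quad \bar{x}\,\beta_0 + \overline{x^2}\,\beta_1 = \overline{xy}.
\end{align*}
% Multiply the first equation by \bar{x} and subtract it from the second:
\begin{align*}
\left(\overline{x^2} - \bar{x}^2\right)\beta_1 = \overline{xy} - \bar{x}\,\bar{y}
  \quad\Longrightarrow\quad
  \hat\beta_1 = \frac{n(\overline{xy} - \bar{x}\,\bar{y})}{n(\overline{x^2} - \bar{x}^2)},
  \qquad
  \hat\beta_0 = \bar{y} - \bar{x}\,\hat\beta_1 .
\end{align*}
```

The numerator of $\hat\beta_1$ is $S_{xy}$ as defined in the notes; the denominator is the analogous quantity with $y$ replaced by $x$.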
Example. 10 students took two midterm exams.

Student   | 01 02 03 04 05 06 07 08 09 10
Midterm 1 | 80 75 60 90 99 60 55 85 65 70
Midterm 2 | 70 60 70 72 95 66 60 80 70 60

Let's find the least squares regression line, taking the Midterm 1 score as $x$ and the Midterm 2 score as $y$.
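> x <- c(80, 75, 60, 90, 99, 60, 55, 85, 65, 70)  # Midterm 1 scores
> y <- c(70, 60, 70, 72, 95, 66, 60, 80, 70, 60)  # Midterm 2 scores
> plot(x, y)            # scatter plot of the data
> rarara <- lm(y ~ x)   # fit the least squares regression line
> rarara

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
    29.4827       0.5523

> abline(rarara)        # add the fitted line to the scatter plot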
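The coefficients reported by `lm` can be reproduced from the closed-form solution of the normal equations. The sketch below uses the exam data from the example; the variable names `Sxy`, `Sxx`, `b0` and `b1` are chosen here for illustration, and `Sxx` is assumed to be defined analogously to $S_{xy}$, since the notes do not write it out.

```r
x <- c(80, 75, 60, 90, 99, 60, 55, 85, 65, 70)   # Midterm 1 scores
y <- c(70, 60, 70, 72, 95, 66, 60, 80, 70, 60)   # Midterm 2 scores
n <- length(x)

Sxy <- n * (mean(x * y) - mean(x) * mean(y))     # S_xy = n(mean(xy) - xbar * ybar)
Sxx <- n * (mean(x^2) - mean(x)^2)               # assumed analogous definition of S_xx

b1 <- Sxy / Sxx                  # slope estimate
b0 <- mean(y) - mean(x) * b1     # intercept estimate
round(c(b0, b1), 4)              # 29.4827 0.5523, matching the lm output

# Cross-check against lm
all.equal(unname(coef(lm(y ~ x))), c(b0, b1))
```

The fitted line is therefore approximately $y = 29.48 + 0.55x$: each additional point on Midterm 1 is associated with about 0.55 additional points on Midterm 2.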
