
An introduction to linear regression, a statistical model used to understand the relationship between a dependent variable and one or more independent variables. It covers the concepts of linear regression, the stochastic model, least squares estimation, and the normal equations, and includes an example of finding the least squares regression line using the R statistical software.
Typology: Study notes

$$y = \beta_0 + \beta_1 x.$$
Suppose we have $n$ paired measurements $(x_i, y_i)$, $i = 1, \dots, n$. Since all measurements are assumed to be noisy, we introduce a noise term $\varepsilon$ in the above equation. Our modified stochastic model is
$$y = \beta_0 + \beta_1 x + \varepsilon,$$
where $\varepsilon \sim N(0, \sigma^2)$. Since $\varepsilon$ is a random variable, so is the response, and we write $Y$ instead of $y$:
$$Y = \beta_0 + \beta_1 x + \varepsilon.$$
Note that $E[Y] = \beta_0 + \beta_1 x$ and $\mathrm{Var}(Y) = \sigma^2$.
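The mean and variance of $Y$ can be checked numerically. The following is a minimal simulation sketch in R; the parameter values `beta0`, `beta1`, `sigma` and the point `x0` are hypothetical and chosen only for illustration.

```r
# Hypothetical parameter values, chosen only for illustration
set.seed(1)
beta0 <- 2
beta1 <- 0.5
sigma <- 1.5
x0    <- 10          # a fixed value of the explanatory variable
n_sim <- 100000      # number of simulated responses

# Draw noise terms from N(0, sigma^2) and build the responses Y
eps <- rnorm(n_sim, mean = 0, sd = sigma)
Y   <- beta0 + beta1 * x0 + eps

mean(Y)  # should be close to beta0 + beta1 * x0 = 7
var(Y)   # should be close to sigma^2 = 2.25
```

The sample mean and variance only approximate the theoretical values, and the approximation improves as `n_sim` grows.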
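For the $j$-th measurement, the model reads
$$Y_j = \beta_0 + \beta_1 x_j + \varepsilon_j,$$
where $y_j$ is the observed value of the random variable $Y_j$ and $\varepsilon_j$ has the same distribution as $\varepsilon$. Note that $E[Y_j] = \beta_0 + \beta_1 x_j$. Let $\hat\beta_0, \hat\beta_1$ be estimators of $\beta_0, \beta_1$. Then the predicted values (fitted values) are given by
$$\hat y_j = \hat\beta_0 + \hat\beta_1 x_j.$$
The differences between the observations $y_j$ and the predicted values $\hat y_j$ are called the residuals (errors), i.e.
$$r_j = y_j - \hat y_j = y_j - \hat\beta_0 - \hat\beta_1 x_j.$$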
The sum of squared errors (SSE) is
$$\mathrm{SSE} = \sum_{j=1}^{n} r_j^2 = \sum_{j=1}^{n} (y_j - \hat y_j)^2 = \sum_{j=1}^{n} (y_j - \hat\beta_0 - \hat\beta_1 x_j)^2.$$
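To make the sum of squared residuals concrete, here is a small R sketch. The helper `sse()` and the toy data are assumptions introduced for illustration and do not come from the notes.

```r
# Toy data, invented for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Sum of squared residuals for candidate coefficients b0 (intercept), b1 (slope)
sse <- function(b0, b1, x, y) {
  fitted <- b0 + b1 * x   # predicted values
  r <- y - fitted         # residuals
  sum(r^2)
}

sse(0, 2, x, y)  # about 0.11: this line follows the data closely
sse(1, 1, x, y)  # much larger: this line fits poorly
```

A smaller value means the candidate line is closer to the observations in the least squares sense.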
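The least squares estimates $\hat\beta_0, \hat\beta_1$ are the values that minimize SSE, and the regression line is then given by $y = \hat\beta_0 + \hat\beta_1 x$. By differentiating SSE with respect to $\beta_0$ and $\beta_1$ and setting the derivatives equal to zero, we get the normal equations:
$$\beta_0 + \bar{x}\,\beta_1 = \bar{y},$$
$$\bar{x}\,\beta_0 + \overline{x^2}\,\beta_1 = \overline{xy},$$
where a bar denotes a sample mean, e.g. $\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j$ and $\overline{xy} = \frac{1}{n}\sum_{j=1}^{n} x_j y_j$. Solving these equations, we get
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar{y} - \bar{x}\,\frac{S_{xy}}{S_{xx}},$$
where $S_{xy} = n(\overline{xy} - \bar{x}\,\bar{y})$ and, analogously, $S_{xx} = n(\overline{x^2} - \bar{x}^2)$. Note that $S_{xy}$ is $(n-1)$ times the sample covariance of $x$ and $y$.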
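The step from SSE to the normal equations and their solution can be written out as follows. This is a sketch of the standard calculus argument, treating SSE as a function of candidate values $\beta_0, \beta_1$ and using a bar for a sample mean.

```latex
\begin{align*}
\frac{\partial\,\mathrm{SSE}}{\partial \beta_0}
  &= -2 \sum_{j=1}^{n} (y_j - \beta_0 - \beta_1 x_j) = 0
  &&\Longrightarrow\quad \beta_0 + \bar{x}\,\beta_1 = \bar{y}, \\
\frac{\partial\,\mathrm{SSE}}{\partial \beta_1}
  &= -2 \sum_{j=1}^{n} x_j\,(y_j - \beta_0 - \beta_1 x_j) = 0
  &&\Longrightarrow\quad \bar{x}\,\beta_0 + \overline{x^2}\,\beta_1 = \overline{xy}.
\end{align*}
% Substitute beta_0 = \bar{y} - \bar{x} beta_1 from the first equation into the second:
\begin{align*}
\bar{x}\,(\bar{y} - \bar{x}\,\beta_1) + \overline{x^2}\,\beta_1 = \overline{xy}
\quad\Longrightarrow\quad
\hat\beta_1 = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}
            = \frac{S_{xy}}{S_{xx}},
\qquad
\hat\beta_0 = \bar{y} - \bar{x}\,\hat\beta_1 .
\end{align*}
```

Each implication divides the corresponding sum by $-2n$; the last equality multiplies numerator and denominator by $n$.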
Example. 10 students took two midterm exams.

Student   ‖ 01 02 03 04 05 06 07 08 09 10
Midterm 1 ‖ 80 75 60 90 99 60 55 85 65 70
Midterm 2 ‖ 70 60 70 72 95 66 60 80 70 60

Let's find the least squares regression line.
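# Midterm 1 scores (x) and Midterm 2 scores (y)
x <- c(80, 75, 60, 90, 99, 60, 55, 85, 65, 70)
y <- c(70, 60, 70, 72, 95, 66, 60, 80, 70, 60)

# Scatter plot of the data
plot(x, y)

# Fit the least squares regression line and print it
rarara <- lm(y ~ x)
rarara
# Call:
# lm(formula = y ~ x)
#
# Coefficients:
# (Intercept)            x
#     29.4827           0.

# Add the fitted line to the scatter plot
abline(rarara)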
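As a cross-check, the closed-form expressions for the estimates can be evaluated directly on the midterm data and compared with the coefficients returned by `lm`. This is a sketch; the variable names `Sxy`, `Sxx`, `b0`, `b1` and `fit` are chosen here for illustration.

```r
# Midterm data from the example
x <- c(80, 75, 60, 90, 99, 60, 55, 85, 65, 70)
y <- c(70, 60, 70, 72, 95, 66, 60, 80, 70, 60)
n <- length(x)

# Sums of cross-deviations and squared deviations, written via sample means
Sxy <- n * (mean(x * y) - mean(x) * mean(y))
Sxx <- n * (mean(x^2) - mean(x)^2)

# Closed-form least squares estimates
b1 <- Sxy / Sxx
b0 <- mean(y) - mean(x) * b1

# Compare with the coefficients computed by lm()
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))

round(c(b0, b1), 4)  # approximately 29.4827 and 0.5523
```

The intercept agrees with the printed value 29.4827, and the slope, which is cut off in the printed output, comes out at roughly 0.5523.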