Least-Squares Regression - Lecture Notes | STAT 20, Study notes of Statistics

Material Type: Notes; Class: Introduction to Probability and Statistics; Subject: Statistics; University: University of California - Berkeley; Term: Unknown 1989;


Uploaded on 11/08/2009



Least-squares regression

Cautions about correlation and regression

Outline:

  • Least-squares regression.
    • Equations of regression line: slope, intercept
    • Residuals and residual plot
    • Outliers and influential observations
  • Cautions about correlation and regression


Least-Squares Regression

Regression describes the relationship between two variables in the situation where one variable can be used to explain or predict the other. The regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.

Fitting the Regression Line to Data

Since we intend to predict y from x, the errors of interest are mispredictions of y for a fixed x.

The least-squares regression line of y on x is the line that minimizes the sum of the squared errors, that is, the squared vertical distances between the observed y values and the line. This is the least-squares criterion.

Given pairs of observations $(x_1, y_1), \ldots, (x_n, y_n)$, the regression line is given by

$\hat{y} = a + bx$

where $b = r \dfrac{s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$.
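As a minimal sketch of these two formulas, the Python below computes the correlation and the sample standard deviations from raw data and then forms the slope and intercept. The function name `least_squares_line` and the five data points are assumptions made up for illustration; they do not come from the notes.

```python
from statistics import mean, stdev


def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    # Correlation r, computed with the same n - 1 divisor as s_x and s_y.
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b


# Illustrative data only (an assumption, not taken from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.2f} + {b:.2f} x")
```

For these made-up points the slope works out to 0.6 and the intercept to 2.2.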

Interpreting the Regression Model

  • The response in the model is denoted $\hat{y}$ to indicate that these are predicted y values, not the true observed y values. The “hat” denotes prediction.
  • The slope of the line indicates how much $\hat{y}$ changes for a unit change in x.
  • The intercept is the value of $\hat{y}$ for x = 0. It may or may not have a physical interpretation, depending on whether or not x can take values near 0.
  • To make a prediction for an unobserved x, just plug it in and calculate $\hat{y}$.
  • Note that the line need not pass through the observed data points. In fact, it often will not pass through any of them.
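The interpretation of the slope and intercept can be checked numerically. The sketch below assumes hypothetical coefficients a = 2.2 and b = 0.6 (chosen only for illustration) and a helper named `predict`, which is not part of the notes.

```python
# Hypothetical coefficients, chosen only for illustration.
a, b = 2.2, 0.6


def predict(x):
    """Predicted response y-hat for a given value of x."""
    return a + b * x


intercept_value = predict(0)           # the intercept: y-hat at x = 0, equal to a
unit_change = predict(4) - predict(3)  # the slope: change in y-hat per unit of x, equal to b
new_prediction = predict(3.5)          # x = 3.5 does not have to be an observed value

print(intercept_value, unit_change, new_prediction)
```

The unit change is the same whichever starting x is used, which is what makes the slope a single number.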

Facts about Least Squares Regression

  • The distinction between explanatory and response variables is essential. Looking at vertical deviations means that changing the axes would change the regression line.
  • A change of one standard deviation in x corresponds to a change of r standard deviations in the predicted value $\hat{y}$.
  • The least squares regression line always passes through the point $(\bar{x}, \bar{y})$.
  • r^2 (the square of the correlation) is the fraction of the variation in the values of y that is explained by the least squares regression on x. When reporting the results of a linear regression, you should report r^2.

These properties depend on the least-squares fitting criterion and are one reason why that criterion is used.
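These facts can be verified on a small data set. The sketch below uses made-up data and a hypothetical helper `fit`; it computes the slope as the sum of cross-products divided by the sum of squared x deviations, which is algebraically the same as $r\, s_y / s_x$.

```python
from statistics import mean


def fit(xs, ys):
    """Least-squares (intercept, slope) for predicting ys from xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx  # same value as r * s_y / s_x
    return y_bar - b * x_bar, b


# Illustrative data only (an assumption, not taken from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar = mean(xs), mean(ys)

a, b = fit(xs, ys)  # regression of y on x: y-hat = a + b*x
c, d = fit(ys, xs)  # regression of x on y: x-hat = c + d*y

# The line passes through the point of means.
passes_through_means = abs((a + b * x_bar) - y_bar) < 1e-12

# r^2 is the fraction of the variation in y explained by the regression.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
sst = sum((y - y_bar) ** 2 for y in ys)                    # total variation
r_squared = 1 - sse / sst

# Swapping the roles of x and y gives a different line: drawn on the same
# axes, the x-on-y line has slope 1/d, which differs from b unless r = +1 or -1.
slope_if_roles_swapped = 1 / d

print(passes_through_means, round(r_squared, 3), b, slope_if_roles_swapped)
```

The product of the two slopes, b times d, equals r squared, which is one way to see that the two lines coincide only when the correlation is perfect.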


Residuals

Residuals are the signed vertical deviations of the data points from the corresponding predicted values.

$r_i = \text{observed } y - \text{predicted } y = y_i - \hat{y}_i = y_i - (a + b x_i)$

For a least squares regression, the residuals always have mean zero.
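The mean-zero property follows directly from the intercept formula $a = \bar{y} - b\bar{x}$. A short derivation, using only the symbols already defined in these notes:

```latex
\begin{align*}
\sum_{i=1}^{n} r_i
  &= \sum_{i=1}^{n} \bigl( y_i - a - b x_i \bigr) \\
  &= n\bar{y} - n a - b\, n\bar{x} \\
  &= n \bigl( \bar{y} - (\bar{y} - b\bar{x}) - b\bar{x} \bigr) \\
  &= 0 .
\end{align*}
```

Since the residuals sum to zero, their mean is zero as well.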

Residual Plots

A residual plot is a scatterplot of the residuals against the explanatory variable. It can be used to assess the fit of the regression line.

Patterns to look for:

  • Curvature indicates that the relationship is not linear.
  • Increasing or decreasing spread indicates that the prediction will be less accurate in the range of explanatory variables where the spread is larger.
  • Points with large residuals are outliers in the vertical direction.
  • Points that are extreme in the x direction are potential high influence points.
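To illustrate the curvature pattern, the sketch below fits a straight line to made-up data that follow y = x², then lists the residuals in order of x. The data and the helper `fit` are assumptions for illustration only, and the residuals are printed rather than plotted.

```python
from statistics import mean


def fit(xs, ys):
    """Least-squares (intercept, slope) for predicting ys from xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


# Illustrative data with a curved (quadratic) relationship.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

a, b = fit(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, res in zip(xs, residuals):
    print(f"x = {x}: residual = {res:+.1f}")
```

The residuals are positive at both ends and negative in the middle. That U shape in a residual plot is the signature of a relationship that is not linear.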

Influential observations are individuals with extreme x values that exert a strong influence on the position of the regression line. Removing them would significantly change the regression line.
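The effect of a single influential observation can be seen by fitting the line with and without it. The sketch below uses made-up data and a hypothetical helper `fit`; the added point (15, 2) is chosen to be extreme in the x direction.

```python
from statistics import mean


def fit(xs, ys):
    """Least-squares (intercept, slope) for predicting ys from xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


# Illustrative data only (an assumption, not taken from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a_without, b_without = fit(xs, ys)

# Add one observation far to the right of the others.
xs_all = xs + [15]
ys_all = ys + [2]
a_with, b_with = fit(xs_all, ys_all)

# Residuals from the line fitted to all six points.
residuals = [y - (a_with + b_with * x) for x, y in zip(xs_all, ys_all)]

print(f"slope without the extreme point: {b_without:.3f}")
print(f"slope with the extreme point:    {b_with:.3f}")
print(f"residual of the extreme point:   {residuals[-1]:.3f}")
```

In this example the single extreme point reverses the sign of the slope, yet its own residual is not the largest one, because it pulls the line toward itself. This is why extreme x values deserve attention even when their residuals look unremarkable.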

A Regression Example

Consider the following data on unemployment rate and unemployment expenditure for several countries:

Country   Unemp. Rate   Unemp. Exp.
swz        0.5          0.
lux        1.4          0.
swd        1.6          0.
jap        2.1          0.
aut        3.3          0.
fin        3.4          0.
por        4.6          0.
ger        4.7          1.
nor        5.2          1.
us         5.4          0.
uk         6.8          0.
gr         7.0          0.
aus        7.0          1.
bel        7.6          1.
nl         7.8          2.
nz         7.9          1.
can        8.1          1.
fr         8.9          1.
den        9.7          3.
it        10.3          0.
ir        13.8          2.
sp        15.9          2.

Summary Statistics

$\bar{x} = 6.5$, $\bar{y} = 1.20$, $s_x = 3.87$, $s_y = 0.89$, $r = 0.73$
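With these summary statistics, the slope and intercept follow from the formulas $b = r\, s_y / s_x$ and $a = \bar{y} - b\bar{x}$. The sketch below uses only the rounded values reported above, so the results are approximate; the helper `predict` is a hypothetical name introduced for illustration.

```python
# Rounded summary statistics as reported in the notes
# (x = unemployment rate, y = unemployment expenditure).
x_bar, y_bar = 6.5, 1.20
s_x, s_y = 3.87, 0.89
r = 0.73

b = r * s_y / s_x      # slope
a = y_bar - b * x_bar  # intercept

print(f"slope b     = {b:.3f}")  # about 0.168
print(f"intercept a = {a:.3f}")  # about 0.109
print(f"r squared   = {r ** 2:.2f}")  # about 0.53


def predict(rate):
    """Predicted expenditure for a given unemployment rate (hypothetical helper)."""
    return a + b * rate
```

Under these numbers, each additional unit of unemployment rate is associated with roughly 0.17 more units of predicted expenditure, and about 53% of the variation in expenditure is explained by the regression on rate. Calling `predict(x_bar)` returns $\bar{y}$, as the fitted line must pass through the point of means.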