Understanding Regression Lines: Predictions, Slope, Intercept, and Least Squares, Exams of Statistics

An in-depth exploration of regression lines, focusing on predictions for vertical strips, slope and intercept calculations, and the concept of least squares. It includes examples and formulas for obtaining the slope and intercept of a regression line.

Typology: Exams

Pre 2010

Uploaded on 08/19/2009

koofers-user-rlv
The Regression Line
In this class we will try to get a deeper understanding of the
regression method. We will explore the geometry of the problem by
defining a regression line. We will cover the following topics:
Predictions on a vertical strip
Slope and intercept of the regression line
Least squares
AMS-5: Statistics
Predictions for data in a vertical strip
Example: A law school finds the following relationship between
LSAT scores and first-year scores
average LSAT score = 162, SD = 6
average first-year score = 68, SD = 10, r=0.60
Q: About what percentage of the students had first-year scores over
75?
A: We use the normal curve approximation. Converting to
standard units,
(75 − 68) / 10 = 0.7
and this corresponds to a right-hand tail of about 24% under the normal curve.
Q: Of the students who scored 165 on the LSAT, about what
percentage had first-year scores over 75?
A: We first convert to standard units for the x variable:
(165 − 162) / 6 = 0.5
then convert to standard units for the y variable
r × 0.5 = 0.6 × 0.5 = 0.3
which corresponds to 0.3 × 10 = 3 points above average, or 68 + 3 = 71.
Since the data corresponding to a strip are a smaller and more
homogeneous sample, the corresponding SD will be smaller.
How much smaller?
We expect the dispersion in the y variable to be about the same for
each vertical strip. This is given by the RMS error, thus the new
SD is
√(1 − r²) × SD of y = √(1 − 0.6²) × 10 = 8 points
This new SD can be used to convert to standard units
(75 − 71) / 8 = 0.5
and, using the normal curve, we obtain an area of 31% above 0.5.
This is the percentage of students scoring more than 75 in the first
year among those who scored 165 on the LSAT.
Notice that this percentage is higher than the one we obtained
before for all the students. This is because the students in this
strip have a higher average first-year score (71 rather than 68),
which more than offsets the smaller SD of the strip.

In summary, when considering data for a vertical strip:
Convert to standard units in the x variable.
Obtain the predicted value of the y variable.
Calculate the SD for the y variable in the strip using the RMS error.
Convert to standard units in the y variable and use the normal curve.
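The four steps for a vertical strip can be sketched in Python. This is a minimal illustration rather than part of the original notes: the function names `normal_right_tail` and `strip_percentage` are made up for this sketch, and the normal curve area is computed with `math.erf` instead of a printed table. The numbers are those of the law school example.

```python
from math import erf, sqrt

def normal_right_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

def strip_percentage(x, cutoff, avg_x, sd_x, avg_y, sd_y, r):
    """Approximate percentage of y values above `cutoff` in the strip at x."""
    z_x = (x - avg_x) / sd_x                 # step 1: x in standard units
    predicted_y = avg_y + r * z_x * sd_y     # step 2: predicted value of y
    strip_sd = sqrt(1.0 - r ** 2) * sd_y     # step 3: SD in the strip (RMS error)
    z_y = (cutoff - predicted_y) / strip_sd  # step 4: cutoff in standard units
    return 100.0 * normal_right_tail(z_y)

# All students: first-year scores over 75 (average 68, SD 10).
all_students = 100.0 * normal_right_tail((75 - 68) / 10)

# Only students who scored 165 on the LSAT (average 162, SD 6, r = 0.60).
lsat_165 = strip_percentage(165, 75, 162, 6, 68, 10, 0.60)

print(round(all_students), round(lsat_165))  # prints 24 31
```

The exact areas differ slightly from table lookups, but they round to the same whole percentages.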

Slope and intercept
All lines can be determined by a slope and an intercept.
[Figure: two lines in the x-y plane, one with positive slope and one with negative slope, each marked with its intercept, rise, and run.]
slope = rise / run
The intercept is the height of the line when x = 0.
The slope is the rate at which y increases, per unit increase in x. If
the slope is negative then y decreases as x increases.

Example: How do you get the slope of a regression line?
A sample of 555 California men age 25-29 in 1993 was surveyed to
find out about education and income. The data are summarized by
average education = 12.5 years, SD = 4 years
average income = $21,500, SD = $16,000, r = 0.35
This means that, for every increase of one SD in education, there is
an increase of r SD in income. Thus, 4 extra years of education are
worth an extra 0.35 × $16,000 = $5,600 of income. So, each extra year
is worth $5,600 / 4 = $1,400; this is the slope of the regression line.

The intercept of the regression line is given by the value of y when
x = 0. This is 12.5 years below average in education. Since each
year is worth $1,400, a man with no education should have an income
which is below average by
12.5 years × $1,400 per year = $17,500
Since the average income is $21,500, the income of a man with no
education is $21,500 − $17,500 = $4,000. This is the intercept of the
regression line.
The formula for the slope of a regression line is
slope = r × (SD of y) / (SD of x)
This corresponds to the change in y associated with one unit
increase in x.
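As a check on the arithmetic, the slope and intercept can be computed directly from the five summary statistics. The sketch below is illustrative only: `regression_line` is a hypothetical helper name, and the inputs assume r = 0.35 and an income SD of $16,000, which are the values consistent with the $1,400 per year slope and the $4,000 intercept in the education example.

```python
def regression_line(avg_x, sd_x, avg_y, sd_y, r):
    """Return (slope, intercept) of the regression line for predicting y from x."""
    slope = r * sd_y / sd_x                # change in y per unit increase in x
    intercept = avg_y - slope * avg_x      # height of the line at x = 0
    return slope, intercept

# Education (years) is x, income (dollars) is y.
slope, intercept = regression_line(12.5, 4, 21500, 16000, 0.35)
print(round(slope), round(intercept))  # prints 1400 4000
```

The intercept line works because the regression line passes through the point of averages, so moving from the average of x back to x = 0 lowers the prediction by slope × average of x.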

Least squares
Consider a cloud of points produced by obtaining the scatter diagram
of observations corresponding to two variables x and y. There are
many lines that we can draw through the cloud.
Which is the straight line that fits the points best?
The regression line is a possible solution to this problem. Among all
possible lines through a cloud, the regression line is the one that
has the smallest RMS error in predicting y from x.
This is the reason why the regression line is called the least squares
line.

Example: Let b be the length of a spring with no load. If a load x is
attached to the spring the stretch is proportional to x. Thus the
length of the spring is
y = mx + b
where m and b are constants that depend on the spring.
An experiment is run to determine the constants for a given spring;
the data are shown in the table.
weight (kg)    length (cm)
The correlation coefficient is r = 0.999, so the points are very close
to a straight line. But they are not exactly on a straight line. This is
probably due to measurement error.
The regression line for these data produces estimates of b and m
given, respectively, by the intercept and the slope of the line. The
values are m = …5 cm per kg, and b = …01 cm.
These are the least squares estimates of m and b.
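The least squares property can be checked numerically. The sketch below uses a small made-up data set (it is not the spring table, whose values are not reproduced here) and hypothetical helper names. It computes the regression line from r and the two SDs, then compares its RMS error with the RMS error of several nearby lines.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """SD computed as the root mean square of the deviations from the average."""
    m = mean(values)
    return sqrt(mean([(v - m) ** 2 for v in values]))

def correlation(xs, ys):
    """Average product of deviations, divided by the product of the SDs."""
    mx, my = mean(xs), mean(ys)
    products = [(x - mx) * (y - my) for x, y in zip(xs, ys)]
    return mean(products) / (sd(xs) * sd(ys))

def rms_error(xs, ys, slope, intercept):
    """RMS of the prediction errors of the line y = slope * x + intercept."""
    errors = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return sqrt(mean([e ** 2 for e in errors]))

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 2.9, 4.2, 4.8, 6.1, 6.9]

r = correlation(xs, ys)
slope = r * sd(ys) / sd(xs)
intercept = mean(ys) - slope * mean(xs)
best = rms_error(xs, ys, slope, intercept)

# RMS errors of lines with a slightly different slope and/or intercept.
others = [rms_error(xs, ys, slope + ds, intercept + di)
          for ds in (-0.2, 0.0, 0.2)
          for di in (-0.5, 0.0, 0.5)
          if (ds, di) != (0.0, 0.0)]

print(best < min(others))  # prints True
```

For the regression line, the RMS error also agrees with √(1 − r²) × SD of y, the quantity used earlier as the SD within a vertical strip.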

Problem: Find the regression equation for predicting final score from
midterm score, based on the following information:
average midterm score = 70, SD = 10
average final score = 55, SD = 20, r = 0.60
The slope of the line can be obtained as
r × (SD of final) / (SD of midterm) = 0.6 × 20 / 10 = 1.2
A score of 0 in the midterm will correspond to a final score that is
70 × 1.2 = 84
units below average. So the intercept is 55 − 84 = −29 units of the
final score. Thus, the regression equation is
final score = 1.2 × midterm score − 29
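The regression equation and the standard units method are two ways of writing the same prediction. The short sketch below compares them for this problem, taking r = 0.60 (the value consistent with a slope of 1.2 and an intercept of −29). The function names are hypothetical and the midterm scores tried are arbitrary.

```python
AVG_MID, SD_MID = 70, 10
AVG_FINAL, SD_FINAL = 55, 20
R = 0.60  # assumed value, consistent with the slope and intercept in the solution

slope = R * SD_FINAL / SD_MID
intercept = AVG_FINAL - slope * AVG_MID

def predict_with_equation(midterm):
    """Prediction from the regression equation."""
    return slope * midterm + intercept

def predict_with_standard_units(midterm):
    """Prediction by converting to standard units and multiplying by r."""
    z = (midterm - AVG_MID) / SD_MID
    return AVG_FINAL + R * z * SD_FINAL

# Both methods give about 67 for a midterm score of 80.
for midterm in (50, 70, 80):
    print(midterm,
          round(predict_with_equation(midterm), 2),
          round(predict_with_standard_units(midterm), 2))
```

A midterm score equal to the average (70) is predicted to have the average final score (55), since the line passes through the point of averages.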