


An in-depth exploration of regression lines, focusing on predictions for vertical strips, slope and intercept calculations, and the concept of least squares. It includes examples and formulas for obtaining the slope and intercept of a regression line.



The Regression Line
In this class we will try to get a deeper understanding of the regression method. We will explore the geometry of the problem by defining a regression line. We will cover the following topics:
Predictions on a vertical strip
Slope and intercept of the regression line
Least squares
Predictions for data in a vertical strip
Example:
A law school finds the following relationship between LSAT scores and first-year scores:
average LSAT score = 162, SD = 6
average first-year score = 68, SD = 10, r = 0.6
About what percentage of the students had first-year scores over 75?
We use the normal curve approximation. Converting to standard units,
$$\frac{75 - 68}{10} = 0.7$$
this corresponds to a right-hand tail of about 24% under the normal curve.
Of the students who scored 165 on the LSAT, about what percentage had first-year scores over 75?
AMS-5: Statistics
We first convert to standard units for the x variable:
$$\frac{165 - 162}{6} = 0.5$$
then convert to standard units for the y variable:
$$r \times 0.5 = 0.6 \times 0.5 = 0.3$$
which corresponds to 0.3 × 10 = 3 points above average, or 68 + 3 = 71.
Since the data corresponding to a strip are a smaller and more homogeneous sample, the corresponding SD will be smaller. How much smaller? We expect the dispersion in the y variable to be about the same for each vertical strip. This is given by the RMS error, thus the new SD is
$$\sqrt{1 - r^2} \times \text{SD of } y = \sqrt{1 - 0.6^2} \times 10 = 8 \text{ points}$$
This new SD can be used to convert to standard units:
$$\frac{75 - 71}{8} = 0.5$$
and, using the normal curve, we obtain an area of 31% above 0.5. This is the percentage of students scoring more than 75 in the first year among those who scored 165 on the LSAT.
Notice that this percentage is higher than the 24% we obtained before. This is because we have focused on a smaller portion of the sample, one with a higher predicted average (71 instead of 68) and a smaller SD.
In summary, when considering data for a vertical strip:
Convert to standard units in the x variable.
Obtain the predicted value of the y variable.
Calculate the SD for the y variable in the strip using the RMS error.
Convert to standard units in the y variable and use the normal curve.
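The four steps can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function name `strip_tail_percentage` and its parameters are made up here, and r = 0.6 is the value inferred from the RMS figure of 8 points in the LSAT example.

```python
from math import sqrt
from statistics import NormalDist


def strip_tail_percentage(avg_x, sd_x, avg_y, sd_y, r, x_value, y_cutoff):
    """Percentage of y values above y_cutoff inside the vertical strip at x_value.

    Hypothetical helper: assumes a football-shaped scatter diagram so that
    the normal approximation applies within the strip.
    """
    z_x = (x_value - avg_x) / sd_x              # step 1: x in standard units
    predicted_y = avg_y + r * z_x * sd_y        # step 2: predicted value of y
    strip_sd = sqrt(1 - r ** 2) * sd_y          # step 3: RMS error used as the strip SD
    z_y = (y_cutoff - predicted_y) / strip_sd   # step 4: cutoff in standard units
    return 100 * (1 - NormalDist().cdf(z_y))    # right-hand tail of the normal curve


# LSAT example: students who scored 165, first-year scores over 75.
pct = strip_tail_percentage(162, 6, 68, 10, 0.6, x_value=165, y_cutoff=75)
print(round(pct))  # 31
```

The result agrees with the 31% read from the normal table for an area above 0.5 standard units.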
Slope and intercept
All lines can be determined by a slope and an intercept.
[Figure: two lines, one with positive slope and one with negative slope, each showing its intercept on the y axis and a rise over a run.]
slope = rise / run
The intercept is the height of the line when x = 0.
The slope is the rate at which y increases, per unit increase in x. If the slope is negative then y decreases as x increases.
Example: How do you get the slope of a regression line?
A sample of 555 California men age 25-29 in 1993 was surveyed to find out about education and income. The data are summarized by
average education = 12.5 years, SD = 4 years
average income = $21,500, SD = $16,000, r = 0.35
This means that, for every increase of one SD in education, there is an increase of r × SD in income. Thus, 4 extra years of education are worth an extra 0.35 × $16,000 = $5,600 of income. So, each extra year is worth $5,600 / 4 = $1,400. This is the slope of the regression line.
The intercept of the regression line is given by the value of y when x = 0. This is 12.5 years below average in education. Since each year costs $1,400, a man with no education should have an income which is below average by
12.5 years × $1,400 per year = $17,500
Since the average income is $21,500, the income of a man with no education is $21,500 − $17,500 = $4,000. This is the intercept of the regression line.
The formula for the slope of a regression line is
$$\text{slope} = r \times \frac{\text{SD of } y}{\text{SD of } x}$$
This corresponds to the change in y associated with one unit increase in x.
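As a quick check of the arithmetic, the slope and intercept can be computed directly from the five summary statistics. The sketch below is illustrative only: `regression_line` is a hypothetical helper, and the income SD of $16,000 is the value implied by the $1,400-per-year slope with r = 0.35 and an education SD of 4 years.

```python
def regression_line(avg_x, sd_x, avg_y, sd_y, r):
    """Return (slope, intercept) of the regression line of y on x.

    Hypothetical helper: slope is r * SD of y / SD of x, and the intercept
    is the predicted y at x = 0, i.e. avg_x units below the average of x.
    """
    slope = r * sd_y / sd_x
    intercept = avg_y - slope * avg_x
    return slope, intercept


# Education (years) and income (dollars) for the California sample.
slope, intercept = regression_line(12.5, 4, 21_500, 16_000, 0.35)
print(round(slope), round(intercept))  # 1400 4000
```

The rounding only removes floating-point noise; the values match the $1,400 per year and $4,000 obtained by hand.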
Least squares
Consider a cloud of points produced by obtaining the scatter diagram of observations corresponding to two variables x and y. There are many lines that we can draw through the cloud. Which is the straight line that fits the points best? The regression line is a possible solution to this problem.
Among all possible lines through a cloud, the regression line is the one that has the smallest RMS error in predicting y from x.
This is the reason why the regression line is called the least squares line.
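The least squares property can be checked numerically. The following sketch uses a small made-up data set (the `xs` and `ys` values are hypothetical, not taken from the notes), builds the regression line from the averages, SDs and r, and compares its RMS error with that of a few nearby lines.

```python
from math import sqrt
from statistics import fmean, pstdev

# Hypothetical data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]


def rms_error(slope, intercept):
    """RMS of the vertical distances between the points and the line."""
    return sqrt(fmean([(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]))


avg_x, avg_y = fmean(xs), fmean(ys)
sd_x, sd_y = pstdev(xs), pstdev(ys)  # SDs with n (not n - 1) in the denominator
# r is the average of the products of x and y in standard units.
r = fmean([(x - avg_x) / sd_x * (y - avg_y) / sd_y for x, y in zip(xs, ys)])

slope = r * sd_y / sd_x
intercept = avg_y - slope * avg_x
best = rms_error(slope, intercept)

# RMS errors of lines obtained by nudging the slope and/or the intercept.
others = [
    rms_error(slope + ds, intercept + di)
    for ds in (-0.2, -0.05, 0.0, 0.05, 0.2)
    for di in (-0.5, -0.1, 0.0, 0.1, 0.5)
    if (ds, di) != (0.0, 0.0)
]
print(best < min(others))  # True: every nudged line predicts worse
```

Trying a handful of alternative lines is not a proof, but it illustrates the claim. The same run also shows that the RMS error of the regression line equals √(1 − r²) × SD of y.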
Example
Let b be the length of a spring with no load. If a load x is attached to the spring, the stretch is proportional to x. Thus the length of the spring is
y = mx + b
where m and b are constants that depend on the spring.
An experiment is run to determine the constants for a given spring; the data are shown in the table.
weight (kg)   length (cm)
The correlation coefficient is r = 0.999, so the points are very close to a straight line. But they are not exactly on a straight line. This is probably due to measurement error.
The regression line for these data produces estimates of b and m given, respectively, by the intercept and the slope of the line. The values are m = …5 cm per kg and b = …01 cm.
These are the least squares estimates of m and b.
Problem
Find the regression equation for predicting final score from midterm score, based on the following information:
average midterm score = 70, SD = 10
average final score = 55, SD = 20, r = 0.6
The slope of the line can be obtained as
r × SD of final / SD of midterm = 0.6 × 20 / 10 = 1.2
A score of 0 in the midterm will correspond to a final score that is 1.2 × 70 = 84 units below average. So the intercept is 55 − 84 = −29 units of the final score. Thus, the regression equation is
final score = 1.2 × midterm score − 29
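The two steps used in this problem can be written as one general formula. The notation below is an assumption made for this sketch (bars for averages, subscripted SD for the standard deviations), and the numbers take r = 0.6, the value consistent with an intercept of −29.

```latex
\begin{align*}
\text{predicted } y &= \bar{y} + r\,\frac{\mathrm{SD}_y}{\mathrm{SD}_x}\,(x - \bar{x}) \\
\text{slope} &= r\,\frac{\mathrm{SD}_y}{\mathrm{SD}_x}, \qquad
\text{intercept} = \bar{y} - \text{slope} \times \bar{x} \\
\text{slope} &= 0.6 \times \frac{20}{10} = 1.2, \qquad
\text{intercept} = 55 - 1.2 \times 70 = -29
\end{align*}
```

Setting x equal to the average midterm score of 70 gives a predicted final score of 55, so the line passes through the point of averages.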