Regression: Understanding Correlation and Predicting Values, Exams of Statistics

University of California-Santa Cruz Statistics

An introduction to regression analysis, focusing on simple linear regression. It covers the definition of regression, the calculation of the regression line, and the use of the normal curve to estimate percentile ranks. The document also explains how to predict individual values using the regression method and discusses the regression effect and errors.

Typology: Exams

Pre 2010

Uploaded on 08/19/2009

koofers-user-6tj 🇺🇸


Regression

The idea behind the calculation of the coefficient of correlation is that the scatterplot of the data corresponds to a cloud that follows a straight line. This idea can be formalized by regression methods.

In this class we will:

• Consider the definition of simple linear regression
• Find a method to predict an individual value
• Use the normal curve to estimate percentile ranks
• Describe the regression effect
• Compute the regression errors and their RMS
• Study the behavior of regression errors

AMS-5: Statistics


Regression

The regression method describes how one variable depends on another.

[Figure: August Temperatures vs Elevation in Northern California. Axes: elevation in feet, temperature in degrees.]

The Northern California temperature data have an average altitude of 3,524 feet and an SD of 1,839 feet, and an average temperature of 70.3 degrees and an SD of 6.5 degrees. The correlation between temperature and altitude is -0.76.

The cloud of points shows a mild negative association between the two variables, as does the value of r. Can we use the values of altitude to estimate the average values of temperature?

The regression line for y on x estimates the average value of y corresponding to each value of x.

How does the regression line work?

[Figure: the regression line. A horizontal step of one SD of x corresponds to a vertical step of r × SD of y.]

Associated with an increase of one SD in x there is an increase of r SDs in y, on average.

Clearly, if the correlation coefficient is negative, then the average value of y decreases as x increases.

In the temperature and altitude example, an increase in altitude of 1,839 feet (one SD) is associated with a change of −0.76 × 6.5 = −4.94 degrees in the average temperature, that is, a drop of about 4.9 degrees.
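As a rough illustration of the rule above, the sketch below turns the quoted summary statistics into a regression line in the original units. The function name `regression_line` and the slope/intercept form are illustrative choices, not something taken from the slides.

```python
def regression_line(mean_x, sd_x, mean_y, sd_y, r):
    """Return (slope, intercept) of the regression line of y on x.

    The line passes through the point of averages and changes by
    r * sd_y for every increase of sd_x in x.
    """
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Summary statistics quoted for the Northern California data.
MEAN_ALT, SD_ALT = 3524.0, 1839.0   # feet
MEAN_TEMP, SD_TEMP = 70.3, 6.5      # degrees
R = -0.76

slope, intercept = regression_line(MEAN_ALT, SD_ALT, MEAN_TEMP, SD_TEMP, R)

# Change in average temperature for one SD of altitude.
change_per_sd = slope * SD_ALT
# Estimated average temperature at the average altitude and one SD above it.
temp_at_mean = intercept + slope * MEAN_ALT
temp_one_sd_up = intercept + slope * (MEAN_ALT + SD_ALT)

print(round(change_per_sd, 2))   # -4.94
print(round(temp_one_sd_up, 2))  # 65.36
```

Working in slope/intercept form is only a convenience; the same estimates follow from converting altitude to standard units and multiplying by r.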


How do we use the method to predict an individual value?

If we consider two variables x and y and we want to predict the value of y for a specific value of x, we use the average value of y that corresponds to the value of x according to the regression method.

Example: The first-year GPAs and the Math SAT scores for the students of a university produce the following data

average SAT score = 550, SD = 80
average 1st-year GPA = 2.6, SD = 0.6
r = 0.4

We want to predict the 1st-year GPA of a student with a SAT score of 650.

The student's SAT score in standard units is (650 − 550)/80 = 1.25, so the score is 1.25 SDs above average. An increase of one SD above the average SAT score produces an increase of 0.4 × 0.6 = 0.24 GPA points. This implies that our student will have an increase of 1.25 × 0.24 = 0.3 points of GPA above average. Since the average GPA is 2.6, the predicted GPA is 2.9. This is the average GPA that we expect for students with SAT scores around 650.
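The arithmetic of this example can be sketched in a few lines. The values used here are assumptions wherever the slide text is incomplete: an SAT SD of 80 (implied by 650 being 1.25 SDs above 550), a GPA SD of 0.6 and r = 0.4. The helper names are hypothetical.

```python
def to_standard_units(value, mean, sd):
    """Number of SDs that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


def predict_individual(x, mean_x, sd_x, mean_y, sd_y, r):
    """Regression prediction of y for one subject with the given x."""
    z_x = to_standard_units(x, mean_x, sd_x)
    # On average y moves only r SDs for each SD of x.
    return mean_y + r * z_x * sd_y


# Assumed summary statistics for the SAT/GPA example.
SAT_MEAN, SAT_SD = 550.0, 80.0
GPA_MEAN, GPA_SD = 2.6, 0.6
R = 0.4

z_sat = to_standard_units(650.0, SAT_MEAN, SAT_SD)
predicted_gpa = predict_individual(650.0, SAT_MEAN, SAT_SD, GPA_MEAN, GPA_SD, R)

print(z_sat)                     # 1.25
print(round(predicted_gpa, 2))   # 2.9
```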

WARNING: You can use the regression method on new subjects provided that they are similar to the ones that were used to produce the averages, SDs and r used in the regression method.

In the previous example the method will not be valid for students of a different institution.

We can use the regression method and the normal curve to produce estimates of the percentile ranks.

Example: In the previous example suppose a student has a percentile rank of 90% for the SAT scores. That is, only 10% of the scores are higher than his. What is the predicted percentile rank for the 1st-year GPA of this student?

Using the normal curve we have that a 90% probability corresponds to a z score of 1.3. This means that the student's SAT score is 1.3 SDs above average. This corresponds to being 0.4 × 1.3 ≈ 0.5 SDs above the average GPA, and this corresponds to an accumulated probability, under the normal curve, of approximately 69%.
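The same calculation can be sketched with the standard normal curve from Python's `statistics` module (3.8 or later). The function name `predicted_percentile` is hypothetical, and r = 0.4 is the assumed correlation for the SAT/GPA example; both variables are taken to follow the normal curve.

```python
from statistics import NormalDist


def predicted_percentile(percentile_x, r):
    """Predicted percentile rank of y given the percentile rank of x.

    Assumes both variables follow the normal curve.
    """
    std_normal = NormalDist()                # mean 0, SD 1
    z_x = std_normal.inv_cdf(percentile_x)   # x in standard units
    z_y = r * z_x                            # regression estimate for y
    return std_normal.cdf(z_y)               # area to the left of z_y


gpa_rank = predicted_percentile(0.90, 0.4)
print(f"{gpa_rank:.0%}")
```

Without rounding the z score to 1.3 and the result to 0.5 SDs, the answer comes out slightly above the 69% quoted in the example, which is expected from the rounding alone.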

[Figure: normal curve of the scores. Horizontal axis: scores (40 to 160); vertical axis: density.]

According to the normal curve, the first possibility is more likely, since the mean is 100 and so the interval above 140 has less probability than the one below 140. Under this scenario, the second test is more likely to produce a value below 140.

A symmetric situation is valid for those scoring, say, 80 in IQ. It is likely that the true score is above 80 with a negative chance error, and so the second score is likely to be above the first.

[Figure: normal curve of the scores. Horizontal axis: scores (40 to 160); vertical axis: density.]

In other words, if a student scores above average in the first test, it is likely that the true score is lower than the observed one. If the student takes the test again, chances are that the second score will be lower than the first. A symmetric situation is true for a person scoring below average in the first test. This explains the regression effect.
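A small simulation can make the regression effect concrete. It is only a sketch under stated assumptions: each observed score is a true score plus an independent chance error, and the SDs of 15 and 5 points, the sample size and the seed are hypothetical choices, not values from the slides.

```python
import random
from statistics import mean


def simulate_test_retest(n=50000, true_mean=100.0, true_sd=15.0,
                         error_sd=5.0, seed=0):
    """Return (first, second) observed scores for n simulated people.

    Model (an assumption): observed score = true score + chance error,
    with a fresh chance error on each test.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        true_score = rng.gauss(true_mean, true_sd)
        first = true_score + rng.gauss(0.0, error_sd)
        second = true_score + rng.gauss(0.0, error_sd)
        pairs.append((first, second))
    return pairs


pairs = simulate_test_retest()

# People who scored 140 or more, or 80 or less, on the first test.
high = [(f, s) for f, s in pairs if f >= 140]
low = [(f, s) for f, s in pairs if f <= 80]

high_first, high_second = mean(f for f, _ in high), mean(s for _, s in high)
low_first, low_second = mean(f for f, _ in low), mean(s for _, s in low)

print(round(high_first, 1), round(high_second, 1))  # second average is lower
print(round(low_first, 1), round(low_second, 1))    # second average is higher
```

Nothing about the second test differs from the first in this model; the shift toward the mean comes entirely from selecting people on a score that includes chance error.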


Regression errors

The regression method can be used to predict y from x. But actual values differ from predictions. These differences are the regression errors

error = actual value of y − predicted value of y

Some of the errors defined in this way are positive and some are negative, reflecting the fact that some observations are above and some are below the regression line.

How do we measure the error in a regression? The overall size of the error is measured using the root-mean-square (RMS), as we did to obtain the SD. This is equal to

$$\text{RMS error} = \sqrt{\frac{(\text{error}_1)^2 + (\text{error}_2)^2 + \cdots + (\text{error}_N)^2}{N}}$$

where N is the number of points in the scatter diagram.
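The definition above can be followed step by step in code. This is a minimal sketch on a small made-up data set (the five points are hypothetical); the helper names are illustrative, and SDs are computed by dividing by N, as in the RMS formula.

```python
import math


def rms(values):
    """Root-mean-square: square, average, then take the square root."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def regression_errors(xs, ys):
    """Errors (actual y minus predicted y) for the regression of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = rms([x - mean_x for x in xs])
    sd_y = rms([y - mean_y for y in ys])
    # Correlation: average product of the values in standard units.
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)
    slope = r * sd_y / sd_x
    predicted = [mean_y + slope * (x - mean_x) for x in xs]
    return [y - p for y, p in zip(ys, predicted)]


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

errors = regression_errors(xs, ys)
rms_error = rms(errors)

print([round(e, 1) for e in errors])  # some positive, some negative
print(round(rms_error, 3))            # 0.693
```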

Since the RMS error is an average, it reflects how far a typical point is from the regression line. So the RMS error is like an SD. Actually it is to the regression line what the SD is to the average. As a rule of thumb we have that

• About 68% of the points on a scatter diagram fall inside the strip whose edges are parallel to the regression line and one RMS error away.
• About 95% of the points on a scatter diagram fall inside the strip whose edges are parallel to the regression line and two RMS errors away.

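[Figure: scatter diagrams (axes x and y) with the regression line. The strip extending one RMS error on either side of the line holds 68% of the points; the strip extending two RMS errors on either side holds 95%.]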

What if we ignore the values of x? Then our prediction for y is the average of y. In this case the RMS error coincides with the SD of y.

Computing the RMS error

We saw that the error that corresponds to a prediction where the values of x are ignored corresponds to the SD of y. The overall size of the error for a regression using x cannot be larger than the SD of y. How much smaller?

$$\text{RMS error} = \sqrt{1 - r^2} \times \text{SD of } y$$

We observe the following features

• The units of the RMS error are the same as the units of the variable being predicted.
• Perfect correlation corresponds to zero RMS error.
• Zero correlation corresponds to maximum RMS error (equal to the SD of y).
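One way to convince yourself of the shortcut is to compare it with the direct computation on data. The sketch below does that on a small hypothetical data set; the variable names are illustrative, and SDs are population SDs (dividing by N), which is what makes the two numbers agree exactly.

```python
import math
from statistics import mean, pstdev

# Hypothetical data, used only to compare the two computations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

mean_x, mean_y = mean(xs), mean(ys)
sd_x, sd_y = pstdev(xs), pstdev(ys)
r = mean([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]) / (sd_x * sd_y)

# Direct route: errors around the regression line, then root-mean-square.
slope = r * sd_y / sd_x
errors = [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]
rms_direct = math.sqrt(mean([e * e for e in errors]))

# Shortcut: sqrt(1 - r^2) times the SD of y.
rms_shortcut = math.sqrt(1.0 - r * r) * sd_y

# Ignoring x: always predict the average of y.
rms_ignoring_x = math.sqrt(mean([(y - mean_y) ** 2 for y in ys]))

print(round(rms_direct, 4), round(rms_shortcut, 4))
print(round(rms_ignoring_x, 4), round(sd_y, 4))
```

The first line should print the same number twice, and the second line shows that predicting with the average alone gives an RMS error equal to the SD of y.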


California temperature

In the California temperature example we had that the SD of y is 6.5 degrees and the correlation is -0.76, then

$$\text{RMS error} = \sqrt{1 - (-0.76)^2} \times 6.5 \text{ degrees} = 4.22 \text{ degrees}$$

So, in this case, knowing the altitude reduces the SD from 6.5 to 4.22 degrees.

Shoe sizes

In the shoe sizes example we had that the SD of y is 2.45 inches and the correlation is 0.93, then

$$\text{RMS error} = \sqrt{1 - 0.93^2} \times 2.45 \text{ inches} = 0.90 \text{ inches}$$

So we observe that knowing the shoe size produces a dramatic reduction of the SD from 2.45 to 0.90 inches.
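Both results can be reproduced directly from the shortcut formula. The function name `rms_error_from_r` is a hypothetical helper; the inputs are the SDs and correlations quoted in the two examples.

```python
import math


def rms_error_from_r(r, sd_y):
    """RMS error of the regression of y on x: sqrt(1 - r^2) * SD of y."""
    return math.sqrt(1.0 - r * r) * sd_y


temperature_rms = rms_error_from_r(-0.76, 6.5)   # degrees
shoe_rms = rms_error_from_r(0.93, 2.45)          # inches

print(round(temperature_rms, 2))  # 4.22
print(round(shoe_rms, 2))         # 0.9

# The two limiting cases listed among the features of the RMS error.
print(rms_error_from_r(1.0, 6.5))  # 0.0
print(rms_error_from_r(0.0, 6.5))  # 6.5
```

Only the size of r matters here: a correlation of -0.76 shrinks the error exactly as much as +0.76 would.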
