Regression Analysis: Lecture File 4 by N. Christopher Phillips - Prof. N. Phillips, Study notes of Probability and Statistics

University of Oregon (UO), Probability and Statistics

Prof. N. Phillips

A lecture file from Math 243, in which N. Christopher Phillips explains regression analysis through examples and calculations. Topics include regression lines, correlation, and the importance of understanding outliers and lurking variables.

Typology: Study notes

Pre 2010

Uploaded on 07/29/2009


Math 243: Lecture File 4

N. Christopher Phillips

9 April 2009

N. Christopher Phillips () Math 243: Lecture File 4 9 April 2009 1 / 61

Regression line: Example 2

The regression line depends on which variable is the explanatory variable. See Example 5.3 in the book for an example with real data. Here is a more dramatic example with fictitious data.

Data: (2, 4), (5, 10), (8, 4). (Again, just three points.)

Example 2 (continued)

Data: (2, 4), (5, 10), (8, 4).

The correlation is r = 0. So the regression line has slope zero, and turns out to be ŷ = 6.
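The values r = 0 and ŷ = 6 can be checked by hand or with a few lines of code. The following is a minimal Python sketch, not part of the lecture; the helper name `least_squares` is an assumption chosen for illustration. It computes the correlation and the least squares line from the sums of squared deviations.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return (intercept, slope, r) for the least squares line predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r

# The three fictitious data points from the slide.
xs = [2, 5, 8]
ys = [4, 10, 4]
intercept, slope, r = least_squares(xs, ys)
print(f"r = {r}, line: y-hat = {intercept} + {slope} x")
```

The products of the deviations are (−3)(−2) + (0)(4) + (3)(−2) = 0, so both r and the slope are zero, and the line reduces to the mean of the y values, which is 6.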

Example 2 (continued)

Exchange the explanatory and response variables.

The data was: (2, 4), (5, 10), (8, 4).

It is now: (4, 2), (10, 5), (4, 8).


Example 2 (continued)

Data with explanatory and response variables switched: (4, 2), (10, 5), (4, 8). The correlation is still r = 0. So the regression line again has slope zero, and turns out to be ŷ = 5.

Example 2 (continued)

Data: (2, 4), (5, 10), (8, 4).

Both regression lines on the same plot, with the original choice of explanatory variable:

[Plot: the data with both regression lines]
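The two lines can also be compared numerically. This is a minimal Python sketch, not taken from the lecture; `fit_line` is a hypothetical helper name. It fits the least squares line once with y as the response and once with x as the response. On the original axes the second fit, x̂ = 5, is the vertical line x = 5, while the first is the horizontal line y = 6; both pass through the point of means (5, 6).

```python
def fit_line(explanatory, response):
    """Return (intercept, slope) of the least squares line for response on explanatory."""
    n = len(explanatory)
    mean_e = sum(explanatory) / n
    mean_r = sum(response) / n
    see = sum((e - mean_e) ** 2 for e in explanatory)
    ser = sum((e - mean_e) * (v - mean_r) for e, v in zip(explanatory, response))
    slope = ser / see
    return mean_r - slope * mean_e, slope

x = [2, 5, 8]
y = [4, 10, 4]

a_yx, b_yx = fit_line(x, y)  # predict y from x
a_xy, b_xy = fit_line(y, x)  # predict x from y (variables exchanged)

print(f"y on x: y-hat = {a_yx} + {b_yx} x")
print(f"x on y: x-hat = {a_xy} + {b_xy} y")
```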

Some good and bad points about the least squares regression line

It is easy to calculate, compared to other ways of trying to fit a line to data.

It is suitable for data whose deviations from lying on a straight line are likely to be approximately normally distributed.

It is not suitable for nonlinear relationships.

It has the same problems that the mean and standard deviation do. For example, it is not resistant.
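To see what "not resistant" can mean in practice, here is a small Python sketch. The data are hypothetical and chosen only for illustration (they are not from the lecture), and `slope_and_intercept` is an assumed helper name. Five points lie exactly on the line y = x; adding one made-up outlier changes the fitted slope substantially.

```python
def slope_and_intercept(points):
    """Least squares slope and intercept for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data lying exactly on the line y = x.
clean = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
# The same data with one made-up outlier far from the rest.
with_outlier = clean + [(10, 0)]

slope_clean, _ = slope_and_intercept(clean)
slope_outlier, _ = slope_and_intercept(with_outlier)
print(f"slope without outlier: {slope_clean:.3f}")
print(f"slope with outlier:    {slope_outlier:.3f}")
```

In this made-up case the single added point turns a slope of 1 into a negative slope.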

Regression lines for previously displayed scatterplots

Here are the scatterplots from the lecture of 7 April, with regression lines added. (Example 9, with its poor choice of scale on the vertical axis, has been omitted.)

Observe that sometimes the line has little to do with the pattern in the scatterplot, and other times it is closely related.

Example 1.

[Scatterplot with regression line]

Clear positive association, roughly linear.

Regression line: ŷ ≈ 1.63308 + 0.789497x.

Correlation r ≈ 0.915883.

r² ≈ 0.838842: The change in x explains about 84% of the change in y.

Example 2.

[Scatterplot with regression line]

Clear positive association, roughly linear, one outlier. (The other points are as in Example 1.)

Example 2 (continued).

Correlation r ≈ 0.671778. (It was r ≈ 0.915883 without the outlier.)

r² ≈ 0.451285: The change in x explains about 45% of the change in y. (Without the outlier, r² ≈ 0.838842.)

Regression line: ŷ ≈ 2.57551 + 0.595541x. (It was ŷ ≈ 1.63308 + 0.789497x without the outlier.)

Comparing the graphs, one sees that the regression line did not move a great deal.

The outlier is influential for r and r², but it is not influential for the regression line.

However, regression lines are not resistant: outliers in other locations can be influential.
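As a quick arithmetic check on the figures quoted for this example, r² is simply the square of the correlation. The short Python sketch below uses only the values of r reported on the slides; the variable names are illustrative.

```python
# Correlations reported in the lecture for Example 1 (no outlier) and Example 2 (one outlier).
r_without_outlier = 0.915883
r_with_outlier = 0.671778

for label, r in [("without outlier", r_without_outlier),
                 ("with outlier", r_with_outlier)]:
    r_squared = r ** 2
    # r^2 is read as the fraction of the change in y explained by the change in x.
    print(f"{label}: r = {r:.6f}, r^2 = {r_squared:.4f} ({r_squared:.0%})")
```

The squares agree with the slides' values of about 84% and 45% up to rounding in the last digit.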

Example 3.

[Scatterplot with regression line]

Very strong positive association, linear.

Regression line: ŷ ≈ 0.425816 + 0.937028x.

Correlation r ≈ 0.991102.

r² ≈ 0.982284: The change in x explains about 98% of the change in y.

Example 4.

[Scatterplot with regression line]

Weak positive association.

Regression line: ŷ ≈ 5.05521 + 0.463938x.

Correlation r ≈ 0.483459.

r² ≈ 0.233732: The change in x explains about 23% of the change in y.

Example 5.

[Scatterplot with regression line]

Strong negative association, roughly linear.

Regression line: ŷ ≈ 16.3048 − 1.06155x.

Correlation r ≈ −0.950719.

r² ≈ 0.903866: The change in x explains about 90% of the change in y.

Example 6.

[Scatterplot with regression line]

Very strong negative association, linear.

Regression line: ŷ ≈ 13.5742 − 0.937028x.

Correlation r ≈ −0.991102.

r² ≈ 0.982284: The change in x explains about 98% of the change in y.

Example 7.

[Scatterplot with regression line]

Weak negative association.

Regression line: ŷ ≈ 11.553 − 0.463938x.

Correlation r ≈ −0.483459.

r² ≈ 0.233732: The change in x explains about 23% of the change in y.

Example 12.

[Scatterplot with regression line]

Strong nonlinear negative association.

Regression line: ŷ ≈ 23.2928 − 1.25295x.

Example 12 (continued).

The regression line does not match the pattern very well.

Correlation r ≈ −0.913696.

r² ≈ 0.834841: The change in x explains about 83% of the linear change in y.

r² understates the strength of the pattern, because the association is nonlinear.
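The point that r² can understate a nonlinear pattern can be illustrated with a small Python sketch. The data here are hypothetical (they are not the data behind Example 12), and `correlation` is an assumed helper name. In the sketch y is determined exactly by x, but along a curve, so the pattern is as strong as possible while r² stays clearly below 1.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y = 100 / x exactly, a decreasing curve with no scatter.
xs = list(range(1, 11))
ys = [100 / x for x in xs]

r = correlation(xs, ys)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```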

Example 13.

[Scatterplot with regression line]

Very strong nonlinear negative association.

Regression line: ŷ ≈ 23.0764 − 1.21755x.

As in Example 12, the regression line does not match the pattern very well. The pattern is stronger, but r and r² are similar to those in that example.

Outliers and influential points

Which kinds of points are likely to be influential for the correlation? Which kinds of points are likely to be influential for the regression line?

Original data

r² ≈ 0.623402.

Original data, with more white space (to allow room to add outliers):

r² ≈ 0.623402.

Example 1.

Example 1 (continued).

r² ≈ 0.61667: little change from the original value of 0.623402.

Example 4.


Example 4 (continued).

r² ≈ 0.442955: somewhat less than the original value of 0.623402.

Example 5.

Example 5 (continued).

r² ≈ 0.781681: considerably bigger than the original value of 0.623402. The red point is influential for r².

This is a lot to have rest on just one data point. See the discussion on pages 129–131 for a similar situation with real data. Warning: You should be suspicious of any outcome involving an influential point.
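How much can rest on one point? The Python sketch below gives a deliberately extreme illustration. The data are hypothetical (they are not the data plotted in these examples), and `r_squared` is an assumed helper name. A small cluster with no linear association gets a large r² once a single far-away point is added.

```python
def r_squared(points):
    """Square of the Pearson correlation for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy ** 2 / (sxx * syy)

# Hypothetical cluster with no linear association at all.
cluster = [(1, 2), (2, 1), (1, 1), (2, 2)]
# One made-up point far from the cluster in both coordinates.
with_far_point = cluster + [(10, 10)]

print(f"r^2 for the cluster alone:     {r_squared(cluster):.3f}")
print(f"r^2 with the far point added:  {r_squared(with_far_point):.3f}")
```

Nearly all of the apparent linear association in the second data set comes from the one added point, which is the situation the warning above is about.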

Example 6.


Example 6 (continued).

r² ≈ 0.391031: considerably smaller than the original value of 0.623402.

Example 7.

Residual plot for the previous example


Example whose residuals show no pattern.


Residual plot for the previous example
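For readers without the plots, a residual is the observed y minus the predicted ŷ, and a residual plot graphs the residuals against x. The following Python sketch is an illustration only, not part of the lecture; it reuses the three fictitious points from the start of this file, whose regression line was found to be ŷ = 6. The function name `predict` is an assumption.

```python
# The three fictitious points from the start of this lecture file.
xs = [2, 5, 8]
ys = [4, 10, 4]

def predict(x):
    """Least squares line for this data, as found earlier in the lecture: y-hat = 6."""
    return 6.0 + 0.0 * x

# Residual = observed y minus predicted y-hat.
residuals = [y - predict(x) for x, y in zip(xs, ys)]
print(residuals)
print(sum(residuals))
```

The residuals of a least squares line always sum to zero. Here they go down, up, down, which is a pattern: a sign that a straight line does not describe these three points well.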

Facts about the regression line (continued).

A change of one standard deviation in x gives a change of r standard deviations in ŷ. Thus, the closer the correlation is to zero, the less ŷ responds to changes in x.

r² measures the success of the regression. (See earlier lectures.)

The regression line goes through (x̄, ȳ) and has slope r s_y / s_x. (This was stated before.)

The regression line is not resistant. See examples above.

Be very careful with extrapolation! (Examples below.)

Caution: Lurking variables can spoil conclusions based on regression. (Examples below.)

Caution: Association does not imply causation. (Examples below.)
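The first few facts about the regression line can be checked numerically. The Python sketch below uses hypothetical data chosen only for illustration; all names in it are assumptions. It builds the line from r, the standard deviations, and the means, then confirms that it passes through the point of means and that a one standard deviation change in x moves ŷ by r standard deviations.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)

# Correlation from the sums of deviation products.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
r = sxy / sqrt(sxx * syy)

slope = r * s_y / s_x              # slope of the regression line
intercept = y_bar - slope * x_bar  # makes the line pass through (x-bar, y-bar)

def predict(x):
    return intercept + slope * x

# Moving x up by one standard deviation moves y-hat up by r standard deviations.
change_in_y_hat = predict(x_bar + s_x) - predict(x_bar)
print(f"r = {r:.4f}, slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"change in y-hat, in units of s_y: {change_in_y_hat / s_y:.4f}")
```

The slope computed this way agrees with the direct least squares formula, the sum of deviation products divided by the sum of squared x deviations.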

Regression on the TI-83 calculator: See earlier notes.

Caution on notation: Our book uses

ŷ = a + bx,

as does its picture of the screen of the TI-83 calculator. My TI- calculator (several years old) has

ŷ = ax + b.

Make sure you pay attention to which coefficient is which.
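The two conventions describe the same line with the roles of a and b exchanged. A small Python sketch makes this concrete; the function names are hypothetical, and the coefficients are those of Example 1 earlier in this lecture.

```python
def book_line(a, b, x):
    """Book convention: y-hat = a + b*x, so a is the intercept and b is the slope."""
    return a + b * x

def calculator_line(a, b, x):
    """Other convention: y-hat = a*x + b, so a is the slope and b is the intercept."""
    return a * x + b

# Regression line from Example 1: y-hat = 1.63308 + 0.789497 x.
intercept, slope = 1.63308, 0.789497

x = 10
# Same line and same prediction, but a and b trade places between conventions.
print(book_line(a=intercept, b=slope, x=x))
print(calculator_line(a=slope, b=intercept, x=x))
```

Copying a and b from one convention into the other without swapping them would give a different, wrong line.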


Hazard: extrapolation.

We find a regression line to predict the height of a child between 0 and 10 years old from the child’s age. Will the resulting prediction for the height of a 25-year-old be reasonable? What about a 50-year-old?

No. People stop growing in their teens.

A regression line for the best time in the 50-yard dash each year, with x being the year, predicts that eventually the winning time will be negative.
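Both hazards are easy to reproduce with a straight line. In the Python sketch below, every coefficient is made up for illustration and is not fitted to real data; the function names are also assumptions. It only shows how a line that may be reasonable inside the range of the data gives absurd values far outside it.

```python
# All coefficients below are made up for illustration; they are not fitted to real data.

def predicted_height_cm(age_years):
    """Hypothetical line fitted to children aged 0 to 10."""
    return 50.0 + 6.0 * age_years

def predicted_dash_time_s(year):
    """Hypothetical downward-trending line for the best 50-yard dash time."""
    return 6.0 - 0.01 * (year - 1950)

for age in (5, 10, 25, 50):
    note = "within the data" if 0 <= age <= 10 else "extrapolation"
    print(f"age {age}: {predicted_height_cm(age):.0f} cm ({note})")

for year in (2000, 2600):
    print(f"year {year}: {predicted_dash_time_s(year):.2f} s")
```

With these made-up numbers the line predicts a height of 350 cm at age 50 and a negative winning time by the year 2600.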

Hazard: Lurking variables.

A lurking variable is one which has an important effect on the relationships among the variables studied but which itself is not studied.

Example: Smoking causes lung cancer.

Observational studies show, for example, that people who smoke are more likely to get lung cancer, and even that, say, 60–65-year-old men who smoke are more likely to get lung cancer than 60–65-year-old men who don’t smoke.

Experiments show that rats which smoke are more likely to get lung cancer than rats which don’t smoke. Here, the rats are divided into two groups, and rats in one group are subjected to cigarette smoke while the rats in the other group are not. We can’t do this with people.

See the end of Chapter 5 for a number of reasons which, taken together, show that the evidence that smoking causes lung cancer in people is very strong even without doing experiments on people.
