Correlation and Regression in Elementary Statistics - Assignment | MATH 0013, Assignments of Statistics

Material Type: Assignment; Professor: Burke; Class: Elementary Statistics; Subject: Mathematics; University: Sierra College; Term: Spring 2009;

Typology: Assignments

Pre 2010

Uploaded on 07/30/2009

Sierra College – Math 13
Spring 2009 – Class 31/34
Today: Sections 10-1/10-3
Assignment: 10-2 {1, 3, 5, 7, 9, 13, 17, 19, 23}
10-3 {1, 3, 5, 7, 13, 15, 17, 19, 21, 23}
Next: Sections 13-1; 13-6; 15-3; Work on Project
Instructor: John Burke
Web Page: http://math.sierracollege.edu/Staff/JohnBurke/
Telephone: 916 337-0425
Office hours: (V-307) MW 2:35-5:00; M 2:45-3:45 (official)
10-1 / 10-3 Correlation and Regression
In Chapter 10, we examine relationships between paired quantitative data.
We use collected data to
  • Observe a pattern (correlation – 10-2)
  • Mathematically model the pattern (regression – 10-3)
  • When appropriate, use the mathematical model to make predictions.
Chapter 10 Problem:
Can We Predict the Time of the Next Eruption of Old Faithful?
Is there a relationship between any two variables?
Can we predict how long it will be to the next eruption based upon duration, interval before, or height?

Eruptions of the Old Faithful Geyser
Duration (L1)*                   240  120  178  234  235  269  255  220
Interval Before Eruption (L2)*    98   90   92   98   93  105   81  108
Interval After Eruption (L3)*     92   65   72   94   83   94  101   87
Height (L4)*                     140  110  125  120  140  120  125  150
* Enter the data in your calculator/StatDisk

10-2 Correlation

Paired sample data is sometimes called bivariate data.

A correlation exists between two variables when one of them is related to the other in some way.

We can often see if a relationship exists by using a scatterplot (or scatter diagram), a graph in which the paired (x, y) sample data are plotted with each pair represented as a single point.

Assumptions: we will consider only linear relationships, which means that when graphed, the points approximate a straight line. (Recall slope and direction of line.)

[Figure: six scatterplots of y against x. Positive Linear Correlation: (a) Positive, (b) Strong positive, (c) Perfect positive. Negative Linear Correlation: (d) Negative, (e) Strong negative, (f) Perfect negative.]

Properties of r

  • The value of r does not change if all values of either variable are converted to a different scale.
  • The value of r is not affected by the choice of x or y. Interchange all x- and y-values and the value of r will not change.
  • r measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear.
  • r^2 is the proportion of the variation in y that is explained by the linear relationship between x and y.
  • The value of r is always between -1 and +1 inclusive.

$$
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
$$
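As a rough check of these properties, the sketch below applies the formula for r to the Old Faithful data from the Chapter 10 problem, taking x = Duration and y = Interval After Eruption. The function name `pearson_r` and the division by 60 (an arbitrary change of scale) are illustrative choices, not part of the course materials.

```python
from math import sqrt, isclose

def pearson_r(xs, ys):
    """Linear correlation coefficient, computed from the sums in the formula for r."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Old Faithful data from the Chapter 10 problem
duration = [240, 120, 178, 234, 235, 269, 255, 220]
interval_after = [92, 65, 72, 94, 83, 94, 101, 87]

r = pearson_r(duration, interval_after)
r_swapped = pearson_r(interval_after, duration)                      # x and y interchanged
r_rescaled = pearson_r([d / 60 for d in duration], interval_after)   # x on a different scale

print(round(r, 3))                                     # 0.926
print(isclose(r, r_swapped), isclose(r, r_rescaled))   # True True
print(round(r ** 2, 3))                                # 0.857
```

Under these assumptions, about 86% of the variation in the interval after an eruption is explained by its linear relationship with duration.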

Table A-6

Interpreting r using Table A-6:

If the absolute value of the computed value of r exceeds the value in Table A-6, conclude that there is a significant linear correlation.

Otherwise, there is not sufficient evidence to support the conclusion of a significant linear correlation.

Critical values of r (partial; only these rows of the table are legible here):

n      α = .05    α = .01
5      .878       .959
7      .754       .875
10     .632       .765
12     .576       .708
15     .514       .641
17     .482       .606
20     .444       .561
30     .361       .463
45     .294       .378
60     .254       .330
90     .207       .269

Common Errors Involving Correlation

Causation: It is wrong to conclude that correlation implies causality (Remember eating lobster and its “effect” on pregnancy).

Averages: Averages suppress individual variation and may inflate the correlation coefficient.

Linearity: There may be some relationship between x and y even when there is no significant linear correlation.
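The linearity error can be illustrated with made-up data. In the sketch below, y is determined exactly by x (y = x^2), yet the formula for r returns 0 because the relationship is not linear. The data set and the helper name `pearson_r` are hypothetical, chosen only for this illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient, computed from the sums in the formula for r."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Hypothetical data: a perfect but nonlinear relationship, y = x^2
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0
```

A scatterplot of these points would show a clear parabola, which is why it helps to look at the plot before relying on r.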

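Correlation Template

# Pairs (n):

        x      y      xy     x^2    y^2

Sum
Mean

$$
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
$$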
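As an illustration of how the template feeds the formula, here is a worked substitution using the Old Faithful data with x = Duration and y = Interval After Eruption (n = 8 pairs). The column sums are computed from the data table in the Chapter 10 problem; the choice of these two variables is an assumption made for this example.

$$
\sum x = 1751,\quad \sum y = 688,\quad \sum xy = 154378,\quad \sum x^2 = 399451,\quad \sum y^2 = 60204
$$

$$
r = \frac{8(154378) - (1751)(688)}{\sqrt{8(399451) - 1751^2}\,\sqrt{8(60204) - 688^2}}
  = \frac{30336}{\sqrt{129607}\,\sqrt{8288}} \approx 0.926
$$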

Formal Hypothesis Test

Let H₀: ρ = 0; H₁: ρ ≠ 0

Select a significance level α

Calculate r

The test statistic is r. Critical values are determined from Table A-6.

If |r| > the critical value, reject H₀; otherwise, fail to reject H₀

If H₀ is rejected, conclude that there is a significant linear correlation. If you fail to reject H₀, then there is not sufficient evidence to conclude that there is a linear correlation.
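The decision rule can be sketched in a few lines. The dictionary below holds a handful of α = .05 critical values from Table A-6, plus the 0.707 shown for the eight pairs in the examples. The names `CRITICAL_R_05` and `correlation_test` are hypothetical, and the sketch only handles sample sizes that appear in the dictionary.

```python
# A few critical values of r for alpha = .05 (Table A-6); not the full table
CRITICAL_R_05 = {
    5: 0.878, 7: 0.754, 8: 0.707, 10: 0.632, 12: 0.576,
    15: 0.514, 17: 0.482, 20: 0.444, 30: 0.361,
}

def correlation_test(r, n, critical_values=CRITICAL_R_05):
    """Return (reject_H0, critical_value) for H0: rho = 0 against H1: rho != 0."""
    critical_value = critical_values[n]   # raises KeyError if n is not in the table
    reject_h0 = abs(r) > critical_value
    return reject_h0, critical_value

# r = 0.926 is the value computed from the Old Faithful data
# (Duration vs. Interval After Eruption, n = 8)
print(correlation_test(0.926, 8))   # (True, 0.707)
print(correlation_test(-0.5, 8))    # (False, 0.707)
```

Comparing |r| rather than r with the critical value covers both tails of the two-sided test in one step.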

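Example: Old Faithful

[Figure: number line for r from -1 to 1 with critical values r = -0.707 and r = 0.707. "Reject ρ = 0" in each tail, "Fail to reject ρ = 0" between the critical values. Sample data: r = 0.926, which lies in the right-hand rejection region.]

We conclude there is a significant positive correlation between the Interval After Eruption and the Duration of Eruption.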

Is there a significant linear correlation?

[Figure: number line for r from -1 to 1 with critical values r = -0.707 and r = 0.707. "Reject ρ = 0" in each tail, "Fail to reject ρ = 0" between the critical values. The sample value of r is marked in the right-hand critical region.]

The test statistic does fall within the critical region.

REJECT H₀: ρ = 0 (no correlation) and conclude there is a significant linear correlation between the weights of discarded plastic and household size.


10-3 Regression

Once we have found a linear correlation, our next task is to find the best mathematical model to use for prediction.

Linear correlation means the data approximates a straight line; hence we are looking for the straight line y = mx + b that most closely approximates the data.

Eruptions of the Old Faithful Geyser

Duration (L1)                  240  120  178  234  235  269  255  220
Interval After Eruption (L3)    92   65   72   94   83   94  101   87

[Figure: Regression Line with Scatterplot, Interval After (L3) vs. Duration (L1)]

Regression Equation

Assumptions: We are investigating only linear relationships. The pairs of (x, y) data have a bivariate normal distribution.

Definitions: Given a collection of paired sample data, the regression equation ŷ = b₀ + b₁x algebraically describes the relationship between the two variables. The graph of the regression equation is called the regression line (or line of best fit, or least-squares line).

Notation: β₀ and β₁ are the population parameters with regression equation y = β₀ + β₁x, and b₀ and b₁ are the sample statistics with regression equation ŷ = b₀ + b₁x.

Slope:

$$
b_1 = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}
$$

Y-Intercept:

$$
b_0 = \bar{y} - b_1\bar{x}
$$

Round each to three decimal places.
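A minimal sketch of the slope and intercept formulas, applied to the Old Faithful data with x = Duration and y = Interval After Eruption. The function name `regression_coefficients` is a hypothetical choice for this example.

```python
def regression_coefficients(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
    b0 = sum_y / n - b1 * (sum_x / n)                               # y-bar minus b1 * x-bar
    return b0, b1

# Old Faithful data
duration = [240, 120, 178, 234, 235, 269, 255, 220]
interval_after = [92, 65, 72, 94, 83, 94, 101, 87]

b0, b1 = regression_coefficients(duration, interval_after)
print(f"y-hat = {b0:.3f} + {b1:.3f}x")   # y-hat = 34.770 + 0.234x
```

The slope is computed first because the intercept formula depends on it.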

Procedure for Predicting

  • Start: calculate the value of r and test the hypothesis that ρ = 0.
  • Is ρ = 0 rejected?
  • Yes: use the regression equation to make predictions.
  • No: given any value of one variable, the best predicted value of the other variable is its sample mean.
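The procedure can be sketched as a single function that falls back to the sample mean of y when ρ = 0 is not rejected. The function name `best_prediction`, the helper `_sums`, the duration of 210, and the y = x^2 data set are hypothetical, and the critical value is supplied by the caller (for example, 0.707 for eight pairs at α = .05).

```python
from math import sqrt

def _sums(xs, ys):
    """Return n and the sums used by the formulas for r, b0 and b1."""
    return (len(xs), sum(xs), sum(ys),
            sum(x * y for x, y in zip(xs, ys)),
            sum(x * x for x in xs),
            sum(y * y for y in ys))

def best_prediction(x_value, xs, ys, critical_value):
    """Best predicted y for x_value, following the prediction procedure."""
    n, sx, sy, sxy, sx2, sy2 = _sums(xs, ys)
    r = (n * sxy - sx * sy) / (sqrt(n * sx2 - sx ** 2) * sqrt(n * sy2 - sy ** 2))
    if abs(r) > critical_value:
        # rho = 0 rejected: use the regression equation
        b1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
        b0 = sy / n - b1 * (sx / n)
        return b0 + b1 * x_value
    # rho = 0 not rejected: the best prediction is the sample mean of y
    return sy / n

duration = [240, 120, 178, 234, 235, 269, 255, 220]
interval_after = [92, 65, 72, 94, 83, 94, 101, 87]

# Significant correlation: the regression equation is used
predicted = best_prediction(210, duration, interval_after, critical_value=0.707)
print(round(predicted, 1))   # 83.9

# No linear correlation (y = x^2 on symmetric x values): the mean of y is used
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
fallback = best_prediction(2, xs, ys, critical_value=0.754)
print(fallback)              # 4.0
```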

Data from the Garbage Project

x Plastic (lb) (decimals truncated in source)   0.   1.   2.   2.   2.   1.   0.   3.
y Household                                     2    3    3    6    4    2    1    5

What is the best predicted size of a household that discards 0.50 lb of plastic?

Using a calculator: b₀ = 0.549, b₁ = 1.48

ŷ = 0.549 + 1.48 (0.50)

ŷ = 1.289

A household that discards 0.50 lb of plastic has approximately one person.

Definitions

Residual

For a sample of paired (x, y) data, the difference (y - ŷ) between an observed sample y-value and the value of ŷ (the value of y that is predicted by using the regression equation).

Least-Squares Property

A straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.

Residuals and the Least-Squares Property

x   1   2   4   5
y   4  24   8  32

ŷ = 5 + 4x

[Figure: scatterplot of the four points with the line ŷ = 5 + 4x; x-axis from 1 to 5, y-axis from 0 to 32. Residuals marked at each point: -5 (x = 1), 11 (x = 2), -13 (x = 4), 7 (x = 5).]
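The least-squares property can be checked numerically for this small data set. The sketch below computes the residuals for ŷ = 5 + 4x and compares its sum of squared residuals with that of a few other lines. The alternative lines are arbitrary choices made for illustration, and the function name `sum_squared_residuals` is hypothetical.

```python
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

def sum_squared_residuals(b0, b1):
    """Sum of squared residuals (y - y-hat)^2 for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Residuals y - y-hat for the regression line y-hat = 5 + 4x
residuals = [y - (5 + 4 * x) for x, y in zip(xs, ys)]
print(residuals)   # [-5, 11, -13, 7]

best = sum_squared_residuals(5, 4)
print(best)        # 364

# A few other candidate lines (arbitrary), each given as (b0, b1)
alternatives = [(5, 3.5), (5, 4.5), (4, 4), (6, 4), (8, 3)]
others = [sum_squared_residuals(b0, b1) for b0, b1 in alternatives]
print(others)      # every value is larger than 364
```

Trying a handful of lines is not a proof, but it shows what "smallest sum possible" means in practice.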