Class Note - Bivariate Data, Linear regression, and Correlation | STAT 100, Study notes of Probability and Statistics

Material Type: Notes; Class: ELEM STAT & PROB; Subject: Statistics and Probability; University: University of Maryland; Term: Unknown 1989;

Uploaded on 02/13/2009

STAT 100 Lecture 4:
Bivariate Data, Scatter Plots, Linear Regression,
and Correlations
Nate Strawn
http://www.math.umd.edu/~nstrawn/
Last Time...

1. Lots of definitions

The Variance: $s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2$

The Standard Deviation: $s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2}$

The Range: $\text{Range} = \max_{1 \le i \le N} x_i - \min_{1 \le i \le N} x_i$

The Interquartile Range: $\text{IQR} = Q_3 - Q_1$

2. Boxplots and Normal Distributions
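
As a quick numerical check of the definitions listed above, here is a minimal Python sketch. The sample values are hypothetical and chosen only for illustration; they do not come from the lecture. Quartile conventions vary between textbooks and software, so the IQR printed here may differ slightly from a hand calculation done with the course textbook's rule.

```python
import statistics

# Hypothetical sample, chosen only for illustration (not from the lecture).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Sample variance with the N - 1 denominator, written out from the formula.
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
std_dev = variance ** 0.5

# The standard library uses the same N - 1 convention.
assert abs(variance - statistics.variance(data)) < 1e-12
assert abs(std_dev - statistics.stdev(data)) < 1e-12

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartiles via the default ("exclusive") method of statistics.quantiles;
# other quartile conventions can give slightly different answers.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(variance, std_dev, data_range, iqr)
```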

Bivariate Data

Bivariate Data is a fancy way to say, “Two-Variable Data.” Multivariate Data is a fancy way to say, “Many-Variable Data.” Bivariate Data is a list of numerical pairs

$(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N).$

The easiest way to visualize Bivariate Data is through a Scatter Plot.

Scatter Plots

[Figure: four scatter plots.
Average Global Temperature vs Number of Pirates (axes: Average Global Temperature, Number of Pirates);
Degrees Fahrenheit vs Degrees Celsius (axes: Degrees Fahrenheit, Degrees Celsius);
Integers vs Squares (axes: x, x^2);
Stuff vs Junk (axes: Stuff, Junk).]

Formula for the Correlation Coefficient

Formula

$r = \frac{S_{xy}}{\sqrt{S_{xx}} \sqrt{S_{yy}}}$

where $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$, $S_{xx} = \sum (x_i - \bar{x})^2$, and $S_{yy} = \sum (y_i - \bar{y})^2$.

Mean Temp. (x)        Pirates (y)             $x_i - \bar{x}$    $y_i - \bar{y}$    $(x_i - \bar{x})^2$    $(y_i - \bar{y})^2$    $(x_i - \bar{x})(y_i - \bar{y})$
14.2                  35000                   -0.76              17797.57           0.57                   316753548              -
14.3                  45000                   -0.66              27797.57           0.43                   772704977              -
14.6                  20000                   -0.36              2797.57            0.13                   7826405                -
14.9                  15000                   -0.06              -2202.43           0                      4850691                125
15.2                  5000                    0.24               -12202.43          0.06                   148899263              -
15.6                  400                     0.64               -16802.43          0.41                   282321605              -
15.9                  17                      0.94               -17185.43          0.89                   295338955              -
Tot. = 104.7          120417                  0                  0                  2.5                    1828695447             -
$\bar{x}$ = 14.96     $\bar{y}$ = 17202.43                                          $S_{xx}$               $S_{yy}$               $S_{xy}$

We then have that $r = S_{xy} / (\sqrt{S_{xx}} \sqrt{S_{yy}}) \approx -0.93$.
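
The hand computation in the table can be reproduced with a few lines of Python. This is only a sketch: the function name `correlation` is made up for illustration, and the data are the seven temperature and pirate values from the table. Because the code works with unrounded means, the intermediate sums may differ slightly from the rounded table entries.

```python
import math

# Data from the table: mean global temperature (x) and number of pirates (y).
temps = [14.2, 14.3, 14.6, 14.9, 15.2, 15.6, 15.9]
pirates = [35000, 45000, 20000, 15000, 5000, 400, 17]


def correlation(xs, ys):
    """Return (S_xx, S_yy, S_xy, r) for paired data (illustrative helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and of cross products, as in the formula.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))
    return s_xx, s_yy, s_xy, r


s_xx, s_yy, s_xy, r = correlation(temps, pirates)
print(round(s_xx, 2), round(r, 2))
```

A negative `s_xy` means that above-average temperatures tend to pair with below-average pirate counts, which is why r comes out negative.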

Scatter Plots and r-Values

For the following scatter plots, we have r = −0.93 and r = 1.0 for the first row, and r = −0.34 and r = 0.35 for the second row.

[Figure: the same four scatter plots, two per row.
First row: Average Global Temperature vs Number of Pirates; Degrees Fahrenheit vs Degrees Celsius.
Second row: Integers vs Squares; Stuff vs Junk.]

Quick Review of Linear Equations

Remember that a line is both an equation ($y = \beta_0 + \beta_1 x$) and a graph.

Here, $\beta_0$ is the y-intercept, and $\beta_1$ is the slope of the line.

Linear Regression

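Definition: The Least Squares Line fitting the data

$(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$

is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, where

$\hat{\beta}_1 = S_{xy} / S_{xx} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.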
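
To see the definition in action, here is a minimal Python sketch of the slope and intercept formulas. The helper name `least_squares_line` and the five sample points are assumptions made for illustration: the points are Celsius readings paired with their exact Fahrenheit equivalents, so they lie on a single line and the fit should recover that line.

```python
def least_squares_line(xs, ys):
    """Return (beta0_hat, beta1_hat) for the least squares line (illustrative helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1 = s_xy / s_xx              # slope: S_xy / S_xx
    beta0 = y_bar - beta1 * x_bar    # intercept: y_bar - slope * x_bar
    return beta0, beta1


# Hypothetical points: Celsius readings and their Fahrenheit equivalents.
celsius = [0, 10, 20, 30, 40]
fahrenheit = [32, 50, 68, 86, 104]

beta0, beta1 = least_squares_line(celsius, fahrenheit)
print(beta0, beta1)
```

Since these points fall exactly on a line, the fitted intercept and slope match the familiar conversion F = 32 + 1.8 C up to floating-point rounding. For scattered data, the same formulas give the line that minimizes the sum of squared vertical distances to the points.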

For Next Time

Quiz 01! Covers Sections 2.1, 2.2, 2.3, 2.4, and 2.5. Problems are taken directly from the book (with small changes!).

Read Sections 3.4, 3.5, and 3.6 from Johnson and Bhattacharyya.

Online homework 3.6: 3.35, 3.37, 3.39, 3.41, 3.

Group Problems:

Group     1     2     3     4     5     6     7     8     9     10
Problem   3.38  3.40  3.42  3.46  3.49  3.51  3.52  3.53  3.54  3.