Understanding the Relationship Between X and Y through Regression Analysis, Study notes of Social Statistics and Data Analysis

Minnesota State University (MSU) - Mankato Social Statistics and Data Analysis

The concept of regression analysis, focusing on the use of scatterplots and linear regression to identify the relationship between two variables, x and y. It covers the calculation of regression lines, prediction equations, and error terms, as well as the importance of the least squares criterion in determining the 'best fit' line. The document also discusses the concept of r-square and its role in measuring the proportion of variance in y that can be explained by its linear relationship with x.

Typology: Study notes

2011/2012

Uploaded on 01/23/2012

desmond 🇺🇸



Chapter 6

Bivariate Correlation & Regression

6.1 Scatterplots and Regression Lines

6.2 Estimating a Linear Regression Equation

6.3 R-Square and Correlation

6.4 Significance Tests for Regression Parameters


Scatterplot: a positive relation

Visually display relation of two variables on X-Y coordinates

Data: 50 U.S. states

Y = per capita income

X = % of adults with a BA degree

Positive relation: increasing X related to higher values of Y

Summarize scatter by regression line

Use linear regression to estimate the “best-fit” line through the points:

How can we use sample data on the Y & X variables to estimate population parameters for the best-fitting line?

Slopes and intercepts

We learned in algebra that a line is uniquely located in a coordinate system by specifying: (1) its slope (“rise over run”); and (2) its intercept (where it crosses the Y-axis)

A bivariate linear relationship has the equation:

Y = a + bX

where:

b is slope

a is intercept

Draw these two lines on axes running from 0 to 6:

Y = 0 + 2X

Y = 3 - 0.5X
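To locate points for drawing the two lines, one can evaluate each equation at a few X values. The following is a minimal sketch; the helper name `line` and the choice of integer X values from 0 to 6 are illustrative assumptions, not part of the original notes.

```python
def line(a, b, x):
    """Return Y = a + b*X for intercept a and slope b."""
    return a + b * x

xs = list(range(0, 7))                    # X = 0, 1, ..., 6
line1 = [line(0, 2, x) for x in xs]       # Y = 0 + 2X   (rises 2 per unit of X)
line2 = [line(3, -0.5, x) for x in xs]    # Y = 3 - 0.5X (falls 0.5 per unit of X)

for x, y1, y2 in zip(xs, line1, line2):
    print(f"X={x}  Y=0+2X -> {y1}  Y=3-0.5X -> {y2}")
```

The first line starts at the origin and climbs steeply; the second starts at Y = 3 and declines gently, reaching Y = 0 at X = 6.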

Regression error

The regression error, or residual, for the ith case is the difference between the observed value of the dependent variable for that case and the value predicted by the regression equation. Subtract the prediction equation from the linear regression model to identify the ith case’s error term:

Linear regression model: $Y_i = a + b_{YX} X_i + e_i$

Prediction equation: $\hat{Y}_i = a + b_{YX} X_i$

Error term: $Y_i - \hat{Y}_i = e_i$

An analogy: In weather forecasting, an error is the difference between the weatherperson’s predicted high temperature for today and the actual high temperature observed today: Observed temp 86° - Predicted temp 91° = Error -5°
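The error term can be computed directly as observed minus predicted. The sketch below applies it to the weather analogy and to one made-up regression case; the coefficients `a = 1.0`, `b = 2.0` and the case values are hypothetical, chosen only for illustration.

```python
def residual(observed, predicted):
    """Error term e_i = Y_i - Y_hat_i (observed minus predicted)."""
    return observed - predicted

# Weather analogy from the notes: observed 86, predicted 91.
weather_error = residual(86, 91)

# Hypothetical regression case (values are assumptions, not from the notes).
a, b = 1.0, 2.0          # assumed intercept and slope
x_i, y_i = 3.0, 8.0      # assumed X and observed Y for case i
y_hat_i = a + b * x_i    # prediction equation
e_i = residual(y_i, y_hat_i)

print(weather_error, y_hat_i, e_i)
```

A negative residual means the prediction was too high; a positive one means it was too low.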

The Least Squares criterion

Scatterplot for state Income & Education has a positive slope

Ordinary least squares (OLS) is a method for estimating the regression equation coefficients, the intercept (a) and slope (b), that minimize the sum of squared errors

To plot the regression line, we apply a criterion yielding the “best fit” of a line through the cloud of points
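The least squares criterion can be illustrated with a small sketch. The slope formula used here, the sum of cross-products divided by the sum of squared X deviations, is the standard closed-form OLS result and is not shown in this preview of the notes; the five data points are made up for illustration.

```python
def ols_fit(xs, ys):
    """Return (a, b): OLS intercept and slope for a bivariate regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Standard closed-form slope: sum of cross-products over sum of squares of X.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def sum_squared_errors(a, b, xs, ys):
    """Sum of squared residuals for the line Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data (not the 50-states data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = ols_fit(xs, ys)
best = sum_squared_errors(a, b, xs, ys)

# Any other line should give a larger sum of squared errors.
shifted = sum_squared_errors(a + 0.1, b, xs, ys)
tilted = sum_squared_errors(a, b + 0.1, xs, ys)
print(a, b, best, shifted, tilted)
```

Nudging either coefficient away from the OLS values increases the sum of squared errors, which is what "best fit" means under this criterion.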

OLS estimator of the intercept, a

The OLS estimator for the intercept (a) adjusts the mean of Y (the dependent variable) by subtracting the regression slope multiplied by the mean of X:

$a = \bar{Y} - b\bar{X}$

Two important facts arise from this relation:

(1) The regression line always goes through the point of both variables’ means!

(2) When the regression slope is zero, for every X we only predict that Y equals the intercept a , which is also the mean of the dependent variable!

If $b_{YX} = 0$, then $a = \bar{Y}$
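Both facts follow from substituting the intercept estimator into the prediction equation. A short worked derivation, using the same symbols as the notes:

```latex
% Fact (1): predict Y at X = \bar{X}
\hat{Y} = a + b_{YX}\bar{X}
        = (\bar{Y} - b_{YX}\bar{X}) + b_{YX}\bar{X}
        = \bar{Y}

% Fact (2): set b_{YX} = 0
a = \bar{Y} - 0 \cdot \bar{X} = \bar{Y},
\qquad
\hat{Y}_i = a + 0 \cdot X_i = \bar{Y} \quad \text{for every } X_i
```

So the fitted line passes through the point of means, and a zero slope reduces every prediction to the mean of Y.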

Use these two bivariate regression equations, estimated from the 50 States data, to calculate some predicted values:

$\hat{Y}_i = a + b_{YX} X_i$

Regress income on bachelor’s degree:

$\hat{Y}_i = \$9.9 + 0.77 X_i$

What are the predicted incomes for:

Xi = 12%: Y=____________ Xi = 28%: Y=____________

Regress poverty percent on female labor force percent:

$\hat{Y}_i = 45.2\% - 0.53 X_i$

What are the predicted poverty percentages for:

Xi = 55%: Y=____________ Xi = 70%: Y=____________
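The predicted values can be checked with a few lines of code. This sketch assumes the income equation has a positive slope (intercept 9.9, slope 0.77, matching the positive Income-Education relation described earlier) and the poverty equation has a negative slope (intercept 45.2, slope -0.53); the signs are not legible in the preview, so the negative sign in particular is an assumption. The function name `predict` is illustrative.

```python
def predict(a, b, x):
    """Predicted value Y_hat = a + b*X."""
    return a + b * x

# Income regressed on % of adults with a BA degree (assumed: 9.9 + 0.77 X).
income_12 = predict(9.9, 0.77, 12)
income_28 = predict(9.9, 0.77, 28)

# Poverty % regressed on female labor force % (assumed: 45.2 - 0.53 X).
poverty_55 = predict(45.2, -0.53, 55)
poverty_70 = predict(45.2, -0.53, 70)

for label, value in [("income, X=12", income_12), ("income, X=28", income_28),
                     ("poverty, X=55", poverty_55), ("poverty, X=70", poverty_70)]:
    print(f"{label}: {value:.2f}")
```

Results are in the units of each equation as written in the notes (the dollar figure for income, percent for poverty).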

Errors in regression prediction

Every regression line through a scatterplot also passes through the means of both variables; i.e., the point $(\bar{X}, \bar{Y})$

We can use this relationship to divide the variance of Y into a double deviation from:

(1) the regression line

(2) the Y-mean line

Then calculate a sum of squares that reveals how strongly Y is predicted by X.

Illinois double deviation

In the Income-Education scatterplot, show the difference between Illinois’ Y-score and the mean as the sum of two deviations:

$Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})$

$Y_i - \hat{Y}_i$: error deviation of the observed score from the predicted score

$\hat{Y}_i - \bar{Y}$: regression deviation of the predicted score from the mean

Naming the sums of squares

$\sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2$

Each term of this partition has a name:

TOTAL sum of squares: $\sum (Y_i - \bar{Y})^2$

REGRESSION sum of squares: $\sum (\hat{Y}_i - \bar{Y})^2$

ERROR sum of squares: $\sum (Y_i - \hat{Y}_i)^2$

SSTOTAL = SSERROR + SSREGRESSION

The relative proportions of the two terms on the right indicate how well or poorly we can predict the variance in Y from its linear relationship with X

The SSTOTAL should be familiar to you – it’s the numerator of the variance of Y (see the Notes for Chapter 2). When we partition the sum of squares into the two components, we’re analyzing the variance of the dependent variable in a regression equation.

Hence, this method is called the analysis of variance or ANOVA.
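The partition of the total sum of squares can be verified numerically. The sketch below fits an OLS line to five made-up points (hypothetical data, not the 50-states data) using the standard closed-form slope, then checks that the total sum of squares equals the error plus regression sums of squares.

```python
# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# OLS slope and intercept (standard closed-form estimators).
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

y_hats = [a + b * x for x in xs]

ss_total = sum((y - y_bar) ** 2 for y in ys)                   # observed vs. mean
ss_error = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))  # observed vs. predicted
ss_regression = sum((y_hat - y_bar) ** 2 for y_hat in y_hats)  # predicted vs. mean

print(ss_total, ss_error, ss_regression)
```

The identity holds only for the least squares line; for an arbitrary line through the points the cross-product term does not vanish and the two components need not add up to the total.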

Coefficient of Determination

If we had no knowledge about the regression slope (i.e., bYX = 0 and thus SSREGRESSION = 0), then our only prediction is that the score of Y for every case equals the mean (which also equals the equation’s intercept a; see the OLS estimator of the intercept above).

But, if bYX ≠ 0, then we can use information about the i th case’s score on X to improve our predicted Y for case i. We’ll still make errors, but the stronger the Y-X linear relationship, the more accurate our predictions will be.

With no slope information: $\hat{Y}_i = a + 0 \cdot X_i = a = \bar{Y}$

With slope information: $\hat{Y}_i = a + b_{YX} X_i$

Find the R^2 for these 50-States bivariate regression equations

R-square for regression of income on education

SSREGRESSION = 409. SSERROR = 342. SSTOTAL = 751.

R^2 = _________

R-square for poverty-female labor force equation

SSREGRESSION = ______ SSERROR = 321. SSTOTAL = 576.

R^2 = _________

Here are some R^2 problems from the 2008 GSS

R-square for church attendance regressed on age

SSREGRESSION = 67, SSERROR = 2,861, SSTOTAL = _________

R^2 = _________

R-square for sex frequency-age equation

SSREGRESSION = 1,511, SSERROR = _____________, SSTOTAL = 10,502

R^2 = _________
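These exercises can be worked with the standard definition of the coefficient of determination, R^2 = SSREGRESSION / SSTOTAL, together with the partition SSTOTAL = SSERROR + SSREGRESSION to fill in whichever sum is missing. The sketch below uses the figures as they appear in the notes; the 50-states sums appear to have lost their decimal digits in the preview, so those two results are approximate. The function name `r_square` is illustrative.

```python
def r_square(ss_regression, ss_total):
    """Proportion of variance in Y explained by its linear relation with X."""
    return ss_regression / ss_total

# Income on education (50 states): all three sums given.
r2_income = r_square(409, 751)

# Poverty on female labor force: regression SS is the missing piece.
ss_reg_poverty = 576 - 321
r2_poverty = r_square(ss_reg_poverty, 576)

# Church attendance on age (2008 GSS): total SS is the missing piece.
ss_total_church = 67 + 2861
r2_church = r_square(67, ss_total_church)

# Sex frequency on age (2008 GSS): error SS is the missing piece.
ss_error_sex = 10502 - 1511
r2_sex = r_square(1511, 10502)

for name, value in [("income", r2_income), ("poverty", r2_poverty),
                    ("church", r2_church), ("sex frequency", r2_sex)]:
    print(f"{name}: R^2 = {value:.3f}")
```

An R^2 near 0, as in the church attendance equation, means age explains almost none of the variance; values around one half, as in the income equation, indicate a much stronger linear relationship.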
