Least-Squares Regression: Finding the Best Fit Line and Assessing Its Fit, Exams of Descriptive statistics

This document gives an overview of least-squares regression, including the concept of a regression line, its equation, and the method for finding it. It also covers the importance of assessing the fit of the line and discusses the use of residual plots for this purpose. It is part of a larger course on bivariate and multivariate data and distributions.

Typology: Exams

2021/2022

Uploaded on 09/27/2022

daryth

Lecture 23: Least-Squares Regression Line; Residual Plot
Chapter 3: Bivariate, Multivariate Data and Distributions

Review

  1. Scatter plots
    • Used to plot sample data points for bivariate data (x, y)
    • Plot the (x, y) pairs directly on a rectangular coordinate system
    • Qualitative visual representation of the relationship between the two variables
      • no precise statement can be made
  2. Pearson's (linear) correlation coefficient, r
    • Measures the direction (+ or −) and strength of the linear relationship between the x and y observations.
    • Properties; concerns and cautions about r.
  3. Association does not imply causation.

Example

[Figure: scatter plot of the example data; axis tick ranges 40 to 70 and 155 to 180]

Least Squares Regression Line

  • Regression line is: $\hat{y} = -61.53 + 0.696x$
    • How do we know this is the right line?
    • What makes it best?
  • The line above is the Least Squares Regression Line
    • It is the line that makes the vertical distances from the data points to the line as small as possible
    • Uses the concept of sums of squares: it minimizes the sum of the squared vertical distances
    • Small sums of squares is good → Least Squares!
    • See previous slide.
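One way to see what "best" means is to compare the sum of squared vertical distances for the least squares line against a few other candidate lines. The sketch below is illustrative only: the data values and the helper names `sse` and `least_squares` are made up for this example and are not the lecture's data.

```python
def sse(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, invented for illustration (not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
best = sse(xs, ys, a, b)

# Nudging the intercept or slope away from the fitted values never lowers the SSE.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    other = sse(xs, ys, a + da, b + db)
    print(f"a={a + da:.2f}, b={b + db:.2f}: SSE={other:.3f} (least squares SSE={best:.3f})")
```

Every perturbed line should report an SSE at least as large as the least squares value, which is the sense in which the fitted line is "best".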

Alternate calculations

  • Understanding the regression line:
  • Meaning of slope
    • If I change X by 1 unit, how much does Y change?
    • “Rise over run” concept
    • Directly related to the correlation
  • Meaning of intercept
    • Most of the time we don't care
      • It's simply a feature of the line
    • Sometimes has meaning
      • What should Y be if X = 0?
      • Want to use line for prediction
  • Slope and intercept from summary statistics: $b = r\left(\dfrac{s_y}{s_x}\right)$, $a = \bar{y} - b\bar{x}$
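The alternate calculation builds the line from summary statistics only: the slope is the correlation times the ratio of the standard deviations, and the intercept makes the line pass through the point of means. A minimal sketch, assuming sample standard deviations with divisor n − 1; the function name `fit_from_summary` and the data are hypothetical.

```python
import math


def fit_from_summary(xs, ys):
    """Return (a, b, r) using b = r * (s_y / s_x) and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_x = math.sqrt(sxx / (n - 1))   # sample standard deviation of x
    s_y = math.sqrt(syy / (n - 1))   # sample standard deviation of y
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
    b = r * s_y / s_x                # slope
    a = y_bar - b * x_bar            # intercept
    return a, b, r


# Hypothetical data, invented for illustration (not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r = fit_from_summary(xs, ys)
print(f"slope b = {b:.3f}, intercept a = {a:.3f}, r = {r:.3f}")

# Using the line for prediction at an x inside the observed range.
x_new = 3.5
print(f"predicted y at x = {x_new}: {a + b * x_new:.3f}")
```

Because s_x and s_y are both positive, the slope always has the same sign as r, which is why the slope is "directly related to the correlation".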

Assessing the fit

  • How effectively does the least squares line summarize the relationship between X and Y? Or: how well does the line fit the data?
  • We assess the fit of the line by looking at how much variation in Y is explained by the regression line on X
  • Again this is the concept of sums of squares

Coefficient of Determination

  • $r^2$ is given by: $r^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$
  • Notice when SSR is large (or SSE small), we have explained a large amount of the variation in Y
  • Multiplying $r^2$ by 100 gives the percent of variation in Y attributed to the linear regression of Y on X
    • Percent of variation explained!
  • Also, just the square of the correlation!
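The sums of squares can be computed directly to check both forms of the coefficient of determination. This sketch assumes the standard definitions, which are not spelled out in this text: SST is the total variation of y about its mean, SSE is the variation of y about the fitted line, and SSR is the variation of the fitted values about the mean. The function name `sums_of_squares` and the data are hypothetical.

```python
def sums_of_squares(xs, ys):
    """Return (SST, SSE, SSR) for the least squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation in y
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained (error)
    ssr = sum((f - y_bar) ** 2 for f in fitted)           # explained by the line
    return sst, sse, ssr


# Hypothetical data, invented for illustration (not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sst, sse, ssr = sums_of_squares(xs, ys)
r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst
print(f"SST={sst:.3f}, SSE={sse:.3f}, SSR={ssr:.3f}")
print(f"r^2 = SSR/SST = {r2_from_ssr:.3f}; r^2 = 1 - SSE/SST = {r2_from_sse:.3f}")
print(f"percent of variation explained: {100 * r2_from_ssr:.1f}%")
```

The two expressions agree because SST = SSR + SSE for a least squares line with an intercept.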

Standard Deviation about the LS line

  • Mean Squared Error about the LS line (with sample size = n): $MSE = \dfrac{SSE}{n-2}$
  • Standard Deviation about the LS line: $s_e = \sqrt{\dfrac{SSE}{n-2}}$
    • n − 2 comes from the degrees of freedom.
  • It is the typical amount by which an observation varies about the regression line
  • Also called "root MSE", the square root of the Mean Squared Error
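As a rough illustration, the root MSE can be computed from the residual sum of squares with divisor n − 2. The helper name `root_mse` and the data below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def root_mse(xs, ys):
    """Return (MSE, root MSE) about the least squares line, using n - 2 degrees of freedom."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)   # two parameters (a and b) were estimated from the data
    return mse, math.sqrt(mse)


# Hypothetical data, invented for illustration (not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mse, s_e = root_mse(xs, ys)
print(f"MSE = {mse:.3f}, root MSE = {s_e:.3f}")
```

The root MSE is in the same units as y, so it can be read as a typical vertical distance between an observation and the line.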

Example: Height and Weight

Another way to assess appropriateness: Residual Plots

  • The residuals can be used to assess the appropriateness of a linear regression model. The residual of observation $y_i$ is given as $e_i = y_i - \hat{y}_i$
  • Specifically, a residual plot, plotting the residuals against x, gives a good indication of whether the model is working
    • The residual plot should not have any pattern but should show a random scattering of points
    • If a pattern is observed, the linear regression model is probably not appropriate.
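To make the idea concrete, the sketch below fits a straight line to deliberately curved, made-up data (y = x squared) and lists the residuals against x. It prints text rather than drawing a plot, and the function name `residuals` is hypothetical.

```python
def residuals(xs, ys):
    """Return the residuals e_i = y_i - y_hat_i from the least squares line of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical curved data (y = x**2), invented to show a patterned residual plot.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

res = residuals(xs, ys)
for x, e in zip(xs, res):
    sign = "+" if e > 0 else "-"
    print(f"x = {x}: residual = {e:+.1f}  ({sign})")

# Residuals from a least squares line with an intercept always sum to zero.
print(f"sum of residuals = {sum(res):.6f}")
```

The residuals are positive at both ends and negative in the middle: a U-shaped pattern rather than a random scatter, which signals that a straight line is not appropriate for these data.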

Examples: good

Examples: constant variance violation

If bad residual plots… try transformations!

  • Can think of transformations as simple mathematical manipulations.
    • Suppose x doesn't predict y well but x is a very good predictor of log y.
    • Before we ever start, let new_y = log y! Then just fit a linear regression between x and new_y!
    • Is it still linear?
      • Yes! x and new_y should have a nice linear relationship.
      • No! If I want to describe the "real" relationship between x and y, I need to "undo" the transformation, meaning explain it in terms of the logarithm.
  • Also see power transformations in Section 3. (self-reading, not covered in exams)
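The log transformation idea can be sketched as follows: fit a line to (x, log y), then undo the transformation with the exponential to describe y itself. The data here are synthetic and noise-free, generated from y = 2·exp(0.5x) purely for illustration, and the helper name `fit_line` is hypothetical.

```python
import math


def fit_line(xs, ys):
    """Return (intercept a, slope b) of the least squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - b * x_bar, b


# Synthetic data from an exponential curve (an assumption made for this example).
xs = [0, 1, 2, 3, 4]
ys = [2 * math.exp(0.5 * x) for x in xs]

# Transform first: new_y = log y, then fit a linear regression of new_y on x.
new_y = [math.log(y) for y in ys]
a, b = fit_line(xs, new_y)
print(f"fitted line on the log scale: new_y = {a:.4f} + {b:.4f} x")

# "Undo" the transformation to describe the relationship between x and y itself.
def predict_y(x):
    return math.exp(a + b * x)

print(f"back-transformed model: y = {math.exp(a):.4f} * exp({b:.4f} x)")
print(f"prediction at x = 2.5: {predict_y(2.5):.4f}")
```

On the log scale the relationship is a straight line, but on the original scale the fitted model is an exponential curve, which is the "yes and no" answer to whether the relationship is still linear.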