Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Lecture Notes on Linear Regression Diagnostic and Variable Transformations | STAT 431, Study notes of Statistics

University of Pennsylvania (UPenn)Statistics

Material Type: Notes; Class: STATISTICAL INFERENCE; Subject: Statistics; University: University of Pennsylvania; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 03/28/2010

koofers-user-a6p 🇺🇸

(1)

10 documents

1 / 22

This page cannot be seen from the preview

Don't miss anything!

Statistics 431:

Statistical Inference

Lecture 17: Linear regression diagnostics and

variable transformations

Discover Study notes of Statistics University of Pennsylvania (UPenn)

Partial preview of the text

Download Lecture Notes on Linear Regression Diagnostic and Variable Transformations | STAT 431 and more Study notes Statistics in PDF only on Docsity!

Statistics 431:

Statistical Inference

Lecture 17: Linear regression diagnostics and

variable transformations

Regression diagnostics

(^) The conditions required for inference from simple linear regression must be

checked:

(^) Linearity. Diagnostic: residual plot.
(^) Constant variance. Diagnostic: residual plot.
(^) Normality. Diagnostics:
- histogram of residuals.
- normal quantile plot of residuals.
(^) Independence. Diagnostic: residual plot.
(^) Outliers and influential points. Diagnostic: scatterplot, Cook’s distances.

Bivariate Fit of HousePrice By CrimeRate

HousePrice

CrimeRate

Summary of Fit

Residual

Distri

Resid

CrimeRate

Summary of Fit

RSquare 0.

RSquare Adj 0.

Root Mean Square Error 78861.

Mean of Response 158464.

Observations (or Sum Wgts) 98

Parameter Estimates

Term Estimate Std Error t Ratio Prob>|t|

Intercept 225233.55 16404.02 13.73 <.

CrimeRate -2288.689 491.5375 -4.66 <.

Residual plots

(^) A residual plot is a scatterplot of each residual ei vs its corresponding xi.
(^) A residual plot can suggest nonlinearity and/or non-constant variance, both

violations of the assumptions in the ideal linear model.

(^) If the ideal model holds, the residual plot should contain patternless scatter.
(^) The residuals should be centered around a horizontal line at 0. A nonlinear

regression function μ( x ) will lead to a violation of this constraint. For example, the residuals on the left and right might be generally above zero, and those in the middle generally below zero.

(^) The residuals should have the same spread around zero, at all x values. If,

for example, the residual plot has a “bulge” in the model, the constant-variance assumption has been violated.

(^) Look for pronounced patterns. No residual plot will satisfy you as being

“perfectly” patternless (not even when we do a regression on data simulated from the ideal linear model!).

100000

200000

300000

Residual

10 20 30 40 50 60 70 CrimeRate

Distributions

Residuals HousePrice

(^) Some evidence of upwards curvature in the average residuals at the low and high end of crime rate.

DisplayFeet

100

Residual

0 1 2 3 4 5 6 7 8

DisplayFeet

Residual plot shows mean residual less

1 and 7 display feet and greater than ze

(^) The residual plot shows negative mean residuals at 1 foot and 7 feet, and positive mean residuals elsewhere.
(^) What does this indicate?

Examining and Transforming Data Tukey’s bulging rule

Figure 15. Mosteller and Tukey’s bulging rule for selecting linearizing trans ormations.

Display effect: Transformations to

x

100

200

300

400

Sales

0 1 2 3 4 5 6 7 8 DisplayFeet

50 Residual

0 1 2 3 4 5 6 7 8 DisplayFeet

(^) Still looks like overestimation at 1 foot and underestimation at 2–3 feet.

Display effect: Transformations to log x

100

200

300

400

Sales

0 1 2 3 4 5 6 7 8 DisplayFeet

100

Residual

0 1 2 3 4 5 6 7 8 DisplayFeet

(^) This is a slight improvement.

Heteroscedasticity

(^) You may have already forgotten that the ideal linear model assumes all the

distrns Y | X = x have the same variance σ 2. This is homoscedasticity.

(^) When this assumption is violated, we have non-constant variances, or

heteroscedasticity.

(^) To detect non-constant variance, look at the residual plot. If the vertical

spread of residuals is not the same from left to right, the constant variance assumption is in trouble. E.g.,

“Fan pattern”: spread increases from left to right
“Inverse fan pattern”: spread decreases from left to right
(^) Consequences of heteroscedasticity:
Hypothesis tests and confidence intervals for β’s are misspecified.
But, least-squares estimates ( βˆ 0 , βˆ 1 ) are still unbiased.
(^) What to do: try transformations of Y.

Another residual plot

(^) We can plot residuals ei against their corresponding fitted values y ˆ i , rather

than against xi.

(^) For simple linear regression, this is just an affine transformation of X (to

β^ ˆ 0 + ˆβ 1 X ): it moves the whole residual cloud up or down, and stretches the x axis, but doesn’t change the conclusions we draw.

(^) However, it is useful in multiple linear regression, when there are many X

variables to choose from.

Heteroscedasticity

When the requirement of a constant variance is

violated we have a condition of heteroscedasticity.

(^) + (^) +
(^) +

The spread increases with y^

Residual

+^ +

+^ ++

Display effect example

100

Residual

0 1 2 3 4 5 6 7 8 DisplayFeet

Distributions Residuals Sales

-50 0 50 100

No strong indication of violation of constant variance or

normality assumptions.

(^) Nothing strongly suggests heteroscedasticity or departure from normality.

Residual plots vs time

(^) A confounder is a variable that is not included among the covariates in a

study but influences both the covariates and the response.

(^) Plots of the residuals versus the temporal or spatial order of observation

can reveal

serial correlation (violation of independence)
confounding variables associated with time and space (multiple regression can be used to account for these variables).
(^) Example: the math department at a university must plan for the number of

instructors in large service courses. The goal is to predict enrollment in such courses Y based on number of first-year students X.

Lecture Notes on Linear Regression Diagnostic and Variable Transformations | STAT 431, Study notes of Statistics

Related documents

Partial preview of the text

Download Lecture Notes on Linear Regression Diagnostic and Variable Transformations | STAT 431 and more Study notes Statistics in PDF only on Docsity!

Statistics 431:

Statistical Inference

Lecture 17: Linear regression diagnostics and

variable transformations

Regression diagnostics

Bivariate Fit of HousePrice By CrimeRate

HousePrice

CrimeRate

Summary of Fit

Residual

Distri

Resid

Summary of Fit

RSquare 0.

RSquare Adj 0.

Root Mean Square Error 78861.

Mean of Response 158464.

Observations (or Sum Wgts) 98

Parameter Estimates

Term Estimate Std Error t Ratio Prob>|t|

Intercept 225233.55 16404.02 13.73 <.

CrimeRate -2288.689 491.5375 -4.66 <.

Residual plots

Examining and Transforming Data Tukey’s bulging rule

Display effect: Transformations to

x

Display effect: Transformations to log x

Heteroscedasticity

Another residual plot

violated we have a condition of heteroscedasticity.

Display effect example

No strong indication of violation of constant variance or

normality assumptions.

Residual plots vs time