Analyzing Relationship between Predictors and Response Variable in Multiple Regression, Study notes of Statistics

University of Pennsylvania (UPenn)Statistics

A set of lecture notes from a statistics 102 course focusing on multiple regression analysis. It covers topics such as transforming data, model assumptions, inference in multiple regression, collinearity, and examples of multiple regression models. The notes explain how to estimate the fixed and variable costs of a lease using a regression model and discuss the importance of each predictor in the model.

Typology: Study notes

Pre 2010

Uploaded on 03/28/2010

koofers-user-vah 🇺🇸


Statistics 102 Multiple Regression

Spring, 2000 - 1 -

Multiple Regression

Project Analysis for Today

First steps

Transforming the data into a form that lets you estimate the fixed and variable costs of a lease using a regression model that meets the three key assumptions.

Review of Multiple Regression from Last Week

Objective

Isolate the key factors that influence the response and separate their effects.

Model: “Y” = β₀ + β₁ “X₁” + ... + βₖ “Xₖ” + Error

Sales = β₀ + β₁ Adv$ + β₂ Price + Error

with

- Independence

- Constant variance σ² about regression line

- Normally distributed errors about the regression line.

Discussion

– Model is additive

– Geometry of multiple regression

– Slopes measure effect of each predictor “holding others fixed”

“Simple” regression slope vs multiple regression slope

Relationship between R² and RMSE

– Both describe “goodness-of-fit”

– R² is relative whereas RMSE is absolute.

– They are related as follows:

RMSE² = Var(residuals) ≈ (1 – R²) Var(response)

– Same interpretation in simple (one predictor) and multiple regression.
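The relation between R² and RMSE can be checked numerically. The sketch below is only an illustration on simulated data: the variable names, coefficients, and sample size are assumptions, not values from the course data. It fits a two-predictor model by least squares and compares RMSE² with (1 – R²) Var(response).

```python
import numpy as np

# Simulated data; all numbers here are illustrative assumptions.
rng = np.random.default_rng(1)
n = 500
adv = rng.normal(size=n)
price = rng.normal(size=n)
sales = 10 + 3 * adv - 2 * price + rng.normal(scale=2, size=n)

# Least-squares fit of Sales on an intercept, Adv$ and Price.
X = np.column_stack([np.ones(n), adv, price])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
resid = sales - X @ beta

sse = resid @ resid                          # residual sum of squares
sst = np.sum((sales - sales.mean()) ** 2)    # total sum of squares
r2 = 1 - sse / sst
rmse = np.sqrt(sse / (n - X.shape[1]))       # divides by n - k - 1

approx = (1 - r2) * np.var(sales, ddof=1)    # (1 - R^2) Var(response)
print(f"RMSE^2 = {rmse**2:.4f}, (1 - R^2) Var(y) = {approx:.4f}")
```

The two quantities differ only through the degrees of freedom (n – k – 1 versus n – 1), which is why the notes write ≈ rather than =.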



Inference in Multiple Regression

Inference in multiple regression

– One coefficient: t-ratio (estimate/SE)

“Is this slope different from zero?”

“Does this variable significantly improve a model containing the rest?”

– All coefficients: overall F-ratio (ANOVA table)

“Does this entire model explain significant amounts of variation?”
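As a rough sketch of how the two statistics are computed, the code below fits a regression on simulated data (the data, seed, and coefficients are assumptions for illustration only) and forms the t-ratio for each coefficient and the overall F-ratio from the ANOVA sums of squares.

```python
import numpy as np

# Simulated data; the second predictor has a true slope of zero.
rng = np.random.default_rng(2)
n, k = 100, 2
preds = rng.normal(size=(n, k))
y = 1.0 + 0.5 * preds[:, 0] + 0.0 * preds[:, 1] + rng.normal(size=n)

X = np.column_stack([np.ones(n), preds])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Error mean square and the standard error of each coefficient.
df_err = n - k - 1
mse = resid @ resid / df_err
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))

# One coefficient at a time: t-ratio = estimate / SE.
t_ratio = beta / se

# All slopes at once: F-ratio = (model mean square) / (error mean square).
ss_total = np.sum((y - y.mean()) ** 2)
ss_model = ss_total - resid @ resid
f_ratio = (ss_model / k) / mse

print("t-ratios:", np.round(t_ratio, 2))
print("F-ratio :", round(float(f_ratio), 2))
```

The F-ratio can equivalently be written in terms of R² as (R²/k) / ((1 – R²)/(n – k – 1)), which ties it to the goodness-of-fit summary.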

Analysis of variance (ANOVA) summary (page 141)

– Summary of how much variation is being explained per predictor.

– Example for the car data with weight and horsepower as predictors.

[ANOVA table: rows Model, Error, C Total; columns DF, Sum of Squares, Mean Square, F Ratio, Prob>F; values not preserved]

Why do we need different tests?

– Each addresses a specific aspect of the fitted model:

t-ratio considers one coefficient (intercept or slope)

F-ratio considers all slopes simultaneously

– Why not just do a bunch of t-tests, one for each slope?

With 20 predictors, each tested at the 5% level (95% CI), you can expect about one slope to appear significant (not zero) by chance alone, even if none has any real effect! Too many things will appear significant that really are not meaningful.

– Recall the use of multiple comparisons in anova.
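The arithmetic behind the "one significant by chance" remark can be sketched as follows. The calculation assumes all 20 true slopes are zero and, for the second figure, that the 20 tests are independent, which is an idealization.

```python
# Expected number of false positives among m tests at level alpha,
# assuming every true slope is zero.
alpha = 0.05   # 5% level, i.e. a 95% confidence interval
m = 20         # number of predictors tested

expected_false_positives = m * alpha

# Chance that at least one test is significant, assuming the tests
# are independent (an idealization, not a claim about real data).
p_at_least_one = 1 - (1 - alpha) ** m

print(expected_false_positives)      # about 1
print(round(p_at_least_one, 3))      # about 0.64
```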


Example of Multiple Regression

Automobile design Car89.jmp, page 109

“What is the predicted mileage for a 4000 lb. design, and what characteristics of the design are crucial?”

“How much does my 200 pound brother owe me for gas for carrying him 3,000 miles to California?” (Oops, it’s urban mileage in example)

– Initial one-predictor model

• Transform response to gallons per 1000 mile scale.

• Cannot compare R²’s since the two models use different dependent variables (MPG and GPM)

• Effect of scaling from GPM to GP1000M.

• RMSE = 4.23 (p 111)

• Skewness in residuals from regression with Weight. (p 112)

• Prediction @ 4000 lbs = 63.9, ⇑ 200 lbs for 3000 miles ≈ 8.2 gals

– Add variable for Horsepower (p 117)

• R² increases from 77% to 84% (added variable is significant, t=7.21)

• RMSE drops to 3.

• Predictors are related, both increase together, higher SE for Weight.

• Picture explains the increase in SE due to restricted range (p 120).

• ⇑ 200 lbs for 3000 miles ≈ 5.3 gals

• Prediction from multiple regression

– Add a predictor less correlated with Weight, use HP/Pound (p 123)

• Weight and HP/Pound less related, more distinct properties of these cars.

• Engineer can manipulate these separately, unlike HP and weight.
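The "200 pound brother" figures follow from multiplying the Weight slope (in gallons per 1000 miles per pound) by the extra weight and by the distance in thousands of miles. The sketch below shows that arithmetic. The slopes in it are back-calculated from the rounded gallon figures quoted in the notes, so they are assumptions for illustration, not the fitted coefficients from Car89.jmp, and the function names are hypothetical.

```python
def gp1000m_to_mpg(gp1000m: float) -> float:
    """Convert gallons per 1000 miles to miles per gallon."""
    return 1000.0 / gp1000m


def extra_gallons(slope_per_lb: float, extra_lbs: float, miles: float) -> float:
    """Extra fuel implied by a Weight slope in gallons per 1000 miles per lb."""
    return slope_per_lb * extra_lbs * (miles / 1000.0)


# Implied slopes, back-calculated from the rounded 8.2 and 5.3 gallon
# figures (assumption: not the actual fitted values).
slope_weight_only = 8.2 / (200 * 3)   # one-predictor model
slope_with_hp = 5.3 / (200 * 3)       # model that also includes Horsepower

print(round(gp1000m_to_mpg(63.9), 1))                          # 4000 lb design, in MPG
print(round(extra_gallons(slope_weight_only, 200, 3000), 1))   # weight-only model
print(round(extra_gallons(slope_with_hp, 200, 3000), 1))       # with Horsepower
```

The drop from roughly 8.2 to 5.3 gallons reflects the change in the Weight slope once Horsepower is held fixed.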

Residual plots

– Show residuals plotted on fitted values

– Inspect for deviations from assumptions (such as lack of constant variance)

Leverage plots (p 125)

– Diagnostic plot, designed especially for multiple regression

– Reveals leveraged observations in multiple regression.

Next steps for this model…

– What other factors are important for the design?

– How small can we make the RMSE?


Example with Extreme Collinearity in Multiple Regression

Stock prices and market indices Stocks.jmp, page 138

“What’s the beta for Walmart when regressed on two indices?”

– Fitted slope of stock returns on the market estimates the beta for the stock.

– Huge collinearity (correlation between VW and S&P is 0.993), so almost no unique variation in either one given that the other is in the model.

– Either taken separately is a good predictor, but both show weak effects when used together.

– “Squished” leverage plots... little unique variation in either predictor available to explain the variation in the response. (p 144)

– More complete VW index is better predictor, as financial theory suggests.
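To put a number on "huge collinearity": with exactly two predictors, the variance inflation factor of each is 1 / (1 – r²), where r is the correlation between them. The sketch below applies that standard two-predictor formula to the 0.993 correlation quoted above; it is not output from Stocks.jmp.

```python
import math

r = 0.993                      # correlation between VW and S&P, from the notes

vif = 1.0 / (1.0 - r ** 2)     # two-predictor variance inflation factor
se_inflation = math.sqrt(vif)  # factor by which each slope's SE grows

print(round(vif, 1))           # roughly 72
print(round(se_inflation, 1))  # roughly 8.5
```

Under this formula each slope's standard error is about eight times larger than it would be if the two indices were uncorrelated, which is consistent with each index looking weak when both are in the model.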

Next Time

Categorical predictors…

Categorical predictors allow us to compare regression models for different

groups, judging if the models for the different groups are comparable.

[Plot: Horsepower and Weight(lb)]

SE(slope estimate for X_j) ≈ (σ / √n) × 1 / SD(Adjusted X_j) = (σ / √n) × √VIF_j / SD(X_j) = √VIF_j × (SE if no collinearity)
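The standard-error relation can be verified on simulated data. In the sketch below, the data, seed, and coefficients are illustrative assumptions. The VIF for one predictor is obtained by regressing it on the other predictors, and the SE rebuilt from that VIF is compared with the SE from the usual least-squares formula. With the sample SD (dividing by n – 1) the match is exact if √n is replaced by √(n – 1); the notes' √n is the large-sample approximation.

```python
import numpy as np

# Simulated, correlated predictors (illustrative assumptions only).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.81) * rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full regression and the usual coefficient standard errors.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s = np.sqrt(resid @ resid / (n - X.shape[1]))        # RMSE, estimates sigma
se = s * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))

# "Adjusted x1": what is left of x1 after regressing it on the rest.
Z = np.column_stack([np.ones(n), x2])
gamma, *_ = np.linalg.lstsq(Z, x1, rcond=None)
adjusted_x1 = x1 - Z @ gamma
r2_x1 = 1 - adjusted_x1 @ adjusted_x1 / np.sum((x1 - x1.mean()) ** 2)
vif_x1 = 1.0 / (1.0 - r2_x1)

# SE rebuilt from the VIF form of the relation.
se_from_vif = s * np.sqrt(vif_x1) / (np.sqrt(n - 1) * x1.std(ddof=1))

print(round(float(se[1]), 4), round(float(se_from_vif), 4))
```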

[Parameter Estimates table: terms Intercept, Weight(lb), Horsepower; columns Estimate, Std Error, t Ratio, Prob>|t|, VIF; values not preserved]

[Residual plot: Residual against GP1000M City Predicted]

[Correlations table: VW, SP500, WALMART, Sequence Number; values not preserved]

[Scatterplot matrix: VW, SP, WALMART, Sequence Number]

[Parameter Estimates table, one index: terms Intercept, SP; columns Estimate, Std Error, t Ratio, Prob>|t|, VIF; values not preserved]

[Parameter Estimates table, two indices: terms Intercept, SP, VW; columns Estimate, Std Error, t Ratio, Prob>|t|; values not preserved]
