Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Multiple Linear Regression, Study notes of Technology

Technology

Adjusted R2 can even be negative, whereas R2 itself must be between 0 and 1. A value of that is substantially smaller than R2 itself is a warning that the model ...

Typology: Study notes

2021/2022

Uploaded on 09/27/2022

maya-yct 🇬🇧

4.8

(9)

217 documents

1 / 49

This page cannot be seen from the preview

Don't miss anything!

13 Multiple(Linear(

Regression(

Chapter(12

Partial preview of the text

Download Multiple Linear Regression and more Study notes Technology in PDF only on Docsity!

Multiple Linear

Regression

Chapter 12

Multiple Regression Analysis

Definition The multiple regression model equation is

Y = b 0 + b 1 x 1 + b 2 x 2 + ... + b pxp + ε

where E ( ε ) = 0 and Var ( ε ) = s

2 . Again, it is assumed that ε is normally distributed. This is not a regression line any longer, but a regression surface and we relate y to more than one predictor variable x 1 , x 2 , … , xp. (ex. Blood sugar level vs. weight and age)

Easier Notation?

The multiple regression model can be written in matrix form.

Estimating Parameters

To estimate the parameters b 0 , b 1 ,..., b p using the principle of least squares, form the sum of squared deviations of the observed yj ’s from the regression line: The least squares estimates are those values of the b i s that minimize the equation. You could do this by taking the partial derivative w.r.t. to each parameter, and then solving the k+ unknowns using the k+1 equations (akin to the simple regression method). But we don’t do it that way. Q = " #$ % & $'( = " (*$ − ,- − ,(.($ − ⋯ − , 0. 1 $ ) % & $'(

Example

Suppose, for example, that y is the lifetime of a certain tool, and that there are 3 brands of tool being investigated. Let: x 1 = 1 if tool A is used, and 0 otherwise, x 2 = 1 if tool B is used, and 0 otherwise, x 3 = 1 if tool C is used, and 0 otherwise. Then, if an observation is on a: brand A tool: we have x 1 = 1 and x 2 = 0 and x 3 = 0, brand B tool: we have x 1 = 0 and x 2 = 1 and x 3 = 0, brand C tool: we have x 1 = 0 and x 2 = 0 and x 3 = 1. What would our X matrix look like?

R

and s

^^2

Just as before, the total sum of squares is SST = S( yi – y ) 2 , And the regression sum of squares is: Then the coefficient of multiple determination R 2 is R 2 = 1 – SSE/SST = SSR/SST It is interpreted in the same way as before.

R

2 !!" = $ (&' − &)

= !!+ − !!,.

R

2 Unfortunately, there is a problem with R 2 : Its value can be inflated by adding lots of predictors into the model even if most of these predictors are frivolous.

R

2 The objective in multiple regression is not simply to explain most of the observed y variation, but to do so using a model with relatively few predictors that are easily interpreted. It is thus desirable to adjust R 2 to take account of the size of the model: !"

= 1 − ''( ) − * + 1 '', ) − 1 = 1 − ) − 1 ) − (* + 1 ) × ''( '', 0

R

2 Because the ratio in front of SSE/SST exceeds 1, is smaller than R 2

. Furthermore, the larger the number of predictors p relative to the sample size n , the smaller will be relative to R 2 . Adjusted R 2 can even be negative, whereas R 2 itself must be between 0 and 1. A value of that is substantially smaller than R 2 itself is a warning that the model may contain too many predictors.

Example

Investigators carried out a study to see how various characteristics of concrete are influenced by x 1 = % limestone powder x 2 = water cement ratio, resulting in data published in “Durability of Concrete with Addition of Limestone Powder,” Magazine of Concrete Research , 1996: 131 – 137.

Example

Consider predicting compressive strength (strength) with percent limestone powder (perclime) and water cement ratio (watercement). > fit = lm(strength ~ perclime + watercement, data = dataset) > summary(fit) ... Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 86.2471 21.7242 3.970 0.00737 ** perclime 0.1643 0.1993 0.824 0. watercement - 80.5588 35.1557 - 2.291 0..

Signif. codes: 0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 4.832 on 6 degrees of freedom Multiple R-squared: 0.4971, Adjusted R-squared: 0. F-statistic: 2.965 on 2 and 6 DF, p-value: 0. cont’d

Important Questions:

Model utility: Are all predictors significantly related to our outcome? (Is our model any good?)
Does any particular predictor or predictor subset matter more?
Are any predictors related to each other?
Among all possible models, which is the “best”?

Model Selection

A Model Utility Test

The model utility test in simple linear regression involves

the null hypothesis H 0 : b 1 = 0, according to which there is

no useful linear relation between y and the predictor x. In MLR we test the hypothesis

H 0 : b 1 = 0, b 2 = 0,..., b p = 0,

which says that there is no useful linear relationship between y and any of the p predictors. If at least one of

these b’s is not 0, the model is deemed useful.

We could test each b separately, but that would take time

and be very conservative (if Bonferroni correction is used). A better test is a joint test, and is based on a statistic that has an F distribution when H 0 is true.

Multiple Linear Regression, Study notes of Technology

Related documents

Partial preview of the text

Download Multiple Linear Regression and more Study notes Technology in PDF only on Docsity!

Multiple Linear

Regression

Multiple Regression Analysis

Y = b 0 + b 1 x 1 + b 2 x 2 + ... + b pxp + ε

where E ( ε ) = 0 and Var ( ε ) = s

Easier Notation?

Estimating Parameters

Example

R

and s

^^2

R

R

R

R

Example

Example

Model Selection

A Model Utility Test

the null hypothesis H 0 : b 1 = 0, according to which there is

H 0 : b 1 = 0, b 2 = 0,..., b p = 0,

these b’s is not 0, the model is deemed useful.

We could test each b separately, but that would take time