Multiple Linear Regression, Study notes of Technology

Typology: Study notes

2021/2022

Uploaded on 09/27/2022

maya-yct

Multiple Linear Regression

Chapter 12

Multiple Regression Analysis

Definition. The multiple regression model equation is

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$

where $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$. Again, it is assumed that $\varepsilon$ is normally distributed. This is no longer a regression line but a regression surface: we relate y to more than one predictor variable $x_1, x_2, \ldots, x_p$ (for example, blood sugar level vs. weight and age).

Easier Notation?

The multiple regression model can be written in matrix form.
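As a sketch of what the matrix form usually looks like, assume n observations indexed by j, with $x_{ij}$ denoting the value of predictor i on observation j. The bold symbols below are notation chosen here for illustration, not taken from the notes.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},
\quad
\mathbf{X} = \begin{pmatrix}
1 & x_{11} & x_{21} & \cdots & x_{p1} \\
1 & x_{12} & x_{22} & \cdots & x_{p2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{1n} & x_{2n} & \cdots & x_{pn}
\end{pmatrix},
\quad
\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix},
\quad
\boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
```

Each row of $\mathbf{X}$ holds one observation; the leading column of ones carries the intercept.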

Estimating Parameters

To estimate the parameters $\beta_0, \beta_1, \ldots, \beta_p$ using the principle of least squares, form the sum of squared deviations of the observed $y_j$'s from the regression surface:

$$Q = \sum_{j=1}^{n} \varepsilon_j^2 = \sum_{j=1}^{n} \left(y_j - \beta_0 - \beta_1 x_{1j} - \cdots - \beta_p x_{pj}\right)^2$$

The least squares estimates are those values of the $\beta_i$'s that minimize Q. You could do this by taking the partial derivative with respect to each parameter and then solving for the p + 1 unknowns using the resulting p + 1 equations (akin to the simple regression method). But we don't do it that way.
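The notes do not say here which method is used instead. One common route, sketched below in R as an assumption rather than as the notes' own procedure, is to solve the normal equations in matrix form and compare the result with lm(). The data are simulated purely for illustration, and all variable names are hypothetical.

```r
# Synthetic data, made up purely for illustration (not from the notes).
set.seed(1)
n  <- 50
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)
y  <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(n, sd = 1)

# Design matrix: a column of ones for the intercept, then the predictors.
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# Solve the normal equations (X'X) beta = X'y for the least squares estimates.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Q evaluated at the estimates is the minimized sum of squared deviations (SSE).
Q_min <- sum((y - X %*% beta_hat)^2)

# lm() fits the same model; its coefficients should agree with beta_hat.
fit <- lm(y ~ x1 + x2)
print(cbind(normal_eq = drop(beta_hat), lm = coef(fit)))
```

Both columns of the printed table should agree to numerical precision, and Q_min equals the residual sum of squares of the lm() fit.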

Example

Suppose, for example, that y is the lifetime of a certain tool, and that there are 3 brands of tool being investigated. Let:

  • $x_1$ = 1 if tool A is used, and 0 otherwise,
  • $x_2$ = 1 if tool B is used, and 0 otherwise,
  • $x_3$ = 1 if tool C is used, and 0 otherwise.

Then, if an observation is on a:

  • brand A tool: we have $x_1$ = 1, $x_2$ = 0 and $x_3$ = 0,
  • brand B tool: we have $x_1$ = 0, $x_2$ = 1 and $x_3$ = 0,
  • brand C tool: we have $x_1$ = 0, $x_2$ = 0 and $x_3$ = 1.

What would our X matrix look like?
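One way to explore the question is to build the matrix for a small made-up sample. The R sketch below assumes six hypothetical observations (two per brand) and an intercept column; neither assumption comes from the notes. It also checks the rank of the result, because the three indicator columns always sum to the intercept column.

```r
# Hypothetical sample: six tools, two per brand (invented for illustration).
brand <- factor(c("A", "A", "B", "B", "C", "C"))

# Indicator variables exactly as defined in the example.
x1 <- as.numeric(brand == "A")
x2 <- as.numeric(brand == "B")
x3 <- as.numeric(brand == "C")

# X with an intercept column plus all three indicators.
X_full <- cbind(Intercept = 1, x1, x2, x3)
print(X_full)

# x1 + x2 + x3 equals the intercept column, so the 4 columns are linearly dependent.
rank_full <- qr(X_full)$rank
print(rank_full)

# Dropping one indicator (here x1, making brand A the baseline) removes the dependence.
X_ref <- cbind(Intercept = 1, x2, x3)
rank_ref <- qr(X_ref)$rank
print(rank_ref)
```

With all three indicators and an intercept, X has four columns but only rank 3, so the least squares solution is not unique. Keeping two indicators, or dropping the intercept, avoids this.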

$R^2$ and $\hat{\sigma}^2$

Just as before, the total sum of squares is

$$SST = \sum_i (y_i - \bar{y})^2,$$

and the regression sum of squares is

$$SSR = \sum_i (\hat{y}_i - \bar{y})^2 = SST - SSE.$$

Then the coefficient of multiple determination $R^2$ is

$$R^2 = 1 - \frac{SSE}{SST} = \frac{SSR}{SST}.$$

It is interpreted in the same way as before.

Unfortunately, there is a problem with $R^2$: its value can be inflated by adding lots of predictors to the model, even if most of these predictors are frivolous.

The objective in multiple regression is not simply to explain most of the observed y variation, but to do so using a model with relatively few predictors that are easily interpreted. It is thus desirable to adjust $R^2$ to take account of the size of the model:

$$R_a^2 = 1 - \frac{SSE / (n - (p + 1))}{SST / (n - 1)} = 1 - \frac{n - 1}{n - (p + 1)} \times \frac{SSE}{SST}$$

Because the ratio in front of SSE/SST exceeds 1, $R_a^2$ is smaller than $R^2$. Furthermore, the larger the number of predictors p relative to the sample size n, the smaller $R_a^2$ will be relative to $R^2$. Adjusted $R^2$ can even be negative, whereas $R^2$ itself must be between 0 and 1. A value of $R_a^2$ that is substantially smaller than $R^2$ itself is a warning that the model may contain too many predictors.
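The inflation effect can be seen in a small simulation. The R sketch below uses invented data in which y depends on a single predictor, then adds ten pure-noise predictors. The helper r2_by_hand is hypothetical and simply applies the sums-of-squares formulas for $R^2$ and adjusted $R^2$.

```r
# Simulated data, invented for illustration: y depends on x only.
set.seed(42)
n    <- 30
x    <- rnorm(n)
y    <- 1 + 2 * x + rnorm(n)
junk <- matrix(rnorm(n * 10), nrow = n)  # 10 predictors unrelated to y

# Hypothetical helper: R^2 and adjusted R^2 from the sums of squares.
r2_by_hand <- function(fit) {
  y_obs <- model.response(model.frame(fit))
  n     <- length(y_obs)
  p     <- length(coef(fit)) - 1          # predictors, excluding the intercept
  sse   <- sum(resid(fit)^2)
  sst   <- sum((y_obs - mean(y_obs))^2)
  c(R2    = 1 - sse / sst,
    adjR2 = 1 - (n - 1) / (n - (p + 1)) * sse / sst)
}

small <- lm(y ~ x)          # 1 predictor
big   <- lm(y ~ x + junk)   # 11 predictors, 10 of them pure noise

res <- rbind(small = r2_by_hand(small), big = r2_by_hand(big))
print(round(res, 4))
```

$R^2$ for the larger model can never be lower than for the smaller one, since the models are nested. Adjusted $R^2$ carries a penalty for the extra terms, so it may fall. The hand-computed values should match summary(fit)$r.squared and summary(fit)$adj.r.squared.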

Example

Investigators carried out a study to see how various characteristics of concrete are influenced by $x_1$ = % limestone powder and $x_2$ = water-cement ratio, resulting in data published in “Durability of Concrete with Addition of Limestone Powder,” Magazine of Concrete Research, 1996: 131–137.

Example (cont'd)

Consider predicting compressive strength (strength) with percent limestone powder (perclime) and water-cement ratio (watercement).

    > fit = lm(strength ~ perclime + watercement, data = dataset)
    > summary(fit)
    ...
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  86.2471    21.7242   3.970  0.00737 **
    perclime      0.1643     0.1993   0.824  0.
    watercement -80.5588    35.1557  -2.291  0.      .

    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 4.832 on 6 degrees of freedom
    Multiple R-squared:  0.4971, Adjusted R-squared:  0.
    F-statistic: 2.965 on 2 and 6 DF,  p-value: 0.
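As a worked check of the adjusted $R^2$ formula against this output, the calculation below uses the reported Multiple R-squared of 0.4971 and p = 2 predictors. The sample size n = 9 is an inference from the 6 residual degrees of freedom, since n - (p + 1) = 6; it is not stated in this excerpt. The result is approximate because the reported $R^2$ is itself rounded.

```latex
R_a^2 = 1 - \frac{n - 1}{n - (p + 1)} \cdot \frac{SSE}{SST}
      = 1 - \frac{n - 1}{n - (p + 1)} \,(1 - R^2)
      = 1 - \frac{8}{6}\,(1 - 0.4971)
      \approx 1 - 1.3333 \times 0.5029
      \approx 0.329
```

Under these assumptions the adjusted value is noticeably smaller than the unadjusted 0.4971.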

Important Questions:

  • Model utility: Is at least one predictor significantly related to our outcome? (Is our model any good?)
  • Does any particular predictor or predictor subset matter more?
  • Are any predictors related to each other?
  • Among all possible models, which is the “best”?

Model Selection

A Model Utility Test

The model utility test in simple linear regression involves the null hypothesis $H_0\colon \beta_1 = 0$, according to which there is no useful linear relation between y and the predictor x. In MLR we test the hypothesis

$$H_0\colon \beta_1 = 0,\ \beta_2 = 0,\ \ldots,\ \beta_p = 0,$$

which says that there is no useful linear relationship between y and any of the p predictors. If at least one of these $\beta$'s is not 0, the model is deemed useful.

We could test each $\beta$ separately, but that would take time and be very conservative (if a Bonferroni correction is used). A better test is a joint test, based on a statistic that has an F distribution when $H_0$ is true.
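This excerpt does not give the statistic itself. A standard form, assumed here, is $F = \dfrac{R^2 / p}{(1 - R^2) / (n - (p + 1))}$ with p and n - (p + 1) degrees of freedom. The R sketch below wraps it in a hypothetical helper, model_utility_test, and applies it to the values reported in the concrete example, with n = 9 inferred from the 6 residual degrees of freedom.

```r
# Hypothetical helper: model utility F statistic and p-value computed from R^2.
# r2 = coefficient of multiple determination, n = sample size, p = number of predictors.
model_utility_test <- function(r2, n, p) {
  df1 <- p
  df2 <- n - (p + 1)
  f   <- (r2 / df1) / ((1 - r2) / df2)
  c(F = f, df1 = df1, df2 = df2,
    p_value = pf(f, df1, df2, lower.tail = FALSE))  # upper-tail area under H0
}

# Values reported in the concrete example: R^2 = 0.4971 on 2 and 6 degrees of freedom.
# n = 9 is an inference from those degrees of freedom, not a figure stated in the notes.
res <- model_utility_test(r2 = 0.4971, n = 9, p = 2)
print(round(res, 4))
```

The F value computed this way agrees with the reported F-statistic of 2.965 up to rounding of $R^2$. A large p-value means $H_0$ cannot be rejected, that is, the data do not show the model to be useful.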