Multiple Linear Regression: Birds of the High Paramo - Lecture Notes, Lecture notes of Statistics

[Week 10] Multiple Linear Regression -- T test, Confidence Intervals, High leverage points

Typology: Lecture notes

2018/2019

Uploaded on 06/15/2019

kefart

Lecture 9: Multiple Linear Regression

Outline

- Example – Birds of the High Paramo
- Theory – Fitting a Model to Multivariate Data
- Example – Codes
- Theory – Single parameter t tests
- Theory – Multiple Correlation Coefficient
- Theory – Confidence Intervals for Regression Parameters
- Theory – Prediction in Linear Models
- Theory – High Leverage Points
- Theory – Dangers of multicollinearity
- Categorical Variable in Linear Models

Example – Birds of the High Paramo

- A paramo is an exposed, high plateau in the tropical parts of South America.
- For each of the n = 14 islands of vegetation, the following variables were recorded:
  - number of species of bird present (N),
  - area of the island in square kilometers (AR),
  - elevation in thousands of meters (EL),
  - distance from Ecuador in kilometers (DEc),
  - distance to the nearest other island in kilometers (DNI).
- The response variable Y is the number of species (N).
- The k = 4 explanatory variables are AR, EL, DEc and DNI.

Reference: Vuilleumier (1970), ‘Insular biogeography in continental regions. I. The northern Andes of South America’, The American Naturalist, 104, 373-388.

Theory – Fitting a Model to Multivariate Data

- Suppose we have n independent observations, each with k associated known (explanatory) values. A natural extension of simple linear regression is to consider the model with k predictor variables

$$Y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, \dots, n,$$

  where $\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$.
- Note: there are p = k + 1 regression parameters in this model.
- Again, the parameters are estimated by minimising the sum of squares of the residuals

$$S(\beta) = \sum_{i=1}^{n} \left[ Y_i - (\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}) \right]^2 .$$

  Thus,

$$\hat\beta = (\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k)^\top = \arg\min_{\beta} S(\beta).$$
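The minimisation above can be checked numerically. The following is a minimal sketch on simulated data (the values are synthetic, not the paramo measurements, and the names `X`, `beta_true` and `S` are chosen here for illustration). It compares the `lm()` estimate with the solution of the normal equations $(X^\top X)\beta = X^\top y$, a standard characterisation of the least squares minimiser that is not stated in these notes.

```r
# Synthetic data only: simulated values, not the paramo measurements.
set.seed(1)
n <- 14                                   # same sample size as the example
k <- 4                                    # number of explanatory variables
X <- matrix(rnorm(n * k), nrow = n, ncol = k)
colnames(X) <- paste0("x", 1:k)
beta_true <- c(2, 1, -1, 0.5, 0)          # hypothetical (beta_0, ..., beta_k)
X1 <- cbind(1, X)                         # design matrix with intercept column
y <- drop(X1 %*% beta_true) + rnorm(n, sd = 0.5)

# Sum of squares S(beta), written exactly as in the model definition
S <- function(beta) sum((y - drop(X1 %*% beta))^2)

# Least squares estimate via lm()
fit <- lm(y ~ X)
beta_hat <- unname(coef(fit))

# Same estimate from the normal equations (X'X) beta = X'y
beta_ne <- unname(drop(solve(crossprod(X1), crossprod(X1, y))))

print(cbind(lm = beta_hat, normal_eq = beta_ne))

# The minimiser cannot give a larger sum of squares than any other beta
S(beta_hat) <= S(beta_true)
```

The two columns printed by `cbind()` should agree up to floating-point error, and there are p = k + 1 = 5 entries in each.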

Example – Numerical Summary

summary(dat)

       N               AR               EL             DEc             DNI
 Min.   : 4.00   Min.   :0.0300   Min.   :0.460   Min.   :  36.   Min.   : 5.
 1st Qu.:13.00   1st Qu.:0.0875   1st Qu.:0.670   1st Qu.: 606.   1st Qu.:14.
 Median :17.50   Median :0.2750   Median :0.905   Median : 954.   Median :32.
 Mean   :20.71   Mean   :0.6557   Mean   :1.117   Mean   : 848.   Mean   :36.
 3rd Qu.:29.75   3rd Qu.:0.8950   3rd Qu.:1.440   3rd Qu.:1141.   3rd Qu.:52.
 Max.   :37.00   Max.   :2.1700   Max.   :2.280   Max.   :1380.   Max.   :83.

Example – Pairwise Sample Correlation

round(cor(dat), 2)

        N    AR    EL   DEc   DNI
N    1.00  0.58  0.50 -0.69 -0.14
AR   0.58  1.00  0.62 -0.16  0.11
EL   0.50  0.62  1.00 -0.15  0.02
DEc -0.69 -0.16 -0.15  1.00  0.35
DNI -0.14  0.11  0.02  0.35  1.00

Example – Fitting a Multiple Linear Regression Model

M1 = lm(N ~ 1 + AR + EL + DEc + DNI, data = dat)
summary(M1)

....
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.889386   6.181843   4.511  0.00146 **
AR           5.153864   3.098074   1.664  0.
EL           3.075136   4.000326   0.769  0.
DEc         -0.017216   0.005243  -3.284  0.00947 **
DNI          0.016591   0.077573   0.214  0.
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.705 on 9 degrees of freedom
Multiple R-squared:  0.7301,  Adjusted R-squared:  0.6101
....

Example – Fitting a Multiple Linear Regression Model

- A full stop on the right-hand side of an R model formula represents all variables in the data frame except the response. The following two calls produce the same output:

summary(lm(N ~ 1 + AR + EL + DEc + DNI, data = dat))
summary(lm(N ~ ., data = dat))

....
(Intercept) 27.889386   6.181843   4.511  0.00146 **
AR           5.153864   3.098074   1.664  0.
EL           3.075136   4.000326   0.769  0.
DEc         -0.017216   0.005243  -3.284  0.00947 **
DNI          0.016591   0.077573   0.214  0.
....
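The equivalence of the two formulas can be confirmed without the paramo data. The sketch below uses a small simulated data frame (the names `d`, `y`, `x1` and `x2` are hypothetical) and checks that the explicit formula and the `.` shorthand give the same terms and coefficients.

```r
# Hypothetical data frame with simulated values (not the paramo data)
set.seed(2)
d <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))

fit_explicit <- lm(y ~ 1 + x1 + x2, data = d)   # every predictor written out
fit_dot      <- lm(y ~ ., data = d)             # "." = all columns except y

# The "." is expanded to the remaining columns of the data frame
print(attr(terms(fit_dot), "term.labels"))

# Both fits have identical coefficients
print(all.equal(coef(fit_explicit), coef(fit_dot)))
```

One design point to keep in mind: because `.` picks up every column other than the response, any identifier or helper column in the data frame would also enter the model, so the explicit form is safer when the data frame holds more than the intended predictors.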

Example – Closer look at summary(lm(...))

The Std. Error is the standard error of the estimate of the regression parameter, $\mathrm{SE}(\hat\beta_j) = \sqrt{\mathrm{Var}(\hat\beta_j)}$, e.g. $\mathrm{SE}(\hat\beta_0) = 6.18$.

The t value is the test statistic for testing $H_0: \beta_j = 0$, i.e. $\hat\beta_j / \mathrm{SE}(\hat\beta_j)$.
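As a quick check of this relationship, the t value column can be recomputed from the Estimate and Std. Error columns of the summary(M1) output. This is a minimal sketch: the numbers are copied from the printed table, so the ratios agree with the reported t values only up to the rounding of the printed figures, and the vector names `est`, `se` and `t_reported` are chosen here for illustration.

```r
# Estimates and standard errors copied from the printed summary(M1) table
est <- c("(Intercept)" = 27.889386, AR = 5.153864, EL = 3.075136,
         DEc = -0.017216, DNI = 0.016591)
se  <- c("(Intercept)" = 6.181843, AR = 3.098074, EL = 4.000326,
         DEc = 0.005243, DNI = 0.077573)

# t values as printed in the same table
t_reported <- c(4.511, 1.664, 0.769, -3.284, 0.214)

# Test statistic for H0: beta_j = 0 is estimate / standard error
t_value <- est / se
print(cbind(t_value, t_reported))

# Largest discrepancy; small because inputs are rounded printed values
print(max(abs(t_value - t_reported)))
```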

Theory – Single parameter t tests

- The summary(lm(...)) command in R provides information for testing the importance of a covariate, taking into account all other variables in the model.
- Specifically, given the model

$$Y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, 2, \dots, n,$$

  the output provides statistics for performing a t test of $H_0: \beta_j = 0$ vs. $H_1: \beta_j \neq 0$ for any of the p = k + 1 given variables $x_j$, $j = 0, 1, \dots, k$ (with $x_{i0} = 1$ for the intercept), making no assumptions about the other regression parameters.
- To test $H_0: \beta_j = 0$ we can use

$$t^* = \frac{\hat\beta_j - 0}{\mathrm{SE}(\hat\beta_j)} \overset{H_0}{\sim} t_{n-p} \quad\Rightarrow\quad \text{p-value} = P(|t_{n-p}| \ge |t^*|).$$
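The two-sided p-value can be computed directly from the t distribution with n − p degrees of freedom. The sketch below takes the two t values whose p-values are printed in full in the summary(M1) output (the intercept and DEc) and uses n = 14 and p = 5 from the example. Since the inputs are rounded printed values, the results should match the reported 0.00146 and 0.00947 only approximately.

```r
n <- 14            # number of islands
p <- 5             # regression parameters: intercept + 4 slopes
df <- n - p        # residual degrees of freedom (9 in the printed output)

# t values copied from the printed coefficient table
t_star <- c("(Intercept)" = 4.511, DEc = -3.284)

# Two-sided p-value: P(|t_df| >= |t*|) = 2 * P(t_df <= -|t*|)
p_value <- 2 * pt(-abs(t_star), df = df)
print(signif(p_value, 3))

# Equivalent decision rule at the 5% level: compare |t*| with the critical value
t_crit <- qt(0.975, df = df)
print(abs(t_star) > t_crit)
```

Both parameters are flagged with `**` in the printed table, which corresponds to p-values between 0.001 and 0.01.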

Example – Closer look at summary(lm(...))

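Recall that $\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$. An estimate of $\sigma^2$ is

$$\hat\sigma^2 = \mathrm{RSS}/(n - p) = 6.705^2,$$

where 6.705 is the residual standard error reported on $n - p = 9$ degrees of freedom.

The Multiple R-squared entry gives $R^2 = 0.7301$, and the Adjusted R-squared entry gives $R_a^2 = 0.6101$.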
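These summary quantities are linked by simple arithmetic, which the following sketch reproduces from the printed values alone. It assumes the usual definition of the adjusted coefficient, $R_a^2 = 1 - (1 - R^2)(n - 1)/(n - p)$, which is not written out in these notes. The variable names are chosen here for illustration.

```r
n <- 14            # number of observations
p <- 5             # number of regression parameters
rse <- 6.705       # residual standard error from the printed output
r2  <- 0.7301      # Multiple R-squared from the printed output

# Estimate of sigma^2 and the implied residual sum of squares
sigma2_hat <- rse^2
rss <- sigma2_hat * (n - p)

# Adjusted R-squared (assumed standard definition, see lead-in)
r2_adj <- 1 - (1 - r2) * (n - 1) / (n - p)

print(c(sigma2_hat = sigma2_hat, rss = rss, r2_adj = r2_adj))
```

The adjusted value is smaller than $R^2$ because the factor $(n - 1)/(n - p)$ penalises the number of fitted parameters.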