Multiple Linear Regression: Birds of the High Paramo - Lecture Notes, Lecture notes of Statistics

University of Sydney (US)Statistics

[Week 10] Multiple Linear Regression -- T test, Confidence Intervals, High leverage points

Typology: Lecture notes

2018/2019

Uploaded on 06/15/2019

kefart 🇺🇸


Lecture 9: Multiple Linear Regression


Outline

- Example – Birds of the High Paramo
- Theory – Fitting a Model to Multivariate Data
- Example – Codes
- Theory – Single parameter t tests
- Theory – Multiple Correlation Coefficient
- Theory – Confidence Intervals for Regression Parameters
- Theory – Prediction in Linear Models
- Theory – High Leverage Points
- Theory – Dangers of multicollinearity
- Categorical Variable in Linear Models

Example – Birds of the High Paramo

- A paramo is an exposed, high plateau in the tropical parts of South America.
- For each of the n = 14 islands of vegetation, the following variables were recorded:
  - number of species of bird present (N),
  - area of the island in square kilometers (AR),
  - elevation in thousands of meters (EL),
  - distance from Ecuador in kilometers (DEc),
  - distance to the nearest other island in kilometers (DNI).
- The response variable Y is the number of species (N).
- The k = 4 explanatory variables are AR, EL, DEc and DNI.

Reference: Vuilleumier (1970), 'Insular biogeography in continental regions. I. The northern Andes of South America', The American Naturalist, 104, 373-388.

Theory – Fitting a Model to Multivariate Data

- Suppose we have n independent observations, each with k associated known (explanatory) values. A natural extension of simple linear regression is to consider the model with k predictor variables

$$Y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, \dots, n,$$

  where $\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$.
- Note: there are p = k + 1 regression parameters in this model.
- Again the parameters are estimated by minimising the sum of squares of the residuals

$$S(\beta) = \sum_{i=1}^{n} \left\{ Y_i - (\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}) \right\}^2 .$$

  Thus,

$$\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k)^\top = \arg\min_{\beta} S(\beta).$$
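The minimisation above can be checked numerically. The following is a minimal sketch on simulated data (not the paramo data, which is not reproduced here); the design matrix `X`, the parameter vector `beta` and the seed are illustrative assumptions. It compares the closed-form least squares solution from the normal equations with the coefficients returned by `lm()`.

```r
# Simulated data: hypothetical values, NOT the paramo data set
set.seed(1)
n <- 14                                   # number of observations
k <- 4                                    # number of predictors
X <- cbind(1, matrix(rnorm(n * k), nrow = n, ncol = k))  # intercept column + k predictors
beta <- c(2, 1, -1, 0.5, 0)               # assumed "true" parameters (p = k + 1 = 5)
y <- as.vector(X %*% beta + rnorm(n))     # errors are NID(0, 1)

# Sum of squares of the residuals, S(beta)
S <- function(b) sum((y - X %*% b)^2)

# Least squares estimate from the normal equations (X'X) b = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# The same fit via lm(); X[, -1] drops the intercept column because lm() adds its own
fit <- lm(y ~ X[, -1])

print(cbind(normal_eq = as.vector(beta_hat), lm = unname(coef(fit))))
print(c(S_at_beta_hat = S(beta_hat), S_at_true_beta = S(beta)))
```

The two columns should agree up to numerical error, and S evaluated at the estimate cannot exceed S evaluated at any other parameter vector, including the one used to simulate the data.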

Example – Numerical Summary

summary(dat)

       N               AR               EL              DEc
 Min.   : 4.00   Min.   :0.0300   Min.   :0.460   Min.   :  36.
 1st Qu.:13.00   1st Qu.:0.0875   1st Qu.:0.670   1st Qu.: 606.
 Median :17.50   Median :0.2750   Median :0.905   Median : 954.
 Mean   :20.71   Mean   :0.6557   Mean   :1.117   Mean   : 848.
 3rd Qu.:29.75   3rd Qu.:0.8950   3rd Qu.:1.440   3rd Qu.:1141.
 Max.   :37.00   Max.   :2.1700   Max.   :2.280   Max.   :1380.
      DNI
 Min.   : 5.
 1st Qu.:14.
 Median :32.
 Mean   :36.
 3rd Qu.:52.
 Max.   :83.

Example – Pairwise Sample Correlation

round(cor(dat), 2)

        N    AR    EL   DEc   DNI
N    1.00  0.58  0.50 -0.69 -0.14
AR   0.58  1.00  0.62 -0.16  0.11
EL   0.50  0.62  1.00 -0.15  0.02
DEc -0.69 -0.16 -0.15  1.00  0.35
DNI -0.14  0.11  0.02  0.35  1.00

Example – Fitting a Multiple Linear Regression Model

M1 = lm(N ~ 1 + AR + EL + DEc + DNI, data = dat)
summary(M1)

....
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.889386   6.181843   4.511  0.00146 **
AR           5.153864   3.098074   1.664  0.
EL           3.075136   4.000326   0.769  0.
DEc         -0.017216   0.005243  -3.284  0.00947 **
DNI          0.016591   0.077573   0.214  0.
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.705 on 9 degrees of freedom
Multiple R-squared: 0.7301, Adjusted R-squared: 0.6101
....

Example – Fitting a Multiple Linear Regression Model

- A full stop on the right hand side of an R model formula represents all variables in the data frame except for the response. The following two commands therefore produce the same output:

summary(lm(N ~ 1 + AR + EL + DEc + DNI, data = dat))
summary(lm(N ~ ., data = dat))

....
(Intercept) 27.889386   6.181843   4.511  0.00146 **
AR           5.153864   3.098074   1.664  0.
EL           3.075136   4.000326   0.769  0.
DEc         -0.017216   0.005243  -3.284  0.00947 **
DNI          0.016591   0.077573   0.214  0.
....
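The equivalence of the two formula styles can be tried on any data frame. The sketch below uses R's built-in `mtcars` data purely as a stand-in, since the paramo data frame `dat` is not reproduced here; the choice of columns is an arbitrary assumption made for illustration.

```r
# Stand-in data: three columns of the built-in mtcars data set (NOT the paramo data)
sub <- mtcars[, c("mpg", "wt", "hp")]

# Explicit formula: name every predictor
fit_explicit <- lm(mpg ~ 1 + wt + hp, data = sub)

# Dot formula: "." expands to every column of `sub` except the response mpg
fit_dot <- lm(mpg ~ ., data = sub)

print(coef(fit_explicit))
print(coef(fit_dot))
```

Note that the dot picks up every remaining column, so it is only equivalent to the explicit formula when the data frame contains exactly the intended predictors.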

Example – Closer look at summary(lm(...))

The Std. Error is the standard error of the estimate of the regression parameter, $SE(\hat{\beta}_j) = \sqrt{\operatorname{Var}(\hat{\beta}_j)}$, e.g. $SE(\hat{\beta}_0) = 6.18$.

Example – Closer look at summary(lm(...))

The t value is the test statistic for testing $H_0: \beta_j = 0$, i.e. $\hat{\beta}_j / SE(\hat{\beta}_j)$.

Theory – Single parameter t tests

- The summary(lm(...)) command in R provides information for testing the importance of a covariate, taking into account all other variables in the model.
- Specifically, given the model

$$Y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i \quad (i = 1, 2, \dots, n),$$

  the output provides statistics for performing a t test of $H_0: \beta_j = 0$ vs. $H_1: \beta_j \neq 0$ for any of the p = k + 1 given variables $x_j$, $j = 0, 1, \dots, k$, making no assumptions about the other regression parameters.
- To test $H_0: \beta_j = 0$ we can use

$$t^* = \frac{\hat{\beta}_j - 0}{SE(\hat{\beta}_j)} \overset{H_0}{\sim} t_{n-p} \quad \Rightarrow \quad \text{p-value} = P(|t_{n-p}| \geq |t^*|).$$
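As a worked instance of this test, the sketch below recomputes the t value and two-sided p-value for DEc from the estimate and standard error printed in the regression output, with n − p = 14 − 5 = 9 degrees of freedom. Only the rounded figures from the output are used, so the result should match the reported values approximately rather than exactly.

```r
# Values copied from the summary(M1) output for DEc
est <- -0.017216   # estimate of the DEc coefficient
se  <- 0.005243    # its standard error
n   <- 14          # number of islands
p   <- 5           # regression parameters: intercept + 4 predictors

# Test statistic for H0: beta_j = 0
t_star <- (est - 0) / se

# Two-sided p-value from the t distribution with n - p degrees of freedom
p_value <- 2 * pt(-abs(t_star), df = n - p)

print(round(t_star, 3))   # compare with the "t value" column
print(p_value)            # compare with the "Pr(>|t|)" column
```

The same two lines applied to any other row of the coefficient table reproduce that row's t value and p-value.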

Example – Closer look at summary(lm(...))

Recall that $\varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$. An estimate of $\sigma^2$ is

$$6.705^2 = \hat{\sigma}^2 = \frac{RSS}{n - p}.$$
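The relation between the residual standard error and the residual sum of squares can be worked through with the figures in the output. This is a small sketch that assumes only the printed value 6.705 and the 9 residual degrees of freedom; the RSS it recovers is therefore subject to rounding.

```r
# Figures taken from the summary(M1) output
sigma_hat <- 6.705   # residual standard error
n <- 14              # number of observations
p <- 5               # number of regression parameters

# Estimate of the error variance
sigma2_hat <- sigma_hat^2

# Rearranging sigma2_hat = RSS / (n - p) gives the residual sum of squares
rss <- sigma2_hat * (n - p)

print(sigma2_hat)
print(rss)
```

With a fitted model object available, the same quantity could be obtained directly as `sum(residuals(M1)^2)`.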

Example – Closer look at summary(lm(...))

So $R^2 = 0.7301$ (the Multiple R-squared entry of the output).

Example – Closer look at summary(lm(...))

So $R^2_a = 0.6101$ (the Adjusted R-squared entry of the output).
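The adjusted value can be recovered from the multiple R-squared. The sketch below assumes the standard definition $R^2_a = 1 - (1 - R^2)\,(n - 1)/(n - p)$, which is the formula R uses for a model with an intercept but is not stated in these notes, together with n = 14 and p = 5 from the example.

```r
# Figures taken from the example
r2 <- 0.7301   # multiple R-squared from the output
n  <- 14       # number of observations
p  <- 5        # number of regression parameters

# Adjusted R-squared: penalises R-squared for the number of parameters fitted
r2_adj <- 1 - (1 - r2) * (n - 1) / (n - p)

print(round(r2_adj, 4))
```

Because the penalty factor (n − 1)/(n − p) exceeds 1 whenever p > 1, the adjusted value is always below the unadjusted one for a model with predictors.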
