






































[Week 10] Multiple Linear Regression -- T test, Confidence Intervals, High leverage points
Typology: Lecture notes







































Lecture 9: Multiple Linear Regression

- Example – Birds of the High Paramo
- Theory – Fitting a Model to Multivariate Data
- Example – Codes
- Theory – Single parameter t tests
- Theory – Multiple Correlation Coefficient
- Theory – Confidence Intervals for Regression Parameters
- Theory – Prediction in Linear Models
- Theory – High Leverage Points
- Theory – Dangers of multicollinearity
- Categorical Variable in Linear Models
- A paramo is an exposed, high plateau in the tropical parts of South America.
- For each of the n = 14 islands of vegetation, the following variables were recorded:
  - number of species of bird present (N),
  - area of the island in square kilometers (AR),
  - elevation in thousands of meters (EL),
  - distance from Ecuador in kilometers (DEc),
  - distance to the nearest other island in kilometers (DNI).
- The response variable Y is the number of species (N).
- The k = 4 explanatory variables are AR, EL, DEc and DNI.

Reference: Vuilleumier (1970), ‘Insular biogeography in continental regions. I. The northern Andes of South America’, American Naturalist, 104, 373-388.
- Suppose we have $n$ independent observations, each with $k$ associated known (explanatory) values. A natural extension of simple linear regression is to consider the model with $k$ predictor variables

  $$Y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, \dots, n,$$

  where $\varepsilon_i \sim NID(0, \sigma^2)$.
- Note: there are $p = k + 1$ regression parameters in this model.
- Again, the parameters are estimated by minimising the sum of squares of the residuals

  $$S(\beta) = \sum_{i=1}^{n} \bigl( Y_i - (\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}) \bigr)^2 .$$

  Thus, $\hat\beta = (\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k)^\top = \arg\min_{\beta} S(\beta)$.
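The minimisation above has a closed-form solution through the normal equations, which the slides do not show. The sketch below is only an illustration on simulated data: the variable names `x1`, `x2`, `y` and the coefficients used to generate them are assumptions, not part of the paramo example. It checks that solving the normal equations gives the same estimates as `lm()`.

```r
# Simulated data; names and true coefficients are illustrative assumptions.
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)

# Design matrix with a leading column of ones for the intercept (p = k + 1 columns).
X <- cbind(1, x1, x2)

# Least squares estimate: solve the normal equations (X'X) beta = X'y.
beta_hat <- solve(crossprod(X), crossprod(X, y))

# lm() minimises the same sum of squares, so the two columns should agree.
fit <- lm(y ~ 1 + x1 + x2)
print(cbind(normal_eq = drop(beta_hat), lm = coef(fit)))
```

In practice `lm()` uses a QR decomposition rather than inverting X'X directly, which is numerically more stable; the normal equations are shown here only because they mirror the arg min definition.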
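summary(dat)          # numerical summary of each variable in the data frame
round(cor(dat), 2)    # pairwise correlations, rounded to 2 decimal places
M1 = lm(N ~ 1 + AR + EL + DEc + DNI, data = dat)   # fit the model with all four predictors
summary(M1)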
....
....
- A full stop on the right-hand side of an R model formula represents all variables in the data frame except the response, so the following two calls produce the same output:

summary(lm(N ~ 1 + AR + EL + DEc + DNI, data = dat))
summary(lm(N ~ ., data = dat))
....
....
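The equivalence of the explicit formula and the `.` shorthand can be checked on any data frame. The sketch below uses a small simulated data frame, `toy`, whose name and columns are hypothetical stand-ins for `dat`.

```r
# Toy data frame; the column names y, a, b are illustrative assumptions.
set.seed(2)
toy <- data.frame(y = rnorm(20), a = rnorm(20), b = rnorm(20))

# Explicit formula versus the "." shorthand for "all other columns".
fit_named <- lm(y ~ 1 + a + b, data = toy)
fit_dot   <- lm(y ~ ., data = toy)

# The fitted coefficients should be identical.
print(all.equal(coef(fit_named), coef(fit_dot)))
```

Note that `.` picks up every remaining column, so an identifier or label column left in the data frame would be included as a predictor.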
The Std. Error is the standard error of the estimate of the regression parameter, $SE(\hat\beta_j) = \sqrt{Var(\hat\beta_j)}$, e.g. $SE(\hat\beta_0) = 6.18$.

The t value is the test statistic for testing $H_0: \beta_j = 0$, i.e. $\hat\beta_j / SE(\hat\beta_j)$.
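Both columns can be reproduced from a fitted model object. This is a minimal sketch on simulated data (the data frame `toy` and its columns are assumptions, not the paramo data): the standard errors are the square roots of the diagonal of the estimated covariance matrix returned by `vcov()`, and the t values are the estimates divided by those standard errors.

```r
# Simulated data; names and generating values are illustrative assumptions.
set.seed(3)
toy <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
toy$y <- 1 + 2 * toy$x1 + rnorm(30)
fit <- lm(y ~ ., data = toy)

# Standard errors: square roots of the estimated variances of the coefficients.
se <- sqrt(diag(vcov(fit)))

# t values: estimate divided by its standard error.
t_val <- coef(fit) / se

# Compare with the columns reported by summary().
tab <- coef(summary(fit))
print(cbind(se_by_hand = se, se_summary = tab[, "Std. Error"],
            t_by_hand = t_val, t_summary = tab[, "t value"]))
```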
- The summary(lm(...)) command in R provides information for testing the importance of a covariate, taking into account all other variables in the model.
- Specifically, given the model

  $$Y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i \quad (i = 1, 2, \dots, n),$$

  the output provides statistics for performing a t test of $H_0: \beta_j = 0$ vs. $H_1: \beta_j \neq 0$ for any of the $p = k + 1$ parameters, $j = 0, 1, \dots, k$, making no assumptions about the other regression parameters.
- To test $H_0: \beta_j = 0$ we can use

  $$t^* = \frac{\hat\beta_j - 0}{SE(\hat\beta_j)} \overset{\text{under } H_0}{\sim} t_{n-p} \quad\Rightarrow\quad \text{p-value} = P(|t_{n-p}| \geq |t^*|).$$
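The two-sided p-value above can be computed directly from the t distribution with n − p degrees of freedom. The sketch below again uses simulated data (all names are hypothetical) and compares the hand calculation with the `Pr(>|t|)` column of the summary table.

```r
# Simulated data; names and generating values are illustrative assumptions.
set.seed(4)
n <- 30
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 + rnorm(n)
fit <- lm(y ~ ., data = toy)

tab    <- coef(summary(fit))
p      <- length(coef(fit))      # p = k + 1 regression parameters
t_star <- tab[, "t value"]

# Two-sided p-value: P(|t_{n-p}| >= |t*|) = 2 * P(t_{n-p} >= |t*|).
p_val <- 2 * pt(abs(t_star), df = n - p, lower.tail = FALSE)

print(cbind(by_hand = p_val, summary = tab[, "Pr(>|t|)"]))
```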
Recall that $\varepsilon_i \sim NID(0, \sigma^2)$. An estimate of $\sigma^2$ is the residual mean square, $\hat\sigma^2 = S(\hat\beta) / (n - p)$.
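A common estimate of the error variance is the residual sum of squares divided by the residual degrees of freedom n − p. The sketch below assumes that estimator and checks it against the residual standard error that `summary()` reports; the data are simulated and all names are hypothetical.

```r
# Simulated data; names and generating values are illustrative assumptions.
set.seed(5)
n <- 40
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 3 - toy$x1 + 0.5 * toy$x2 + rnorm(n, sd = 2)
fit <- lm(y ~ ., data = toy)

p   <- length(coef(fit))         # number of regression parameters
rss <- sum(resid(fit)^2)         # residual sum of squares, S(beta_hat)

# Residual mean square: RSS / (n - p).
sigma2_hat <- rss / (n - p)

# summary() reports the residual standard error, i.e. the square root.
print(c(by_hand = sigma2_hat, summary = summary(fit)$sigma^2))
```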
So $R^2 = 0.7301$.

So the adjusted value is $R_a^2 = 0.6101$.
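The two values are consistent with each other under the standard definition of the adjusted coefficient of determination, which is assumed here because the slide's own formula did not survive in the text. With $n = 14$ observations and $p = k + 1 = 5$ parameters:

```latex
R_a^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p}
      = 1 - (1 - 0.7301) \times \frac{13}{9}
      = 1 - 0.2699 \times 1.4444\ldots
      \approx 0.6101
```

The adjustment penalises the number of parameters, so $R_a^2$ is noticeably smaller than $R^2$ when $p$ is large relative to $n$, as it is here.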