Lecture 10: Model Selection and Comparison in Regression Analysis, Lecture notes of Statistics

University of Sydney (US), Statistics

[Week 11] Model Selection -- F test, Backward, Forward and Stepwise Variable Selection

Typology: Lecture notes

2018/2019

Uploaded on 06/15/2019

kefart 🇺🇸


Lecture 10: Model Selection

Outline

- The general F-test
- Model selection
- Backward variable selection
- Forward variable selection
- Stepwise variable selection
- Akaike information criterion
- Bayesian information criterion

Example – Two explanatory variables

For the multiple regression model

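$Y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + \varepsilon_i, \quad \varepsilon_i \sim \mathrm{NID}(0, \sigma^2)$,

we sometimes want to test a hypothesis that constrains ('fixes the values of') several parameters simultaneously. For example, with two explanatory variables the possible models are:

$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$, or
$Y_i = \beta_0 + \beta_2 x_{i2} + \varepsilon_i$, or
$Y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i$, or
$Y_i = \beta_0 + \varepsilon_i$.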

Example: k ≥ 2 , set two parameter values to zero

- Setting two parameter values to zero, i.e.

  $H_0: \beta_1 = \beta_2 = 0$ versus $H_1: \beta_1 \neq 0 \lor \beta_2 \neq 0$.

- Example: $k \ge 2$, all (slope) parameter values set to zero:

  $H_0: \beta_1 = \dots = \beta_k = 0$ versus $H_1: \exists\, \beta_j \neq 0,\ j = 1, \dots, k$.

- This test problem is also known as the omnibus test problem or overall test.
- We can test each of the constraints separately using t-tests, but how do we combine the outcomes to get a single p-value for the combined claim?
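The standard answer is the general F-test, which compares the residual sums of squares of the two nested models. Writing $S_0$ for the RSS of the larger model under $H_1$ (with $p$ parameters) and $S_1$ for the RSS of the constrained model under $H_0$ (with $q$ parameters), and assuming the NID errors of the model above, the statistic and its null distribution are:

```latex
F = \frac{(S_1 - S_0)/(p - q)}{S_0/(n - p)} \;\sim\; F_{p-q,\,n-p} \quad \text{under } H_0,
\qquad
\text{p-value} = P\!\left(F_{p-q,\,n-p} \ge f^*\right),
```

where $f^*$ is the observed value. In the catheter example that follows, $p = 3$, $q = 1$ and $n = 12$, so the reference distribution is $F_{2,9}$.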

Theory – Remarks on the F-statistic

- $p$ and $q$ typically denote the number of parameters in $H_1$ and $H_0$, respectively.
- The denominator $S_0/(n - p) = \hat\sigma^2$ is the adjustment for (estimator of) scale.
- $(p - q)$ is the number of linearly independent constraints imposed by $H_0$.
- If $H_0: \beta_j = 0$ for one and only one $j \in \{0, \dots, k\}$, then the F-test reduces to the usual t-test. Note: $F_{1,n-p} = t^2_{n-p}$.
- The F-statistic provided in summary(lm(...)) is the statistic for the omnibus test problem.
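The last two remarks can be checked numerically. A minimal sketch on synthetic data (the simulated values and the names x1, x2, y, m_full, m_x1, m_null are illustrative assumptions, not taken from the notes), using anova() on nested lm fits, which carries out the general F-test:

```r
# Synthetic data: an illustrative assumption, not the catheter data
set.seed(1)
n  <- 20
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + rnorm(n)

m_full <- lm(y ~ x1 + x2)   # larger model under H1, p = 3 parameters
m_x1   <- lm(y ~ x1)        # H0: beta_2 = 0, q = 2 parameters
m_null <- lm(y ~ 1)         # H0: beta_1 = beta_2 = 0, q = 1 parameter

# One constraint: the general F statistic equals the squared t statistic
f_single <- anova(m_x1, m_full)$F[2]
t_x2     <- summary(m_full)$coefficients["x2", "t value"]
all.equal(f_single, t_x2^2)

# Omnibus test: anova() of null vs full reproduces summary()'s F-statistic
f_omni <- anova(m_null, m_full)$F[2]
all.equal(f_omni, unname(summary(m_full)$fstatistic["value"]))
```

Both comparisons should return TRUE, since the identities are exact and only floating-point rounding separates the two computations.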

Example – Catheter data

M0 = lm(L ~ ., data = dat)
summary(M0)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 21.0084 8.7512 2.401 0.0399 *

H 0.1964 0.3606 0.545 0.

W 0.1908 0.1652 1.155 0.

Residual standard error: 3.943 on 9 degrees of freedom

Multiple R-squared: 0.8053,Adjusted R-squared: 0.

F-statistic: 18.62 on 2 and 9 DF, p-value: 0.

Example – Catheter data

Information for the null model

- For $H_0: \beta_1 = \beta_2 = 0$ the model is $L_i = \beta_0 + \varepsilon_i$.
- $S_1 = 718.729$ with df = 11 is the RSS from the null model, obtained with

M1 = lm(L ~ 1, data = dat)
summary(M1)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 36.208 2.333 15.52 7.97e-09 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.083 on 11 degrees of freedom

S1 = deviance(M1); S1

[1] 718.

M1$df.residual

[1] 11

Example – Catheter data

Test for the null model vs full model

The observed f value is

$$f^* = \frac{(S_1 - S_0)/2}{S_0/9} = 18.616 \quad \Rightarrow \quad \text{p-value} = P(F_{2,9} \ge 18.616) = 0.00063.$$

Thus, we reject $H_0$.

S0 = deviance(M0)  # RSS of the full model, 9 residual df
f.obs = (S1 - S0) / 2 / (S0 / 9); f.obs

[1] 18.

1 - pf(f.obs, 2, 9) # omnibus p-value

[1] 0.

Although the individual t-tests suggest we could drop either height or weight on its own, we cannot drop both variables.
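The omnibus calculation can also be reproduced from the printed output alone. A minimal sketch, assuming only the reported numbers: $S_1 = 718.729$ from summary(M1), and $S_0$ recovered as $9 \times 3.943^2$ from the full model's residual standard error (so it carries a little rounding error):

```r
# Quantities read off the catheter output
S1 <- 718.729          # RSS of the null model L ~ 1, 11 residual df
S0 <- 9 * 3.943^2      # RSS of the full model, from residual SE 3.943 on 9 df
n_minus_p <- 9         # residual df of the full model
p_minus_q <- 2         # constraints in H0: beta_1 = beta_2 = 0

f.obs   <- ((S1 - S0) / p_minus_q) / (S0 / n_minus_p)
p.value <- pf(f.obs, p_minus_q, n_minus_p, lower.tail = FALSE)  # upper tail of F(2, 9)
c(f = f.obs, p = p.value)
```

This gives about 18.61 and 0.00063, matching the slide up to rounding. With the catheter data frame `dat` loaded, `anova(M1, M0)` should carry out the same general F-test directly from the two fits.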

Remarks – General thoughts on choosing between models

- In choosing between models, statisticians have two aims:
  - to choose a simple (i.e. not too complex) model;
  - to choose a model that fits the data well.
- One way to measure the complexity of a linear regression model is by the number of regression parameters, $p$: the greater this value, the more complex the model.
- One way to measure how closely the model fits the data is by the residual sum of squares (RSS).
- Think of model comparison like shopping: is it worth spending more (parameters) in order to get a better (fitting) model?
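A minimal sketch on synthetic data (the data and the names x1, x2, x3, y are assumptions) that tabulates both measures for a nested sequence of models. Because the RSS can only decrease or stay the same as parameters are added, RSS alone always favours the largest model, which is why fit has to be traded off against complexity:

```r
# Synthetic data: only x1 truly affects y (an illustrative assumption)
set.seed(3)
n   <- 40
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + rnorm(n)

# A nested sequence of models, from the null model to the full model
formulas <- list(y ~ 1, y ~ x1, y ~ x1 + x2, y ~ x1 + x2 + x3)
fits     <- lapply(formulas, lm, data = dat)

comparison <- data.frame(
  model = sapply(formulas, function(f) deparse(f)),
  p     = sapply(fits, function(m) length(coef(m))),  # complexity
  RSS   = sapply(fits, deviance)                       # closeness of fit
)
comparison
```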

Theory – Possible subsets

- Let $m$ denote any subset of $p_m$ distinct elements from $\{1, \dots, p\}$. Remark: typically the intercept is forced to be part of the model.
- Let $M$ denote a set of linear regression models for the relationship between $Y$ and $X$. Remark: often $M$ is reduced by preselection.

Example

- There are $2^4 = 16$ distinct subsets of $\{1, 2, 3, 4\}$: $\emptyset, \{1\}, \{2\}, \{1, 2\}, \{3\}, \dots, \{1, 2, 3, 4\}$.
- If the intercept is forced to be part of the model, then there are $2^{4-1} = 2^k = 8$ possible subsets.
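The counts in this example can be checked by enumerating the subsets. A minimal sketch, in which each subset is a row of TRUE/FALSE inclusion flags; the response y and covariate names x1, x2, x3 are hypothetical:

```r
# All subsets of {1, 2, 3, 4}: each row of `incl` flags which indices are included
incl <- expand.grid(rep(list(c(FALSE, TRUE)), 4))
nrow(incl)   # 2^4 = 16, including the empty set

# Intercept forced into the model: only the k = 3 slopes are optional.
# The covariate names x1, x2, x3 and response y are hypothetical.
k      <- 3
slopes <- paste0("x", seq_len(k))
flags  <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), k)))
formulas <- apply(flags, 1, function(row) {
  rhs <- if (any(row)) paste(slopes[row], collapse = " + ") else "1"
  paste("y ~", rhs)
})
length(formulas)   # 2^k = 8
formulas
```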

Theory – Powerset

- In the 'worst' case there are $2^k$ (or $2^p$, with $p = k + 1$, if the intercept can be excluded) possible submodels in $M$ if no preselection of models takes place.
- This is computationally (or manually) expensive.

Example

- With $k = 13$ covariates and an optional intercept there are $2^{13+1} = 16{,}384$ possible regression models.
- Remark: a search scheme that considers only a number of models growing quadratically in $p$, i.e. $p^2 = 14^2 = 196$, would be more than 80 times faster than a full search ($16{,}384/196 \approx 83.6$)!
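The arithmetic in this example, and how quickly the gap between exhaustive and quadratic search widens, can be sketched as follows (the grid of $p$ values is an arbitrary illustrative choice):

```r
k <- 13                 # number of covariates
p <- k + 1              # parameters when the intercept is also optional
n_full_search <- 2^p    # every subset: 16384 models
n_quadratic   <- p^2    # a search visiting about p^2 models: 196
n_full_search / n_quadratic   # about 83.6

# The gap widens quickly as p grows
grid <- c(5, 10, 14, 20)
data.frame(p = grid, all_subsets = 2^grid, quadratic = grid^2)
```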

Theory – Automated variable selection algorithms...

... or Bushwalking in M

Automated variable selection procedures ‘walk along’ the following ‘path’:

1. Choose a model to start with, e.g.
   - the model with no covariates (null model), or
   - the model with all covariates included (full model).
2. Test whether there is an advantage in adding or removing covariates.
3. Repeat adding/removing variables until there is no advantage in changing the model.

Such a strategy visits a number of models of quadratic order in $p$, rather than all $2^p$!
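The walk can be sketched for the case that starts from the null model (forward selection). A minimal sketch under stated assumptions: the data are synthetic, the function name forward_select is hypothetical, and the entry threshold alpha_in = 0.05 is an assumed choice. Each step compares the current model with every one-variable extension through the general F-test in anova():

```r
# Synthetic data: an illustrative assumption, not data from the notes
set.seed(2)
n   <- 50
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 2 + 1.5 * dat$x1 - 1 * dat$x3 + rnorm(n)

# Forward selection: start from the null model and, at each step, add the
# candidate with the smallest partial F-test p-value, while it is below alpha_in
forward_select <- function(data, response, candidates, alpha_in = 0.05) {
  selected <- character(0)
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    current <- lm(reformulate(c("1", selected), response), data = data)
    pvals <- sapply(remaining, function(v) {
      bigger <- lm(reformulate(c("1", selected, v), response), data = data)
      anova(current, bigger)[["Pr(>F)"]][2]   # F-test of current vs current + v
    })
    if (min(pvals) >= alpha_in) break         # no candidate is worth adding
    selected <- c(selected, remaining[which.min(pvals)])
  }
  selected
}

selected_fwd <- forward_select(dat, "y", c("x1", "x2", "x3", "x4"))
selected_fwd
```

Each step fits at most $p$ candidate models and there are at most $p$ steps, which is where the quadratic order of models visited comes from.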

Example – Cheese tasting data

- Data on production of cheddar cheese from the LaTrobe Valley of Victoria.
- Taste of the final product is related to the concentration of several chemicals in the cheese.
- $n = 30$ samples of cheese were tasted by experts, and the following four variables recorded:

taste: tasters' ratings
Acetic: acetic acid in cheese
H2S: hydrogen sulphide in cheese
Lactic: lactic acid in the cheese

Example – Cheese data: Backward selection

- It is of interest to the manufacturers to relate the cheese's taste to the 'chemical' variables.
- Therefore, we construct a multiple linear regression model of taste on the other variables.
- Variable selection will allow us to produce a parsimonious model.
- Backward variable selection starts with the full model (i.e. with all predictors).
- Let us look at the data first and then run a backward selection based on the F-test, deleting the least significant variable as long as $p_{\text{out}} > 5\%$.
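The backward rule described here can be sketched with drop1(), which reports the partial F-test for deleting each term from the current model. A minimal sketch under stated assumptions: the data are synthetic, not the cheese data, and the function name backward_select is hypothetical:

```r
# Synthetic data: an illustrative assumption, not the cheese data
set.seed(4)
n   <- 40
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 1.5 * dat$x3 + rnorm(n)

# Backward selection: start from the full model and repeatedly delete the
# least significant term (largest drop1() F-test p-value) while it exceeds p_out
backward_select <- function(data, response, candidates, p_out = 0.05) {
  selected <- candidates
  while (length(selected) > 0) {
    fit   <- lm(reformulate(selected, response), data = data)
    tab   <- drop1(fit, test = "F")           # partial F-test for each term
    pvals <- tab[selected, "Pr(>F)"]
    if (max(pvals) <= p_out) break            # every remaining term is significant
    selected <- selected[-which.max(pvals)]   # delete the least significant term
  }
  selected
}

kept <- backward_select(dat, "y", c("x1", "x2", "x3", "x4"))
kept
```

For the cheese data, assuming it is loaded as a data frame named (hypothetically) `cheese` with the columns listed above, the corresponding call would be `backward_select(cheese, "taste", c("Acetic", "H2S", "Lactic"))`.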
