Model Selection: Akaike Information Criterion (AIC) for Best Regression Model - Prof. Bria, Exams of Statistics

These notes discuss the importance of model selection in regression analysis and introduce the Akaike information criterion (AIC) as a method for determining the best model. They explain the Kullback-Leibler discrepancy and how it relates to AIC, and give a step-by-step procedure for model selection using AIC in SAS.


Uploaded on 08/19/2009


Model selection

$X_1, X_2, \ldots, X_k$: which ones should be included?

Best R-squared? Include them all (but forget about prediction)

Highest likelihood? Remember, that is just like R-squared.

“Stepwise” regression: various schemes

Build up: find best one-variable model (according to hypothesis test), then best two-variable model with that first variable included, etc., until no additional variables are “significant”

Break down: start with all variables, remove the one that is least significant, then the next, etc., until all remaining variables are significant

Stepwise schemes do not necessarily find the best model among all the possible variable combinations

Professor Hirotugu Akaike's approach

Akaike (1973) and subsequent papers (see also recent book by K. P. Burnham and D. R. Anderson: Model selection and inference: a practical information-theoretic approach. Springer)

$f(y)$: true model (unknown) giving rise to data $y$ ($y$ is a vector of data)

$g(y; \theta)$: candidate model (parameter vector $\theta$)

Want to find a model $g(y; \theta)$ that is “close to” $f(y)$

Kullback-Leibler discrepancy:

$$K(f, g) = E_f\left[ \log\left( \frac{f(Y)}{g(Y; \theta)} \right) \right]$$

This is a measure of how “far” model $g$ is from model $f$ (with reference to model $f$). Properties:

$K(f, g) \ge 0$

$K(f, g) = 0 \iff f = g$
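The two properties can be checked numerically in the simplest setting. The following is a minimal sketch, assuming discrete distributions over the same finite set of outcomes, so that the expectation becomes a sum; the function name `kl_discrepancy` and the probability vectors are hypothetical and chosen only for illustration, with `f` playing the role of the true model and `g_close`, `g_far` the role of candidate models.

```python
import math


def kl_discrepancy(f, g):
    """Kullback-Leibler discrepancy sum_y f(y) * log(f(y) / g(y)).

    f and g are lists of probabilities over the same outcomes (an
    assumption of this sketch; the notes treat general models).
    """
    total = 0.0
    for fy, gy in zip(f, g):
        if fy == 0.0:
            continue  # outcomes with f(y) = 0 contribute nothing
        if gy == 0.0:
            return math.inf  # g rules out an outcome that f allows
        total += fy * math.log(fy / gy)
    return total


# Hypothetical distributions, for illustration only.
f = [0.5, 0.3, 0.2]          # plays the role of the true model
g_close = [0.45, 0.35, 0.20]  # candidate near f
g_far = [0.1, 0.1, 0.8]       # candidate far from f

print(kl_discrepancy(f, f))        # zero when the models coincide
print(kl_discrepancy(f, g_close))  # small and positive
print(kl_discrepancy(f, g_far))    # larger
```

Note that the discrepancy is taken with reference to the first argument, so swapping the arguments generally gives a different value.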

Of course, we can never know how far our model $g$ is from $f$. But Akaike showed that we might be able to estimate something almost as good.

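The Akaike Information Criterion (AIC) for model $g$:

$$\mathrm{AIC} = -2 \log\left( \hat{L}_g \right) + 2q$$

where $\hat{L}_g$ is the maximized likelihood of model $g$ and $q$ is the number of estimated parameters.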
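As a small illustration, the criterion can be computed directly from a maximized log-likelihood and a parameter count: minus twice the former plus twice the latter. The sketch below assumes those two quantities are already available from a fitting routine; the function name `aic` and the numbers are hypothetical.

```python
def aic(max_log_likelihood, n_params):
    """AIC = -2 * (maximized log-likelihood) + 2 * (number of parameters)."""
    return -2.0 * max_log_likelihood + 2.0 * n_params


# Hypothetical fits: the second model has the higher likelihood,
# but its two extra parameters cost more than the likelihood gains.
small_model = aic(-120.0, 3)
large_model = aic(-118.5, 5)

print(small_model)  # 246.0
print(large_model)  # 247.0
```

In this made-up case the smaller model has the lower AIC even though the larger one fits better, which is the trade-off the penalty term is meant to capture.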

Model selection procedure:

• Decide, on scientific grounds, upon a small suite of models to be compared (not a fishing expedition), that is, narrow down the number of variables to be considered for inclusion in a multiple regression

• Fit all possible models in the suite (ML estimates of parameters in all the models under consideration)

• For each model, calculate its AIC (remember that the number of parameters in a multiple regression is $k + 2$)

• Pick the model with the smallest AIC. That is the model in the suite with the best overall statistical properties and parameter balance
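The procedure above can be sketched outside SAS as follows. This is a rough illustration, not the notes' own implementation: it assumes normal errors, so that the maximized log-likelihood of a regression is $-\frac{n}{2}\left[\log(2\pi \cdot \mathrm{RSS}/n) + 1\right]$, and it counts $k + 2$ parameters for a model with $k$ predictors (intercept, $k$ slopes, error variance). All function names and the simulated data are hypothetical; the least-squares fit uses plain normal equations to keep the example dependency-free.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_rss(xs, y, subset):
    """Least-squares fit of y on an intercept plus the chosen columns; returns RSS."""
    rows = [[1.0] + [x[j] for j in subset] for x in xs]
    p = len(subset) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def regression_aic(rss, n, k):
    """AIC of a normal-errors regression with k predictors (k + 2 parameters)."""
    max_loglik = -0.5 * n * (math.log(2.0 * math.pi * rss / n) + 1.0)
    return -2.0 * max_loglik + 2.0 * (k + 2)


def all_subsets_aic(xs, y):
    """Fit every subset of predictors; return (AIC, subset, RSS) sorted by AIC."""
    n, k_total = len(y), len(xs[0])
    results = []
    for size in range(k_total + 1):
        for subset in itertools.combinations(range(k_total), size):
            rss = fit_rss(xs, y, subset)
            results.append((regression_aic(rss, n, size), subset, rss))
    return sorted(results)


# Simulated data (hypothetical): only the first predictor affects y.
random.seed(1)
xs = [[random.uniform(0, 10) for _ in range(3)] for _ in range(40)]
y = [2.0 + 3.0 * x[0] + random.gauss(0, 1) for x in xs]

results = all_subsets_aic(xs, y)
for aic_value, subset, rss in results:
    print(subset, round(aic_value, 2), round(rss, 2))
```

The first row printed is the model with the smallest AIC. The full model always has the smallest RSS, which is why picking by fit alone would include every variable.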

Akaike's rule of thumb: two models are essentially indistinguishable if the difference of their AICs is less than 2.
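The rule of thumb can be applied mechanically by comparing each model's AIC with the smallest one in the suite. A minimal sketch, assuming the AIC values have already been computed; the function name, the model labels, and the AIC values are hypothetical.

```python
def indistinguishable_from_best(aic_by_model, threshold=2.0):
    """Return models whose AIC is within `threshold` of the smallest AIC,
    mapped to their AIC difference from the best model."""
    best = min(aic_by_model.values())
    return {name: value - best
            for name, value in aic_by_model.items()
            if value - best < threshold}


# Hypothetical AIC values for three candidate regressions.
aics = {"X1": 246.0, "X1 X2": 247.0, "X1 X2 X3": 251.5}
close = indistinguishable_from_best(aics)
print(close)  # {'X1': 0.0, 'X1 X2': 1.0}
```

Here the first two models differ by less than 2 and would be treated as essentially indistinguishable, while the third is clearly worse.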

AIC/multiple regression in SAS

PROC RSQUARE fits all possible regression models!!!

DATA;
  INPUT Y X1 X2 X3 X4 X5;
  CARDS;
  ...
;
PROC RSQUARE;
  MODEL Y=X1 X2 X3 X4 X5 / AIC;

Occam's razor

“Quia frustra fit per plura quod potest fieri per pauciora”

(Because it is vain to do with more what can be done with less)

- William of Occam

Brian's bludgeon

“No amount of computing is too much, if it gets the job done”

- Brian of Idaho