Model Selection: Association vs. Prediction in Biostatistics

Biostat/Epi 536

October 21, 2008

Is each of the following important

                                    for ASSOCIATION?    for PREDICTION?
regression coefficient estimates    Yes                 No
what variables are in the model     Yes                 No*
the use of automated procedures     No                  Yes
the area under the ROC curve        No                  Yes
goodness of fit tests               No                  Yes

* Once we’ve chosen the set of variables that we want to consider, we don’t care which of them are actually included in the final model. However, the choice of this set of variables may depend on what data are available or what type of data we want to use in our prediction.

Model Selection for ASSOCIATION:

Confirmatory [can be used to test hypotheses]

Exploratory [can generate hypotheses, but cannot test them]

Main Association Variable

Confirmatory:
- include whether or not significant
- include in form consistent with prior hypothesis

Exploratory:
- include whether or not significant
- include in best-fitting form

Interactions with Main Association Variable

Confirmatory:
- include only if specified in prior hypothesis
- include only in form consistent with prior hypothesis

Exploratory:
- include only if specified in prior hypothesis
- explore functional forms and choose one that fits well
  [but include main effects if interaction is included]

Adjustment Variables (e.g. confounders, precision variables)

Confirmatory:
- include only if specified in prior hypothesis
- include in as rich a form as possible to minimize residual confounding
- statistical significance is irrelevant

Exploratory:
- examine possible confounders to see if controlling for them changes coefficient of interest
- examine different forms, and choose richer model when there is a difference in the coefficient of interest**

Presentation

Confirmatory:
- explanation of what was controlled, and how
- adjusted odds ratios, with CIs
- possibly unadjusted odds ratios, with CIs
- sometimes partially adjusted odds ratios

Exploratory:
- explanation of what was controlled, and how
- adjusted odds ratios, with CIs
- possibly unadjusted odds ratios, with CIs
- sometimes partially adjusted odds ratios
- brief description of model selection and suggestions for future studies of hypotheses that were generated

** F-tests should only be used to compare functional form (e.g. splines versus linear) if the splines are created with the main association variable. If they are created with an adjustment variable, then the choice of form should be made based on the coefficient of interest only. If this coefficient is different with different forms of adjustment, choose the richer one.
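The check described for adjustment variables (does controlling for a possible confounder change the coefficient of interest?) can be sketched as follows. This is only an illustration under stated assumptions: exposure, outcome, and confounder are all binary, the Mantel-Haenszel pooled odds ratio stands in for a fitted regression model, and every count is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) 95% CI for one 2x2 table.

    a: exposed cases,   b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled over strata of (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts: one 2x2 table per level of a binary confounder.
strata = [
    (10, 40, 20, 180),  # confounder absent
    (60, 40, 30, 45),   # confounder present
]

# Unadjusted (crude) table: collapse over the confounder.
crude_table = tuple(sum(cell) for cell in zip(*strata))
crude_or, crude_lo, crude_hi = odds_ratio_ci(*crude_table)

adjusted_or = mantel_haenszel_or(strata)
relative_change = (crude_or - adjusted_or) / adjusted_or

print(f"crude OR    = {crude_or:.2f} ({crude_lo:.2f}, {crude_hi:.2f})")
print(f"adjusted OR = {adjusted_or:.2f}")
print(f"relative change in OR = {relative_change:.0%}")
```

With these made-up counts the crude odds ratio is about 3.94 and the adjusted one is 2.25, so controlling for the stratifying variable changes the estimate of interest. Under the rule above, that change, not the variable's own significance, is the reason to keep it as an adjustment variable.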



Model Selection for PREDICTION:

1. Decide how you’re going to validate your model. If you decide to split your data into two samples (one for model-building and one for validation), split it and don’t look at the validation data again until after you’ve chosen a model.

2. Choose the set of variables that you’ll consider for inclusion in your model. Decide whether or not you want to look at interactions and higher-order terms.

3. Determine what procedure(s) you’ll use to choose from among these variables.

4. Find the best model using your chosen procedure(s).

5. Plot an ROC curve, compute the area under it, and run goodness of fit tests, using a validation dataset or resampling methods.

6. Present your data as discussed in class.
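Steps 1 and 5 above can be sketched in a few lines. This is a minimal illustration, not the course's prescribed method: the helper names `split_indices` and `auc`, the 30% validation fraction, the seed, and the predicted risks are all assumptions made for the example. Model building (steps 2 to 4) and goodness of fit tests are left out.

```python
import random


def split_indices(n, validation_fraction=0.3, seed=536):
    """Randomly partition range(n) into model-building and validation indices."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = int(round(n * validation_fraction))
    return sorted(idx[n_val:]), sorted(idx[:n_val])


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen case (label 1) has a
    higher score than a randomly chosen non-case (label 0); ties count 1/2.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    noncases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for s_case in cases:
        for s_non in noncases:
            if s_case > s_non:
                wins += 1.0
            elif s_case == s_non:
                wins += 0.5
    return wins / (len(cases) * len(noncases))


# Step 1: split once, before any model building.
build_idx, val_idx = split_indices(n=20)
# Steps 2-4 would use only the observations in build_idx.

# Step 5: hypothetical predicted risks and observed outcomes for the
# six validation observations, as if produced by the chosen model.
val_scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
val_labels = [1, 1, 1, 0, 0, 0]
print(f"validation AUC = {auc(val_scores, val_labels):.3f}")
```

Fixing the seed makes the split reproducible, which helps show that the validation sample was set aside before model selection began.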