Posterior Predictive Checking: Assessing Model Fit and Substantive Knowledge - Prof. Mary , Study notes of Statistics

These study notes cover the principles and methods of posterior predictive checking, a technique for assessing how well a statistical model fits both the data and substantive knowledge. They cover the goals of model checking, the use of the posterior distribution to check a model, and a procedure for drawing replicated datasets from the posterior predictive distribution. Discrepancy measures and test quantities for evaluating the results of posterior predictive checks are also discussed.

22S:138
Posterior predictive checking
Lecture 22
Nov. 26, 2007
Kate Cowles
374 SH, 335-0727

Model checking and sensitivity analysis

  • goal: assess fit of model to
    • data
    • our substantive knowledge
  • must check effects of
    • prior
    • likelihood specification
    • hierarchical structure
    • any other application-specific issues
      ∗ e.g. which predictor variables

  • theoretically possible to set up and fit a “super model” including all possibly true models
    • but computationally infeasible
    • and really conceptually impossible
  • instead we fit a feasible number of models and examine the posterior distributions that result
    • cast models as broadly as possible
    • fail to fit reality?
    • sensitive to arbitrary specifications?

Principles and methods of model-checking

  • “do the model’s deficiencies have a noticeable effect on substantive inferences?”
  • how to judge when assumptions of convenience can be made safely

Using the posterior distribution to check a statistical model

  • compare posterior distribution of parameters to
    • substantive knowledge
    • other data
  • compare posterior predictive distribution of future observations to substantive knowledge
    • e.g.: compare election predictions from a model to substantive knowledge
  • compare posterior predictive distribution of future observations to the data that have actually occurred

Using the posterior predictive distribution to check a statistical model

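  • recall:
    • posterior: conditional on observed data $y$
    • predictive: prediction of an observable but unobserved $\tilde{y}$
    • the posterior predictive distribution is

$$
\begin{aligned}
p(\tilde{y} \mid y) &= \int p(\tilde{y}, \theta \mid y)\, d\theta \\
&= \int p(\tilde{y} \mid \theta, y)\, p(\theta \mid y)\, d\theta \\
&= \int p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta
\end{aligned}
$$

  • last line holds if new data are conditionally independent of old data given model parameters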
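The integral over θ can be approximated by simulation: draw θ from the posterior, then draw a new observation given that θ. The sketch below illustrates this with a made-up binomial example that is not part of the lecture (7 successes in 20 trials, uniform prior), chosen only because its posterior predictive probability has a closed form to compare against. All names and numbers are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data (not from the lecture): 7 successes in 20 Bernoulli trials.
# With a uniform Beta(1, 1) prior, the posterior for theta is Beta(1 + 7, 1 + 13).
successes, trials = 7, 20
a_post = 1 + successes
b_post = 1 + (trials - successes)

n_draws = 100_000
theta = rng.beta(a_post, b_post, size=n_draws)  # theta ~ p(theta | y)
y_tilde = rng.binomial(1, theta)                # y_tilde ~ p(y_tilde | theta)

# Averaging over the draws approximates the integral over theta.
mc_estimate = y_tilde.mean()

# Closed form for this conjugate case: Pr(y_tilde = 1 | y) = E[theta | y].
exact = a_post / (a_post + b_post)

print(f"Monte Carlo: {mc_estimate:.4f}  exact: {exact:.4f}")
```

The second sampling step uses only θ, not the observed data, which is exactly the conditional independence assumption behind the last line of the derivation.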

Checking a model by comparing the data that we have to the posterior predictive distribution

  • enables checking fit of model without any more substantive knowledge than is in existing data and model
  • do datasets simulated from the model we fit “look like” the real data in ways relevant to our inference?
  • requires drawing “replicated data”

Procedure to draw a “replicated dataset” from posterior predictive distribution

  • notation
    • y: observed data
    • $y^{\mathrm{rep}}$: a complete simulated dataset
      ∗ same number of observations as in y
      ∗ same values of explanatory variables (if any)
      ∗ response variables simulated from posterior predictive distribution
    • θ: vector of all unknown model parameters, including parameters of upper stage priors if model is hierarchical
  • Fit model to the 66 observations

$$
y_i \sim N(\mu, \sigma^2), \quad i = 1, \dots, 66, \qquad p(\mu, \sigma^2) \propto \frac{1}{\sigma^2}
$$

  • generated 20 replicate datasets
  • found that in all replicate datasets, $\min_i y_i^{\mathrm{rep}}$ was much larger than $\min_i y_i$ in the real data
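A minimal sketch of how such replicate datasets might be drawn for this normal model is given below. Under the prior p(μ, σ²) ∝ 1/σ², the posterior factors as σ² | y ~ scaled Inv-χ²(n − 1, s²) and μ | σ², y ~ N(ȳ, σ²/n), so each replicate needs one draw of (μ, σ²) followed by n simulated responses. The data here are synthetic stand-ins with two low outliers, not the actual 66 observations from the lecture, and the function name `draw_replicates` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data (NOT the lecture's measurements):
# 64 well-behaved values plus 2 low outliers, 66 observations in total.
y = np.concatenate([rng.normal(25.0, 5.0, size=64), [-40.0, 0.0]])
n = y.size
ybar = y.mean()
s2 = y.var(ddof=1)

def draw_replicates(n_rep):
    """Draw n_rep replicated datasets, one posterior draw of (mu, sigma^2) each."""
    # sigma^2 | y ~ scaled Inv-chi^2(n - 1, s^2)
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_rep)
    # mu | sigma^2, y ~ N(ybar, sigma^2 / n)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    # y_rep | mu, sigma^2: n new observations per posterior draw
    return rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(n_rep, n))

y_rep = draw_replicates(20)
rep_mins = y_rep.min(axis=1)

print("observed minimum:  ", y.min())
print("replicate minimums:", np.round(np.sort(rep_mins), 1))
```

Each row of `y_rep` has the same number of observations as `y`, matching the definition of a replicated dataset. With outliers like the ones planted here, every replicate minimum should sit well above the observed minimum, which is the kind of discrepancy the check is meant to expose.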

Interpreting and using posterior predictive p-values

  • not Pr(model is true | data)
  • posterior probability that $T(y^{\mathrm{rep}}, \theta) \ge T(y, \theta)$, i.e. $\Pr(T(y^{\mathrm{rep}}, \theta) \ge T(y, \theta) \mid y)$
  • ideal is if posterior predictive p-value is somewhere around 0.5
    • would mean that real data y is typical of data that come from the model
  • model is suspect if tail-area probability of meaningful test quantity is close to either 0 or 1
    • would mean that aspect of data being measured by test quantity is inconsistent with model
    • extreme ppp-value indicates that model needs to be changed or expanded
      ∗ in Newcomb example, use t or contaminated normal likelihood
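The tail-area probability can be estimated by simulation as the fraction of posterior draws for which the test quantity on replicated data is at least as large as on the observed data. The sketch below assumes the normal model with prior p(μ, σ²) ∝ 1/σ² and uses synthetic data with two planted low outliers, not the lecture's data. The function `ppp_value` and both test quantities are hypothetical names chosen for illustration.

```python
import numpy as np

def ppp_value(y, discrepancy, n_draws=5000, seed=0):
    """Estimate Pr(T(y_rep, theta) >= T(y, theta) | y) for the model
    y_i ~ N(mu, sigma^2) with prior p(mu, sigma^2) proportional to 1/sigma^2."""
    rng = np.random.default_rng(seed)
    n = y.size
    ybar, s2 = y.mean(), y.var(ddof=1)
    # Posterior draws: sigma^2 | y, then mu | sigma^2, y
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_draws)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    exceed = 0
    for m, v in zip(mu, sigma2):
        y_rep = rng.normal(m, np.sqrt(v), size=n)  # one replicated dataset
        exceed += discrepancy(y_rep, m, v) >= discrepancy(y, m, v)
    return exceed / n_draws

def minimum(y, mu, sigma2):
    """Test quantity that depends on the data only."""
    return y.min()

def chi_square(y, mu, sigma2):
    """Discrepancy that depends on both the data and the parameters."""
    return np.sum((y - mu) ** 2) / sigma2

# Synthetic stand-in data (NOT the lecture's measurements), with two low outliers.
data_rng = np.random.default_rng(2)
y = np.concatenate([data_rng.normal(25.0, 5.0, size=64), [-40.0, 0.0]])

p_min = ppp_value(y, minimum)
p_chi2 = ppp_value(y, chi_square)
print(f"ppp-value for min(y):       {p_min:.3f}")
print(f"ppp-value for chi-square T: {p_chi2:.3f}")
```

With these synthetic data the minimum should give a value very close to 1, flagging the lower tail as inconsistent with the normal model. The chi-square discrepancy should land near 0.5, because under this model and prior it has the same distribution for observed and replicated data, so it cannot reveal the misfit. This is one reason the choice of a meaningful test quantity matters.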