22S:
Posterior predictive checking
Lecture 22 Nov. 26, 2007
Kate Cowles 374 SH, 335- [email protected]
Model checking and sensitivity analysis
- goal: assess fit of model to
  - data
  - our substantive knowledge
- must check effects of
  - prior
  - likelihood specification
  - hierarchical structure
  - any other application-specific issues
    ∗ e.g. which predictor variables
- theoretically possible to set up and fit a “super model” including all possibly true models
  - but computationally infeasible
  - and really conceptually impossible
- instead we fit a feasible number of models and examine the posterior distributions that result
  - cast models as broadly as possible
  - fail to fit reality?
  - sensitive to arbitrary specifications?
Principles and methods of model-checking
- “do the model’s deficiencies have a noticeable effect on substantive inferences?”
- how to judge when assumptions of convenience can be made safely
Using the posterior distribution to check a statistical model
- compare posterior distribution of parameters to
  - substantive knowledge
  - other data
- compare posterior predictive distribution of future observations to substantive knowledge
  - e.g.: compare election predictions from a model to substantive knowledge
- compare posterior predictive distribution of future observations to the data that have actually occurred
Using the posterior predictive distribution to check a statistical model
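- recall:
  - posterior: conditional on observed data y
  - predictive: prediction of an observable but unobserved ỹ
- posterior predictive distribution:
\begin{align*}
p(\tilde{y} \mid y) &= \int p(\tilde{y}, \theta \mid y)\, d\theta \\
&= \int p(\tilde{y} \mid \theta, y)\, p(\theta \mid y)\, d\theta \\
&= \int p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta
\end{align*}
  - last line holds if new data are conditionally independent of old data given model parameters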
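The integral above is usually approximated by simulation: draw θ from p(θ|y), then draw ỹ from p(ỹ|θ). The sketch below illustrates this two-step composition for a deliberately simple case that is not from the lecture: a normal model with known sigma and a flat prior on mu, for which the posterior predictive distribution is known to be N(ȳ, σ²(1 + 1/n)). The function name, the data values, and the model choice are all assumptions made for illustration.

```python
import numpy as np

def posterior_predictive_draws(y, sigma, n_draws, rng):
    """Draw from p(y_tilde | y) by composition.

    Assumed toy model (not from the lecture): y_i ~ N(mu, sigma^2) with
    sigma known and a flat prior on mu, so mu | y ~ N(ybar, sigma^2 / n).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Step 1: theta ~ p(theta | y)
    mu = rng.normal(y.mean(), sigma / np.sqrt(n), size=n_draws)
    # Step 2: y_tilde ~ p(y_tilde | theta); given mu, y_tilde does not depend on y
    return rng.normal(mu, sigma)

rng = np.random.default_rng(0)
y = np.array([1.2, 0.7, 2.1, 1.5, 0.9])  # made-up data
draws = posterior_predictive_draws(y, sigma=1.0, n_draws=200_000, rng=rng)

# Compare with the closed form N(ybar, sigma^2 * (1 + 1/n))
print(draws.mean(), y.mean())
print(draws.var(), 1.0 * (1 + 1 / y.size))
```

The simulated mean and variance should be close to the closed-form values, which is a quick sanity check that the composition step matches the last line of the integral.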
Checking a model by comparing the data that we have to the posterior predictive distribution
- enables checking fit of model without any more substantive knowledge than is in existing data and model
- do datasets simulated from the model we fit “look like” the real data in ways relevant to our inference?
- requires drawing “replicated data”
Procedure to draw a “replicated dataset” from posterior predictive distribution
- notation
  - y: observed data
  - yrep: a complete simulated dataset
    ∗ same number of observations as in y
    ∗ same values of explanatory variables (if any)
    ∗ response variables simulated from posterior predictive distribution
  - θ: vector of all unknown model parameters, including parameters of upper stage priors if model is hierarchical
- Fit model to the 66 observations
\[
y_i \sim N(\mu, \sigma^2), \quad i = 1, \ldots, 66, \qquad p(\mu, \sigma^2) \propto \frac{1}{\sigma^2}
\]
- generated 20 replicate datasets
- found that in all replicate datasets, min(y_i^rep) was much larger than min(y_i) in real data
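A check of this kind can be sketched as follows, assuming the normal model with the noninformative prior p(μ, σ²) ∝ 1/σ². Under that prior the standard results are σ² | y ~ scaled Inv-χ²(n − 1, s²) and μ | σ², y ~ N(ȳ, σ²/n). The data below are a synthetic stand-in with two low outliers, not the 66 observations used in the lecture, and the function name `draw_replicates` is hypothetical.

```python
import numpy as np

def draw_replicates(y, n_rep, rng):
    """Draw n_rep replicated datasets, each the same size as y, from the
    posterior predictive distribution of a normal model with prior
    p(mu, sigma^2) proportional to 1 / sigma^2."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar, s2 = y.mean(), y.var(ddof=1)
    # sigma^2 | y: (n - 1) * s^2 divided by a chi-square(n - 1) draw
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_rep)
    # mu | sigma^2, y ~ N(ybar, sigma^2 / n)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    # One row per replicated dataset, one (mu, sigma^2) pair per row
    return rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(n_rep, n))

rng = np.random.default_rng(1)
# Synthetic stand-in data (assumption): 64 well-behaved values plus 2 low outliers
y = np.concatenate([rng.normal(26.0, 5.0, size=64), [-44.0, -2.0]])

y_rep = draw_replicates(y, n_rep=20, rng=rng)
rep_mins = y_rep.min(axis=1)
print("observed min:", y.min())
print("smallest replicated min:", rep_mins.min())
```

If every replicated minimum sits well above the observed minimum, the normal model is failing to reproduce the low tail of the data, which is the pattern described in the bullet above.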
Interpreting and using posterior predictive p-values
- not Pr(model is true | data)
- posterior probability that T(yrep, θ) ≥ T(y, θ), i.e. Pr(T(yrep, θ) ≥ T(y, θ) | y)
- ideal is if posterior predictive p-value is somewhere around 0.5
  - would mean that real data y are typical of data that come from the model
- model is suspect if tail-area probability of meaningful test quantity is close to either 0 or 1
  - would mean that aspect of data being measured by test quantity is inconsistent with model
  - extreme ppp-value indicates that model needs to be changed or expanded
    ∗ in Newcomb example, use t or contaminated normal likelihood