Spatial Statistics and Data Assimilation: A Bayesian Approach, Study notes of Statistics

An overview of spatial statistics and data assimilation using a Bayesian approach. It covers the basics of Bayesian inference: Bayes' theorem, point and interval estimation, hypothesis testing, and model choice. The document also introduces empirical Bayes analysis and Markov chain Monte Carlo (MCMC) integration methods. It further discusses criteria for model choice, including the Bayes factor, the Bayesian information criterion (BIC), and the use of DIC for model comparison.


Uploaded on 03/18/2009

ST 790 M. Fall 2004.
Spatial Statistics and
Data Assimilation
Montserrat Fuentes
Statistics Department NCSU
http://www.stat.ncsu.edu/fuentes

Basics of Bayesian inference

Bayes’ theorem

Bayesian inference

Point estimation

Interval estimation

Hypothesis testing and model choice

Bayes computation:

Revisit Gibbs and Metropolis-Hastings

Slice sampling

Convergence diagnosis

variance estimation


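Inference on $\theta$ is based on its POSTERIOR distribution:

$$p(\theta \mid y, \lambda) = \frac{p(y, \theta \mid \lambda)}{p(y \mid \lambda)} = \frac{p(y, \theta \mid \lambda)}{\int p(y, \theta \mid \lambda)\, d\theta} = \frac{p(y \mid \theta)\, \pi(\theta \mid \lambda)}{\int p(y \mid \theta)\, \pi(\theta \mid \lambda)\, d\theta}$$

Since $\lambda$ might be unknown, with hyperprior $h(\lambda)$, we need an additional step:

$$p(\theta \mid y) = \frac{p(y, \theta)}{p(y)} = \frac{\int p(y \mid \theta)\, \pi(\theta \mid \lambda)\, h(\lambda)\, d\lambda}{\int\!\!\int p(y \mid \theta)\, \pi(\theta \mid \lambda)\, h(\lambda)\, d\theta\, d\lambda}$$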
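As a minimal numerical sketch of the posterior formula for a fixed $\lambda$ (not part of the original notes), the posterior can be approximated on a grid by multiplying likelihood and prior and then normalizing. The binomial likelihood, the Beta(2, 2) prior standing in for $\pi(\theta \mid \lambda)$ with $\lambda = (2, 2)$, the data (7 successes in 10 trials), and the function name `grid_posterior` are all illustrative assumptions.

```python
def grid_posterior(k, n, a, b, m=1000):
    """Approximate p(theta | y, lambda) on a midpoint grid of m points in (0, 1).

    Assumed model (illustrative): y = k successes in n Bernoulli trials,
    prior pi(theta | lambda) = Beta(a, b) with lambda = (a, b).
    """
    width = 1.0 / m
    grid = [(i + 0.5) * width for i in range(m)]
    # Numerator of Bayes' theorem: likelihood times prior. Constants that do
    # not depend on theta are dropped because they cancel after normalizing.
    unnorm = [t ** k * (1 - t) ** (n - k) * t ** (a - 1) * (1 - t) ** (b - 1)
              for t in grid]
    # Denominator: the integral of likelihood * prior over theta (midpoint rule).
    evidence = sum(unnorm) * width
    density = [u / evidence for u in unnorm]
    return grid, density, width


grid, density, width = grid_posterior(k=7, n=10, a=2, b=2)
post_mean = sum(t * d for t, d in zip(grid, density)) * width
print(round(post_mean, 3))
```

Under these assumptions the exact posterior is Beta(9, 5), so the grid result can be checked against the closed-form mean 9/14.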

Alternatively, we can replace $\lambda$ by an estimated value $\hat{\lambda}$, which could be the maximizer of $p(y \mid \lambda)$. Inference based on this estimated posterior $p(\theta \mid y, \hat{\lambda})$ is referred to as EMPIRICAL BAYES analysis.
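A minimal sketch of the empirical Bayes idea, under assumptions that are not in the original notes: observations $y_i \mid \theta_i \sim N(\theta_i, \sigma^2)$ with known $\sigma^2$, and prior $\theta_i \sim N(\mu, \tau^2)$, so that $\lambda = (\mu, \tau^2)$. The marginal of each $y_i$ is then $N(\mu, \sigma^2 + \tau^2)$, and maximizing it gives $\hat{\lambda}$ in closed form. The function name and the data are hypothetical.

```python
from statistics import fmean


def empirical_bayes_normal(y, sigma2):
    """Empirical Bayes for y_i | theta_i ~ N(theta_i, sigma2), theta_i ~ N(mu, tau2).

    lambda = (mu, tau2) is replaced by the maximizer of the marginal
    likelihood p(y | lambda), under which y_i ~ N(mu, sigma2 + tau2).
    """
    n = len(y)
    mu_hat = fmean(y)
    # Marginal MLE of the total variance is the 1/n sample variance;
    # tau2 is truncated at zero because it is a variance.
    total_var = sum((v - mu_hat) ** 2 for v in y) / n
    tau2_hat = max(0.0, total_var - sigma2)
    # Estimated posterior p(theta_i | y, lambda_hat) is normal: each y_i is
    # shrunk toward mu_hat with weight `shrink`.
    shrink = sigma2 / (sigma2 + tau2_hat)
    post_means = [shrink * mu_hat + (1 - shrink) * v for v in y]
    post_var = (1 - shrink) * sigma2
    return mu_hat, tau2_hat, post_means, post_var


mu_hat, tau2_hat, post_means, post_var = empirical_bayes_normal(
    [1.0, 3.0, 5.0, 7.0], sigma2=1.0)
print(mu_hat, tau2_hat, post_means, post_var)
```

Note that this plug-in posterior ignores the uncertainty in $\hat{\lambda}$, which is the main difference from the fully Bayesian version with a hyperprior.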

Bayesian inference: point estimation

Estimates of $\theta$:

The mean of the posterior:

$$\hat{\theta} = E(\theta \mid y)$$

The median of the posterior:

$$\hat{\theta} : \int_{-\infty}^{\hat{\theta}} p(\theta \mid y)\, d\theta = 0.5$$

The mode of the posterior:

$$\hat{\theta} : p(\hat{\theta} \mid y) = \sup_{\theta} p(\theta \mid y)$$

The last one is the easiest to compute, since it does not require any integration. If the posterior exists under a flat prior $p(\theta) = 1$, then the posterior mode is just the MLE of $\theta$.

The posterior median is often the best point estimate (because it better represents the center of a non-symmetric distribution).
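The three point estimates can be compared numerically. The sketch below is an illustration only: it assumes a skewed Beta(9, 5) posterior tabulated on a grid, and the helper name `point_estimates` is hypothetical.

```python
def point_estimates(grid, density, width):
    """Posterior mean, median and mode from a density on an equally spaced grid."""
    mean = sum(t * d for t, d in zip(grid, density)) * width
    # Median: first grid point at which the cumulative probability reaches 0.5.
    cum = 0.0
    median = grid[-1]
    for t, d in zip(grid, density):
        cum += d * width
        if cum >= 0.5:
            median = t
            break
    # Mode: grid point with the highest density; no integration is needed.
    mode = max(zip(density, grid))[1]
    return mean, median, mode


# Illustrative posterior: Beta(9, 5), normalized numerically on the grid.
m = 2000
width = 1.0 / m
grid = [(i + 0.5) * width for i in range(m)]
unnorm = [t ** 8 * (1 - t) ** 4 for t in grid]
total = sum(unnorm) * width
density = [u / total for u in unnorm]

mean, median, mode = point_estimates(grid, density, width)
print(mean, median, mode)
```

For this left-skewed example the three estimates differ, with the median lying between the mean and the mode.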

The frequentist CI does not satisfy that condition. Instead, it gives an interval such that the probability that the random interval covers the TRUE parameter is $1 - \alpha$, i.e.

$$P_{\theta}\big((a, b) \ni \theta\big) = 1 - \alpha.$$

The interval $(q_L, q_U)$ with

$$p(q_L < \theta < q_U \mid y) = 1 - \alpha,$$

where $q_L$ and $q_U$ are the $\alpha/2$ and $1 - \alpha/2$ quantiles of the posterior, is the equal tail credible set.

For symmetric unimodal posteriors, this interval will be symmetric about the mode. It will also be optimal, in the sense that it will have the shortest length among sets $C$ satisfying:

$$1 - \alpha \le P(C \mid y) = \int_C p(\theta \mid y)\, d\theta$$

For posteriors that are not symmetric and unimodal, a better, shorter interval can be obtained by taking only values of $\theta$ that have posterior density greater than some cutoff. The cutoff is as large as possible while $C$ satisfies the previous condition.

This is called the HIGHEST POSTERIOR DENSITY (HPD) credible set. It is more difficult to compute but always of optimal length.
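Both kinds of interval can be estimated from posterior draws. The following is a sketch under stated assumptions, not the method of the notes: the draws come from a Gamma(2, 1) distribution standing in for a skewed posterior, and the HPD set is approximated by the shortest interval containing a fraction $1 - \alpha$ of the draws, which is only appropriate for a unimodal posterior. Both function names are hypothetical.

```python
import math
import random


def equal_tail_interval(samples, alpha=0.05):
    """Interval between the alpha/2 and 1 - alpha/2 empirical quantiles."""
    s = sorted(samples)
    n = len(s)
    lo = s[math.floor((alpha / 2) * (n - 1))]
    hi = s[math.ceil((1 - alpha / 2) * (n - 1))]
    return lo, hi


def hpd_interval(samples, alpha=0.05):
    """Shortest interval containing a fraction 1 - alpha of the samples.

    Approximates the HPD set for a unimodal posterior; for a multimodal
    posterior the HPD set may be a union of disjoint intervals.
    """
    s = sorted(samples)
    n = len(s)
    k = math.ceil((1 - alpha) * n)  # number of draws inside the interval
    # Slide a window of k consecutive order statistics and keep the shortest.
    best = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[best], s[best + k - 1]


random.seed(1)
draws = [random.gammavariate(2.0, 1.0) for _ in range(20000)]
et = equal_tail_interval(draws)
hpd = hpd_interval(draws)
print(et, hpd)
```

For a right-skewed posterior like this one, the HPD interval is shifted toward the mode and is no longer than the equal tail interval.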

Thus, the marginals of $Y$ are

$$p(y \mid M_i) = \int f(y \mid \theta_i, M_i)\, \pi_i(\theta_i)\, d\theta_i, \quad i = 1, 2.$$

Bayes' theorem can be applied to obtain the posterior probabilities $P(M_1 \mid y)$ and $P(M_2 \mid y) = 1 - P(M_1 \mid y)$.

The quantity used to summarize these results is the BAYES FACTOR (BF), the ratio of the posterior odds of $M_1$ to the prior odds of $M_1$:

$$BF = \frac{P(M_1 \mid y) / P(M_2 \mid y)}{p(M_1) / p(M_2)} = \frac{p(y \mid M_1)}{p(y \mid M_2)}$$

If both models have the same prior probability, then BF is the posterior odds of $M_1$.
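A small worked example may help make the Bayes factor concrete. It is a sketch under assumptions not found in the notes: the data are $k$ successes in $n$ binomial trials, $M_1$ fixes $\theta = 0.5$, and $M_2$ places a Beta(a, b) prior on $\theta$, so both marginal likelihoods are available in closed form. All function names are hypothetical.

```python
import math


def log_beta(p, q):
    """Logarithm of the Beta function B(p, q)."""
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)


def log_marginal_point(k, n, theta0):
    """log p(y | M1): binomial likelihood at a fixed theta0 (no free parameter)."""
    return (math.log(math.comb(n, k))
            + k * math.log(theta0) + (n - k) * math.log(1 - theta0))


def log_marginal_beta(k, n, a, b):
    """log p(y | M2): binomial likelihood integrated against a Beta(a, b) prior."""
    return math.log(math.comb(n, k)) + log_beta(k + a, n - k + b) - log_beta(a, b)


def bayes_factor(k, n, theta0=0.5, a=1.0, b=1.0):
    """BF of M1 (theta = theta0) against M2 (theta ~ Beta(a, b))."""
    return math.exp(log_marginal_point(k, n, theta0) - log_marginal_beta(k, n, a, b))


def posterior_prob_m1(bf, prior_m1=0.5):
    """P(M1 | y), using posterior odds = BF * prior odds."""
    post_odds = bf * prior_m1 / (1 - prior_m1)
    return post_odds / (1 + post_odds)


bf = bayes_factor(k=5, n=10)
print(bf, posterior_prob_m1(bf))
```

With equal prior probabilities the posterior odds equal the BF, so `posterior_prob_m1(bf)` is simply `bf / (1 + bf)`.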

Bayesian Information Criterion (BIC)

BIC is also known as the Schwarz Criterion. Schwarz showed that for nonhierarchical models and large sample sizes $n$, BIC approximates $-2 \log BF$.

BIC is a penalized likelihood ratio model choice criterion, if we think of $M_2$ as the "full" model and $M_1$ as the "reduced" model:

$$BIC = W - (p_2 - p_1) \log n$$

where $p_i$ is the number of parameters in model $M_i$, and

$$W = -2 \log \left\{ \frac{\sup_{M_1} f(y \mid \theta)}{\sup_{M_2} f(y \mid \theta)} \right\}$$

is the usual likelihood ratio test statistic.

An alternative to BIC is the Akaike Information Criterion (AIC),

$$AIC = W - 2 (p_2 - p_1).$$

This is also a penalized likelihood ratio model choice criterion.

The more serious limitation in using BF or their approximations (BIC, AIC) is that they are not appropriate under noninformative priors. If $\pi_i(\theta_i)$ is improper then $p(y \mid M_i)$ is as well. A solution is to use DIC.
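The two penalized criteria can be computed side by side. The sketch below assumes a deliberately simple pair of nested models that is not taken from the notes: $M_1$: $y_i \sim N(0, 1)$ with $p_1 = 0$ free parameters, and $M_2$: $y_i \sim N(\mu, 1)$ with $p_2 = 1$. The function names and data are hypothetical.

```python
import math


def lr_statistic(y):
    """W = -2 log[sup_M1 f(y | theta) / sup_M2 f(y | theta)] for the assumed models.

    M1: y_i ~ N(0, 1), nothing to maximize.
    M2: y_i ~ N(mu, 1), likelihood maximized at mu = sample mean.
    """
    n = len(y)
    ybar = sum(y) / n
    const = -0.5 * n * math.log(2 * math.pi)
    loglik_m1 = const - 0.5 * sum(v ** 2 for v in y)
    loglik_m2 = const - 0.5 * sum((v - ybar) ** 2 for v in y)
    return -2.0 * (loglik_m1 - loglik_m2)


def bic(w, p1, p2, n):
    """BIC = W - (p2 - p1) log n."""
    return w - (p2 - p1) * math.log(n)


def aic(w, p1, p2):
    """AIC = W - 2 (p2 - p1)."""
    return w - 2 * (p2 - p1)


y = [1.0, 2.0, 3.0]
w = lr_statistic(y)
print(w, bic(w, 0, 1, len(y)), aic(w, 0, 1))
```

Both criteria subtract a penalty for the extra parameters of the full model from the same statistic $W$; the BIC penalty grows with $\log n$ while the AIC penalty does not depend on $n$.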

The DIC is then defined as

$$DIC = \bar{D} + p_D = 2\bar{D} - D(\bar{\theta})$$

where $\bar{D}$ is the posterior mean of the deviance and $p_D = \bar{D} - D(\bar{\theta})$ is the effective number of parameters.

Smaller DIC indicates a better-fitting model.

DIC has no absolute scale; only DIFFERENCES in DIC across models are meaningful.

Identifying what is a SIGNIFICANT difference in DIC is difficult; delta method approximations to the variance of DIC have not been very successful. In practice, we recompute DIC a few times using different random number seeds.

It is up to the user to think carefully about which parameters ought to be "in focus" before using DIC, i.e. the likelihood function used could be marginal (only for the parameters of interest).
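A minimal sketch of how DIC can be computed from posterior draws, using the standard definitions $D(\theta) = -2 \log f(y \mid \theta)$, $\bar{D}$ the average of $D$ over the draws, and $p_D = \bar{D} - D(\bar{\theta})$. The normal model with known variance, the tiny set of "draws", and the function names are illustrative assumptions; in practice the draws would come from an MCMC run.

```python
import math


def deviance(theta, y, sigma=1.0):
    """D(theta) = -2 log f(y | theta) for y_i ~ N(theta, sigma^2), sigma known."""
    n = len(y)
    rss = sum((v - theta) ** 2 for v in y)
    return rss / sigma ** 2 + n * math.log(2 * math.pi * sigma ** 2)


def dic(theta_samples, y, sigma=1.0):
    """Return (DIC, p_D) computed from posterior draws of theta."""
    m = len(theta_samples)
    d_bar = sum(deviance(t, y, sigma) for t in theta_samples) / m
    theta_bar = sum(theta_samples) / m
    d_at_mean = deviance(theta_bar, y, sigma)
    p_d = d_bar - d_at_mean      # effective number of parameters
    return d_bar + p_d, p_d      # same value as 2 * d_bar - d_at_mean


y = [0.0, 1.0, 2.0]
theta_samples = [0.5, 1.0, 1.5]  # stand-in for real posterior draws
dic_value, p_d = dic(theta_samples, y)
print(dic_value, p_d)
```

Because the draws are random in a real analysis, repeating the computation with different seeds, as the notes suggest, gives a rough sense of the Monte Carlo variability in DIC.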

Bayesian computation

The most popular computing tools in Bayesian practice today are Markov chain Monte Carlo (MCMC) methods, because of their ability to enable inference from posterior distributions of large problems, by reducing the problem to one of RECURSIVELY solving a series of lower-dimensional problems.

Like traditional Monte Carlo, MCMC works by producing not a closed form for the posterior but a SAMPLE of values from this distribution. A histogram based on such a sample is typically sufficient for reliable inference.

However, unlike traditional MC methods, MCMC algorithms produce CORRELATED samples from this posterior, since they are recursive draws from a particular Markov chain, the ...
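The correlation between successive MCMC draws can be seen in a small experiment. The sketch below uses a random-walk Metropolis sampler, a special case of the Metropolis-Hastings algorithm named in the outline; it is an illustration, not the implementation used in the course. The target (posterior of a normal mean under a flat prior with unit data variance), the step size, the burn-in length, and all names are assumptions.

```python
import math
import random


def metropolis(log_post, start, step, n_iter, seed=0):
    """Random-walk Metropolis; returns the list of successive (correlated) draws."""
    rng = random.Random(seed)
    current = start
    current_lp = log_post(current)
    draws = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio); the normal
        # proposal is symmetric, so no proposal correction is needed.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws


def lag1_autocorrelation(x):
    """Sample autocorrelation between consecutive draws."""
    mean = sum(x) / len(x)
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(len(x) - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


# Assumed target: y_i ~ N(theta, 1) with a flat prior on theta, so the
# posterior is N(ybar, 1/n) and is known up to a constant.
y = [0.8, 1.9, 1.2, 0.4, 1.7]
ybar = sum(y) / len(y)


def log_post(theta):
    return -0.5 * sum((v - theta) ** 2 for v in y)


draws = metropolis(log_post, start=0.0, step=0.5, n_iter=20000)
kept = draws[2000:]  # discard an initial burn-in stretch
post_mean = sum(kept) / len(kept)
rho1 = lag1_autocorrelation(kept)
print(post_mean, rho1)
```

The sample mean of the retained draws should be close to `ybar`, while the clearly positive lag-1 autocorrelation shows that the draws are not independent, which matters when estimating the variance of MCMC-based summaries.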