































An overview of spatial statistics and data assimilation using a Bayesian approach. It covers the basics of Bayesian inference, Bayes' theorem, point and interval estimation, hypothesis testing, and model choice. The document also introduces the concept of empirical Bayes analysis and Markov chain Monte Carlo (MCMC) integration methods. It further discusses various criteria for model choice, including the Bayes factor and the Bayesian Information Criterion (BIC), and the use of DIC for model comparison.
ST 790 M. Fall 2004.
Statistics Department NCSU
http://www.stat.ncsu.edu/~fuentes
Bayes’ theorem
Bayesian inference
Point estimation
Interval estimation
Hypothesis testing and model choice
Bayes computation:
Revisit Gibbs and Metropolis-Hastings
Slice sampling
Convergence diagnosis
Variance estimation
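Inference on $\theta$ is based on its POSTERIOR distribution:
\[
p(\theta \mid y, \lambda) = \frac{p(y, \theta \mid \lambda)}{p(y \mid \lambda)}
= \frac{p(y, \theta \mid \lambda)}{\int p(y, \theta \mid \lambda)\, d\theta}
= \frac{p(y \mid \theta)\, \pi(\theta \mid \lambda)}{\int p(y \mid \theta)\, \pi(\theta \mid \lambda)\, d\theta}
\]
Since $\lambda$ might be unknown (hyperprior $h(\lambda)$), we need an additional step:
\[
p(\theta \mid y) = \frac{p(y, \theta)}{p(y)}
= \frac{\int p(y \mid \theta)\, \pi(\theta \mid \lambda)\, h(\lambda)\, d\lambda}{\iint p(y \mid \theta)\, \pi(\theta \mid \lambda)\, h(\lambda)\, d\theta\, d\lambda}
\]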
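As a rough numerical sketch of the two posterior formulas, the integrals can be approximated by sums on a grid. The model is an assumption chosen only for illustration and is not taken from the notes: $y \mid \theta \sim N(\theta, 1)$, $\theta \mid \lambda \sim N(0, \lambda)$, and a discrete hyperprior $h(\lambda)$ on three hypothetical values. The function names, the observation and the grid are likewise hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def posterior_given_lambda(y, lam, grid):
    """p(theta | y, lambda) on a grid: likelihood times prior, normalized numerically."""
    step = grid[1] - grid[0]
    unnorm = [normal_pdf(y, t, 1.0) * normal_pdf(t, 0.0, lam) for t in grid]
    norm = sum(unnorm) * step                  # approximates p(y | lambda)
    return [u / norm for u in unnorm], norm

def posterior_marginal(y, lambdas, h, grid):
    """p(theta | y): lambda is summed out against the (discrete) hyperprior h."""
    step = grid[1] - grid[0]
    unnorm = [
        sum(normal_pdf(y, t, 1.0) * normal_pdf(t, 0.0, lam) * w
            for lam, w in zip(lambdas, h))
        for t in grid
    ]
    norm = sum(unnorm) * step                  # approximates p(y)
    return [u / norm for u in unnorm]

grid = [-10 + 0.01 * i for i in range(2001)]   # theta values in [-10, 10]
y_obs = 1.5                                    # hypothetical observation
post_fixed, marg_lik = posterior_given_lambda(y_obs, 2.0, grid)
post_hier = posterior_marginal(y_obs, [0.5, 2.0, 8.0], [0.25, 0.5, 0.25], grid)
```

In this conjugate toy model the fixed-$\lambda$ posterior is known in closed form, which gives a simple check on the grid approximation; in realistic models the integrals are not available and sampling methods are used instead.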
Alternatively, we can replace $\lambda$ by an estimate $\hat\lambda$, which could be the maximizer of the marginal likelihood $p(y \mid \lambda)$. Inference based on this estimated posterior $p(\theta \mid y, \hat\lambda)$ is referred to as EMPIRICAL BAYES analysis.
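A minimal sketch of the plug-in idea, under an assumed model that is not from the notes: $y_i \mid \theta_i \sim N(\theta_i, 1)$ and $\theta_i \mid \lambda \sim N(0, \lambda)$. Marginally $y_i \sim N(0, 1 + \lambda)$, so the maximizer of $p(y \mid \lambda)$ has a closed form. The function name and the data are hypothetical.

```python
def empirical_bayes_normal(ys):
    """Plug-in posterior means for y_i ~ N(theta_i, 1), theta_i ~ N(0, lam)."""
    # Marginally y_i ~ N(0, 1 + lam), so p(y | lam) is maximized at
    # mean(y_i^2) - 1, truncated at zero because lam is a variance.
    lam_hat = max(0.0, sum(y * y for y in ys) / len(ys) - 1.0)
    # Given lam_hat, the posterior of theta_i is N(shrink * y_i, shrink).
    shrink = lam_hat / (1.0 + lam_hat)
    post_means = [shrink * y for y in ys]
    return lam_hat, shrink, post_means

ys = [2.0, -1.0, 3.0, 0.0, -2.0]               # hypothetical data
lam_hat, shrink, post_means = empirical_bayes_normal(ys)
```

The estimated posterior treats $\hat\lambda$ as known, so it ignores the uncertainty about $\lambda$ that the full hyperprior analysis would carry through.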
Point estimation
Estimates of $\theta$:
The mean of the posterior:
\[ \hat\theta = E(\theta \mid y) \]
The median of the posterior:
\[ \hat\theta : \int_{-\infty}^{\hat\theta} p(\theta \mid y)\, d\theta = 0.5 \]
The mode of the posterior:
\[ \hat\theta : p(\hat\theta \mid y) = \sup_{\theta} p(\theta \mid y) \]
The last one is the easiest to compute, since it does not require any integration. If the posterior exists under a flat prior $p(\theta) = 1$, then the posterior mode is just the MLE of $\theta$.
The posterior median is often the best point estimate (because it better represents the center of a non-symmetric distribution).
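To see how the three estimates differ for a skewed posterior, the sketch below evaluates them on a grid. The Gamma(3, 1) kernel stands in for a posterior purely as an illustrative assumption; the function name and grid are hypothetical.

```python
import math

def point_estimates(grid, density):
    """Posterior mean, median and mode from an unnormalized density on an evenly spaced grid."""
    step = grid[1] - grid[0]
    total = sum(density) * step
    probs = [d * step / total for d in density]      # probability mass per grid cell
    mean = sum(t * p for t, p in zip(grid, probs))
    cum, median = 0.0, grid[-1]
    for t, p in zip(grid, probs):
        cum += p
        if cum >= 0.5:                               # first grid point where the CDF reaches 1/2
            median = t
            break
    # The mode needs no integration: it is just the grid point of highest density.
    mode = grid[max(range(len(grid)), key=lambda i: density[i])]
    return mean, median, mode

grid = [0.001 * i for i in range(1, 40001)]          # theta values in (0, 40]
density = [t ** 2 * math.exp(-t) for t in grid]      # Gamma(3, 1) kernel (assumed posterior)
mean, median, mode = point_estimates(grid, density)
```

For this right-skewed density the mode lies below the median, which in turn lies below the mean.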
The frequentist CI does not satisfy that condition. Instead, it gives an interval such that the probability that the random interval covers the TRUE parameter is $1 - \alpha$, i.e.
\[ P(\theta \in (a, b) \mid \theta) = 1 - \alpha. \]
The interval $(q_L, q_U)$ with
\[ P(q_L < \theta < q_U \mid y) = 1 - \alpha, \]
where $q_L$ and $q_U$ are the $\alpha/2$ and $1 - \alpha/2$ quantiles of the posterior, is the equal tail credible set.
For symmetric unimodal posteriors, this interval will be symmetric about the mode. It will also be optimal, in the sense that it will have the shortest length among sets $C$ satisfying:
\[ 1 - \alpha \le P(C \mid y) = \int_C p(\theta \mid y)\, d\theta \]
For posteriors that are not symmetric and unimodal, a better, shorter interval can be obtained by taking only values of $\theta$ that have posterior density greater than some cutoff. The cutoff is chosen as large as possible while the resulting set still satisfies the previous condition.
This is called the HIGHEST POSTERIOR DENSITY (HPD) confidence set. It is more difficult to compute but always of optimal length.
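The difference between the equal tail set and the HPD set can be illustrated on a grid. This is a sketch under assumptions: the Gamma(3, 1) kernel is a stand-in posterior, the function name is hypothetical, and the HPD set is reported as a single interval, which is only valid for a unimodal density.

```python
import math

def credible_intervals(grid, density, alpha=0.05):
    """Equal-tail and HPD intervals from an unnormalized density on an evenly spaced grid."""
    step = grid[1] - grid[0]
    total = sum(density) * step
    probs = [d * step / total for d in density]

    # Equal tail: cut probability alpha/2 from each side of the posterior CDF.
    cum, q_lo, q_hi = 0.0, None, None
    for t, p in zip(grid, probs):
        cum += p
        if q_lo is None and cum >= alpha / 2:
            q_lo = t
        if q_hi is None and cum >= 1 - alpha / 2:
            q_hi = t

    # HPD: keep grid points in order of decreasing density until the kept
    # mass reaches 1 - alpha; this is the "largest possible cutoff" rule.
    order = sorted(range(len(grid)), key=lambda i: density[i], reverse=True)
    mass, kept = 0.0, []
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= 1 - alpha:
            break
    # For a unimodal density the kept points form one interval.
    hpd = (grid[min(kept)], grid[max(kept)])
    return (q_lo, q_hi), hpd

grid = [0.001 * i for i in range(1, 30001)]          # theta values in (0, 30]
density = [t ** 2 * math.exp(-t) for t in grid]      # Gamma(3, 1) kernel (assumed posterior)
equal_tail, hpd = credible_intervals(grid, density)
```

Because the density is skewed to the right, the HPD interval shifts toward the mode and comes out shorter than the equal tail interval at the same probability content.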
Thus, the marginals of $y$ under each model $M_i$ are
\[ p(y \mid M_i) = \int f(y \mid \theta_i, M_i)\, \pi_i(\theta_i)\, d\theta_i \]
Bayes' theorem can be applied to obtain the posterior probabilities $P(M_1 \mid y)$ and $P(M_2 \mid y) = 1 - P(M_1 \mid y)$.
The quantity used to summarize these results is the BAYES FACTOR BF, the ratio of the posterior odds of $M_1$ to the prior odds of $M_1$:
\[ BF = \frac{P(M_1 \mid y) / P(M_2 \mid y)}{P(M_1) / P(M_2)} = \frac{p(y \mid M_1)}{p(y \mid M_2)} \]
If both models have the same prior probability, then BF is the posterior odds of $M_1$.
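A small worked case may help. The two models below are assumptions made for illustration, not taken from the notes: $y$ successes in $n$ binomial trials, with $M_1: \theta = 0.5$ and $M_2: \theta \sim \mathrm{Uniform}(0, 1)$. Both marginals are available in closed form, so the Bayes factor is a ratio of two numbers. The function names are hypothetical.

```python
from math import comb

def bayes_factor_binomial(y, n):
    """BF of M1: theta = 0.5 against M2: theta ~ Uniform(0, 1), for y successes in n trials."""
    marg_m1 = comb(n, y) * 0.5 ** n      # p(y | M1): no free parameter to integrate out
    marg_m2 = 1.0 / (n + 1)              # p(y | M2): binomial likelihood integrated over Uniform(0, 1)
    return marg_m1 / marg_m2

def posterior_prob_m1(bf, prior_m1=0.5):
    """P(M1 | y) recovered from the Bayes factor and the prior probability of M1."""
    prior_odds = prior_m1 / (1.0 - prior_m1)
    post_odds = bf * prior_odds          # posterior odds = BF * prior odds
    return post_odds / (1.0 + post_odds)

bf = bayes_factor_binomial(5, 10)        # 5 successes in 10 trials
p_m1 = posterior_prob_m1(bf)             # equal prior probabilities
```

With equal prior probabilities the prior odds are 1, so the posterior odds of $M_1$ equal the Bayes factor itself.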
Bayesian Information Criterion (BIC)
BIC is also known as the Schwarz Criterion. Schwarz showed that for nonhierarchical models and large sample sizes $n$, BIC approximates $-2 \log BF$.
BIC is a penalized likelihood ratio model choice criterion, if we think of $M_2$ as the "full" model and $M_1$ as the "reduced" model:
\[ \Delta BIC = W - (p_2 - p_1) \log n \]
where $p_i$ is the number of parameters in model $i$, and
\[ W = -2 \log \left[ \frac{\sup_{M_1} f(y \mid \theta)}{\sup_{M_2} f(y \mid \theta)} \right] \]
is the usual likelihood ratio test statistic.
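The criterion can be computed directly for a pair of nested models. The setup is an assumption for illustration only: i.i.d. $N(\mu, 1)$ data, with reduced model $M_1: \mu = 0$ ($p_1 = 0$) and full model $M_2: \mu$ free ($p_2 = 1$). The function names and the data are hypothetical.

```python
import math

def log_lik_normal(ys, mu):
    """Log-likelihood of i.i.d. N(mu, 1) data."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - mu) ** 2 for y in ys)

def delta_bic(ys):
    """Likelihood ratio statistic W and W - (p2 - p1) log n for M1: mu = 0 vs M2: mu free."""
    n = len(ys)
    ybar = sum(ys) / n
    loglik_reduced = log_lik_normal(ys, 0.0)    # sup over M1: nothing to maximize
    loglik_full = log_lik_normal(ys, ybar)      # sup over M2: attained at the MLE ybar
    w = -2.0 * (loglik_reduced - loglik_full)   # equals n * ybar^2 in this model
    p1, p2 = 0, 1
    return w, w - (p2 - p1) * math.log(n)

ys = [0.8, 1.9, -0.3, 1.2, 0.4, 1.6, 0.9, 1.1]  # hypothetical data
w, dbic = delta_bic(ys)
```

Since the criterion approximates $-2 \log BF$ with $BF = p(y \mid M_1) / p(y \mid M_2)$, a large positive value points toward the full model, and the $\log n$ penalty makes that harder to reach as the sample grows.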
An alternative to BIC is the Akaike Information Criterion (AIC),
\[ \Delta AIC = W - 2 (p_2 - p_1). \]
This is also a penalized likelihood ratio model choice criterion.
The more serious limitation in using BF or its approximations (BIC, AIC) is that they are not appropriate under noninformative priors. If $\pi_i(\theta_i)$ is improper then $p(y \mid M_i)$ is as well. A solution is to use DIC.
The DIC is then defined as
\[ DIC = \bar{D} + p_D = 2\bar{D} - D(\bar\theta) \]
Smaller DIC indicates a better-fitting model.
DIC has no absolute scale; only DIFFERENCES in DIC across models are meaningful.
Identification of what is a SIGNIFICANT difference in DIC is difficult; delta method approximations to the variance of DIC have not been very successful. In practice, we recompute DIC a few times using different random number seeds.
It is up to the user to think carefully about which parameters ought to be "in focus" before using DIC, i.e. the likelihood function used could be marginal (only for the parameters of interest).
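The sketch below computes DIC from posterior draws and repeats the calculation under several random seeds, as suggested above. It relies on assumptions not stated in this part of the notes: the standard definitions $D(\theta) = -2 \log f(y \mid \theta)$, $\bar{D}$ as the posterior mean of $D$, and $p_D = \bar{D} - D(\bar\theta)$, together with a toy model of i.i.d. $N(\theta, 1)$ data under a flat prior, whose posterior is $N(\bar{y}, 1/n)$. Function names and data are hypothetical.

```python
import math
import random

def deviance(ys, theta):
    """D(theta) = -2 log f(y | theta) for i.i.d. N(theta, 1) data."""
    return sum(math.log(2 * math.pi) + (y - theta) ** 2 for y in ys)

def dic_from_draws(ys, draws):
    """DIC and p_D from a sample of posterior draws of theta."""
    d_bar = sum(deviance(ys, t) for t in draws) / len(draws)  # posterior mean deviance
    theta_bar = sum(draws) / len(draws)                       # posterior mean of theta
    p_d = d_bar - deviance(ys, theta_bar)                     # effective number of parameters
    return d_bar + p_d, p_d

ys = [0.8, 1.9, -0.3, 1.2, 0.4, 1.6, 0.9, 1.1]                # hypothetical data
n = len(ys)
ybar = sum(ys) / n

results = []
for seed in (1, 2, 3):                                        # recompute with different seeds
    rng = random.Random(seed)
    # Under the flat prior the posterior is N(ybar, 1/n); draw from it directly.
    draws = [rng.gauss(ybar, 1.0 / math.sqrt(n)) for _ in range(5000)]
    results.append(dic_from_draws(ys, draws))
```

The spread of the DIC values across seeds gives a crude sense of the Monte Carlo noise, which helps when judging whether a difference between two models is worth interpreting. In this one-parameter model $p_D$ should come out close to 1.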
The most popular computing tools in Bayesian practice today are Markov chain Monte Carlo (MCMC) methods, because of their ability to enable inference from posterior distributions of large problems by reducing the problem to one of RECURSIVELY solving a series of lower-dimensional problems.
Like traditional Monte Carlo, MCMC works by producing not a closed form for the posterior but a SAMPLE of values from this distribution. A histogram based on such a sample is typically sufficient for reliable inference.
However, unlike traditional MC methods, MCMC algorithms produce CORRELATED samples from this posterior, since they are recursive draws from a particular Markov chain.
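The correlation between successive MCMC draws is easy to see in a small experiment. The sketch below uses a random-walk Metropolis sampler, one simple member of the Metropolis-Hastings family listed in the outline, on an assumed standard normal target; the target, step size, seed and function names are all hypothetical choices for illustration.

```python
import math
import random

def metropolis(log_target, start, step, n_draws, rng):
    """Random-walk Metropolis: each draw is generated from the previous one."""
    draws, current = [], start
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)
        log_ratio = log_target(proposal) - log_target(current)
        # Accept with probability min(1, target(proposal) / target(current)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            current = proposal
        draws.append(current)
    return draws

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

rng = random.Random(0)
log_target = lambda t: -0.5 * t * t                  # log density of N(0, 1) up to a constant
mcmc_draws = metropolis(log_target, 0.0, 0.5, 20000, rng)
iid_draws = [rng.gauss(0.0, 1.0) for _ in range(20000)]

rho_mcmc = lag1_autocorr(mcmc_draws)                 # strongly positive for a small step size
rho_iid = lag1_autocorr(iid_draws)                   # near zero for independent draws
```

Both samples target the same distribution, but the correlated chain carries less information per draw, which is why convergence diagnosis and variance estimation need separate treatment for MCMC output.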