





























































































Building our MCMC
Our previous chapter showed that we need to find a transition kernel $P(x, A)$ such that:
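Metropolis-Hastings Transition Kernel
Let:
$$P_{MH}(x, dy) = p_{MH}(x, y)\,dy + r_{MH}(x)\,\delta_x(dy)$$
where:
$$p_{MH}(x, y) = q(x, y)\,\alpha(x, y)$$
$$\alpha(x, y) = \min\left\{\frac{f(y)\,q(y, x)}{f(x)\,q(x, y)},\ 1\right\}$$
Here $r_{MH}(x) = 1 - \int p_{MH}(x, y)\,dy$ is the probability that the chain stays at $x$, and $q(x, y)$ is a candidate-generating density that is irreducible and aperiodic.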
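To make the kernel concrete, here is a minimal Python sketch of the acceptance probability $\alpha(x, y)$ for an arbitrary candidate-generating density. Python is our choice, since the slides use MATLAB. The helper name `mh_acceptance` and the asymmetric proposal $y \sim \mathcal{N}(x/2, 1)$ are hypothetical illustrations and do not come from the slides. The ratio is computed on the log scale, and any normalizing constants of $f$ and $q$ cancel.

```python
import math

def mh_acceptance(log_f, log_q, x, y):
    """Acceptance probability alpha(x, y) of the Metropolis-Hastings kernel.

    log_f(z)    -- log of the target density f (an additive constant is fine)
    log_q(a, b) -- log of the candidate-generating density q(a, b), i.e. the
                   density of proposing b when the chain is at a
    The ratio f(y) q(y, x) / (f(x) q(x, y)) is formed on the log scale to
    avoid overflow and underflow.
    """
    log_ratio = (log_f(y) + log_q(y, x)) - (log_f(x) + log_q(x, y))
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


# Hypothetical example: standard normal target and an asymmetric proposal
# b ~ N(a / 2, 1); normalizing constants cancel in the ratio, so they are dropped.
def log_f(z):
    return -0.5 * z * z

def log_q(a, b):
    return -0.5 * (b - 0.5 * a) ** 2

alpha = mh_acceptance(log_f, log_q, 0.0, 1.0)
print(round(alpha, 4))  # 0.8825
```

The reverse move, from $1$ to $0$, has $\alpha = 1$. This is the pattern the proof below exploits: for any pair of states, at least one of the two directions is always accepted.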
Lemma: $\int_A f(y)\,dy = \int P_{MH}(x, A)\,f(x)\,dx$, that is, $f$ is an invariant distribution of $P_{MH}$.
Proof: We only need to show detailed balance, $f(x)\,p_{MH}(x, y) = f(y)\,p_{MH}(y, x)$; the point mass $r_{MH}(x)\,\delta_x(dy)$ is trivially reversible.
Assume without loss of generality that $\alpha(x, y) < 1$ and $\alpha(y, x) = 1$.
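Then:
$$\begin{aligned}
f(x)\,p_{MH}(x, y) &= f(x)\,q(x, y)\,\alpha(x, y) \\
&= f(x)\,q(x, y)\min\left\{\frac{f(y)\,q(y, x)}{f(x)\,q(x, y)},\ 1\right\} \\
&= f(x)\,q(x, y)\,\frac{f(y)\,q(y, x)}{f(x)\,q(x, y)} \\
&= f(y)\,q(y, x) = f(y)\,p_{MH}(y, x),
\end{aligned}$$
where the third line uses $\alpha(x, y) < 1$ and the last equality uses $\alpha(y, x) = 1$.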
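On a finite state space the kernel is a matrix, so the Lemma can be checked numerically. The sketch below uses a hypothetical `mh_kernel` helper, an assumed four-state target, and an assumed asymmetric proposal on a ring. It builds $P_{MH}$ with the rejection mass placed on the diagonal, then checks two things: detailed balance, and the invariance $f P_{MH} = f$.

```python
import numpy as np

def mh_kernel(f, Q):
    """Metropolis-Hastings transition matrix on a finite state space.

    f -- target probabilities f(x) > 0
    Q -- candidate-generating matrix, Q[x, y] = q(x, y), rows summing to 1
    Off the diagonal P[x, y] = p_MH(x, y) = q(x, y) * alpha(x, y); the
    diagonal collects q(x, x) plus the rejection probability r_MH(x).
    """
    f, Q = np.asarray(f, dtype=float), np.asarray(Q, dtype=float)
    n = len(f)
    P = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if x != y and Q[x, y] > 0:
                alpha = min(f[y] * Q[y, x] / (f[x] * Q[x, y]), 1.0)
                P[x, y] = Q[x, y] * alpha
        P[x, x] = 1.0 - P[x].sum()   # probability of staying at x
    return P

f = np.array([0.1, 0.2, 0.3, 0.4])
Q = np.array([[0.2, 0.5, 0.0, 0.3],    # an asymmetric proposal on a ring
              [0.3, 0.2, 0.5, 0.0],
              [0.0, 0.3, 0.2, 0.5],
              [0.5, 0.0, 0.3, 0.2]])
P = mh_kernel(f, Q)

flows = f[:, None] * P                 # flows[x, y] = f(x) p_MH(x, y)
print(np.allclose(flows, flows.T))     # True: detailed balance
print(np.allclose(f @ P, f))           # True: f is invariant (the Lemma)
```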
Symmetric Candidate-Generating Densities
• We can take a candidate-generating density with $q(x, y) = q(y, x)$ (for example, a random walk). Then:
$$\alpha(x, y) = \min\left\{\frac{f(y)}{f(x)},\ 1\right\}$$
• Then, if the jump is “uphill” ($f(y)/f(x) \geq 1$), we always accept: $\alpha(x, y) = 1$, so $p_{MH}(x, y) = q(x, y)$ and the move adds nothing to the rejection probability $r_{MH}(x)$.
• If the jump is “downhill” ($f(y)/f(x) < 1$), we accept with nonzero probability: $\alpha(x, y) < 1$, so $p_{MH}(x, y) < q(x, y)$, and the rejected mass makes $r_{MH}(x) > 0$.
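A small sketch of the two cases, assuming a standard normal target written as an unnormalized log density. The helper name `symmetric_acceptance` is hypothetical. Only the ratio $f(y)/f(x)$ enters the computation, so the normalizing constant of $f$ is never needed.

```python
import math

def symmetric_acceptance(log_f, x, y):
    """alpha(x, y) = min{f(y) / f(x), 1} when q(x, y) = q(y, x).

    Only the target ratio matters, so f may be unnormalized.
    """
    d = log_f(y) - log_f(x)
    return 1.0 if d >= 0.0 else math.exp(d)

def log_f(z):
    return -0.5 * z * z            # standard normal, up to a constant

print(symmetric_acceptance(log_f, 2.0, 1.0))            # uphill: 1.0
print(round(symmetric_acceptance(log_f, 1.0, 2.0), 4))  # downhill: 0.2231
```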
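Pseudo-Code
1. Initialize the algorithm with an arbitrary value $x_0$ and a chain length $M$.
2. Set $j = 1$.
3. Generate $x_j^*$ from $q(x_{j-1}, x_j^*)$ and $u$ from $U[0, 1]$.
4. If $u \leq \alpha(x_{j-1}, x_j^*)$, then $x_j = x_j^*$; if $u > \alpha(x_{j-1}, x_j^*)$, then $x_j = x_{j-1}$.
5. If $j < M$, then set $j \leftarrow j + 1$ and go to 3; otherwise stop.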
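One possible Python translation of steps 1 to 5 follows. It is a sketch, not the author's code, and the argument names (`log_f`, `propose`, `log_q`) are assumptions. Working with $\log f$ avoids numerical underflow. Comparing $u$ with $\exp(\log \alpha)$ is the same test as step 4.

```python
import numpy as np

def metropolis_hastings(log_f, propose, log_q, x0, M, rng):
    """Metropolis-Hastings following steps 1-5 of the pseudo-code.

    log_f(x)        -- log target density (up to an additive constant)
    propose(x, rng) -- draws a candidate x* from q(x, .)
    log_q(a, b)     -- log q(a, b), density of proposing b from a
    Returns the draws x_1, ..., x_M and the acceptance rate.
    """
    draws = np.empty(M)
    x = x0                                       # step 1: arbitrary x_0
    accepted = 0
    for j in range(M):                           # steps 2 and 5: j = 1, ..., M
        x_star = propose(x, rng)                 # step 3: x*_j ~ q(x_{j-1}, .)
        u = rng.uniform()                        #         u ~ U[0, 1]
        log_alpha = min(0.0, log_f(x_star) + log_q(x_star, x)
                             - log_f(x) - log_q(x, x_star))
        if u <= np.exp(log_alpha):               # step 4: accept, x_j = x*_j
            x = x_star
            accepted += 1
        draws[j] = x                             # (if rejected, x_j = x_{j-1})
    return draws, accepted / M

# Usage: standard normal target with a random-walk proposal (symmetric, so
# log q contributes nothing and can be set to zero).
rng = np.random.default_rng(0)
draws, acc = metropolis_hastings(
    log_f=lambda x: -0.5 * x * x,
    propose=lambda x, rng: x + rng.normal(),
    log_q=lambda a, b: 0.0,
    x0=3.0, M=20_000, rng=rng)
print(draws[1_000:].mean(), draws[1_000:].std(), acc)
```

If `log_q` returns a nonzero value, the same function becomes a general (asymmetric) M-H sampler. The first 1,000 draws are dropped as burn-in before summarizing.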
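Choosing $q(x, y)$
• A popular choice for $q(x, y)$ is a random walk: $y = x + \varepsilon$, where $\varepsilon \sim \mathcal{N}(0, \Sigma)$.
• It is known as a Random-Walk M-H.
• A random walk satisfies all the conditions of a good transition kernel.
• Good default option.
• How do we determine $\Sigma$? From the Hessian of the distribution of interest: a common choice is a scaled inverse of the negative Hessian of $\log f$ at its mode.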
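The following sketches a Random-Walk M-H that builds $\Sigma$ from the Hessian, under stated assumptions. The mode is taken as already known. The Hessian is approximated by central finite differences. The covariance is set to $\Sigma = c^2(-H)^{-1}$, using the rule-of-thumb scale $c = 2.38/\sqrt{d}$; this is a common tuning heuristic, not something the slides prescribe. The bivariate normal target and the helper names are hypothetical.

```python
import numpy as np

def numerical_hessian(log_f, x, h=1e-4):
    """Central finite-difference Hessian of log_f at the point x."""
    x = np.asarray(x, dtype=float)
    d = len(x)
    H = np.empty((d, d))
    for i in range(d):
        for k in range(d):
            ei, ek = np.zeros(d), np.zeros(d)
            ei[i], ek[k] = h, h
            H[i, k] = (log_f(x + ei + ek) - log_f(x + ei - ek)
                       - log_f(x - ei + ek) + log_f(x - ei - ek)) / (4 * h * h)
    return H

def random_walk_mh(log_f, mode, M, rng, c=None):
    """Random-walk M-H: y = x + eps, eps ~ N(0, Sigma), Sigma = c^2 (-H)^{-1},
    where H is the Hessian of log f at its mode."""
    mode = np.asarray(mode, dtype=float)
    d = len(mode)
    if c is None:
        c = 2.38 / np.sqrt(d)                    # assumed rule-of-thumb scale
    Sigma = c**2 * np.linalg.inv(-numerical_hessian(log_f, mode))
    L = np.linalg.cholesky(Sigma)                # L @ z ~ N(0, Sigma) if z ~ N(0, I)
    x, lf_x = mode.copy(), log_f(mode)
    draws, accepted = np.empty((M, d)), 0
    for j in range(M):
        y = x + L @ rng.standard_normal(d)
        lf_y = log_f(y)
        if rng.uniform() <= np.exp(min(0.0, lf_y - lf_x)):  # symmetric q
            x, lf_x = y, lf_y
            accepted += 1
        draws[j] = x
    return draws, accepted / M

# Hypothetical target: a correlated bivariate normal with covariance S.
S = np.array([[1.0, 0.8], [0.8, 1.0]])
S_inv = np.linalg.inv(S)

def log_f(z):
    return -0.5 * z @ S_inv @ z

draws, acc = random_walk_mh(log_f, mode=[0.0, 0.0], M=40_000,
                            rng=np.random.default_rng(1))
print(np.cov(draws[2_000:].T), acc)
```

Because the proposal's shape matches the target's curvature, the chain moves along the strongly correlated direction instead of repeatedly proposing steps across it.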
Choosing $q(x, y)$
• Another popular choice is the Independent M-H.
• We just make $q(x, y) = g(y)$.
• Note the similarity with acceptance-rejection sampling.
• However, the independent M-H accepts more often.
• If $f(x) \leq a\,g(x)$, then the independent M-H will accept at least $1/a$ of the proposals, whereas acceptance-rejection sampling accepts exactly $1/a$.
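Below is a minimal sketch of the independent M-H. It assumes a standard normal target and a standard Cauchy proposal $g$, both chosen only for illustration. Since $g$ does not depend on the current state, $\alpha(x, y) = \min\{f(y)g(x)/(f(x)g(y)), 1\}$, and this depends only on the weights $w = f/g$. For this pair, $f \leq a\,g$ with $a = \sqrt{2\pi}\,e^{-1/2} \approx 1.52$. The bound above then gives an acceptance rate of at least $1/a \approx 0.66$ once the chain is stationary.

```python
import numpy as np

def independent_mh(log_f, sample_g, log_g, x0, M, rng):
    """Independent M-H: every candidate y is drawn from g, whatever the current x."""
    def log_w(z):                     # log importance weight, log f - log g
        return log_f(z) - log_g(z)

    x, lw_x = x0, log_w(x0)
    draws, accepted = np.empty(M), 0
    for j in range(M):
        y = sample_g(rng)
        lw_y = log_w(y)
        # alpha(x, y) = min{ f(y) g(x) / (f(x) g(y)), 1 } = min{ w(y) / w(x), 1 }
        if rng.uniform() <= np.exp(min(0.0, lw_y - lw_x)):
            x, lw_x = y, lw_y
            accepted += 1
        draws[j] = x
    return draws, accepted / M

# Assumed example: N(0, 1) target (normalized) and standard Cauchy proposal g.
rng = np.random.default_rng(1)
draws, acc = independent_mh(
    log_f=lambda z: -0.5 * z**2 - 0.5 * np.log(2 * np.pi),
    sample_g=lambda rng: rng.standard_cauchy(),
    log_g=lambda z: -np.log(np.pi * (1 + z**2)),
    x0=0.0, M=20_000, rng=rng)
a = np.sqrt(2 * np.pi) * np.exp(-0.5)   # sup of f / g, attained at |z| = 1
print(acc, 1 / a)
```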
Example I (t_metropolis.m)
• We revisit our problem of drawing from a $t$ distribution.
• Remember how difficult it was to use, for example, a normal distribution as an envelope?
• Basically, because dealing with the tails was difficult: the $t$ tails are heavier than the normal ones, so no constant $a$ makes $f \leq a\,g$ hold everywhere.
• Now we will see that with Metropolis-Hastings the problem is quite simple.
Example I
• Use MH to provide a numerical approximation to $t(x, 3)$, a $t$ distribution with 3 degrees of freedom, evaluated at $x$.
• We need to get a random draw $\{x_j\}_{j=1}^{N}$ from $t(3)$ using MH.
• Implemented in my code t_metropolis.m.
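The file t_metropolis.m is not reproduced here. As a hypothetical stand-in, the Python sketch below runs a random-walk M-H on the unnormalized $t(3)$ density $f(x) \propto (1 + x^2/3)^{-2}$. It then estimates $t(x, 3)$ at $x = 1$ as the share of draws below $x$. The proposal scale, burn-in, and chain length are assumed values. The closed-form $t(3)$ CDF is included only to check the result.

```python
import numpy as np

def log_t3(x):
    """Log density of a t distribution with 3 degrees of freedom, up to a constant:
    f(x) is proportional to (1 + x^2 / 3)^(-2)."""
    return -2.0 * np.log1p(x * x / 3.0)

def t3_cdf(x):
    """Closed-form CDF of t(3), used here only to check the simulation."""
    s = x / np.sqrt(3.0)
    return 0.5 + (s / (1.0 + s * s) + np.arctan(s)) / np.pi

def t_metropolis_sketch(x, N=50_000, burn_in=5_000, scale=2.0, seed=0):
    """Random-walk M-H draws {x_j}_{j=1}^N from t(3); returns the share of
    draws below x (an estimate of t(x, 3)) together with the draws."""
    rng = np.random.default_rng(seed)
    current, lf_current = 0.0, log_t3(0.0)
    draws = np.empty(N)
    for j in range(burn_in + N):
        candidate = current + scale * rng.standard_normal()
        lf_candidate = log_t3(candidate)
        if rng.uniform() <= np.exp(min(0.0, lf_candidate - lf_current)):
            current, lf_current = candidate, lf_candidate
        if j >= burn_in:
            draws[j - burn_in] = current
    return np.mean(draws < x), draws

estimate, draws = t_metropolis_sketch(1.0)
print(estimate, t3_cdf(1.0))   # simulated vs. exact t(1, 3)
```

Unlike the envelope approach, nothing here requires bounding the ratio of the target to a proposal. The random walk simply wanders into the heavy tails and is pulled back by the acceptance step.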
Output
• Output is a random draw $\{x_j\}_{j=1}^{M}$.
• With simulation, we can compute the CDF:
$$t(t, 3) \simeq \frac{1}{M}\sum_{i=1}^{M} \delta_{\{x_i:\, x_i < t\}}(x_i)$$
• Sometimes, researchers report a smoothed version of the density (for example, with a kernel estimator).
• Similarly, we can compute the integral of any function of interest and its numerical error.
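This block sketches the three outputs in Python. The first is the empirical CDF. The second is a Gaussian kernel density estimate; Silverman's bandwidth rule is our assumption. The third is a batch-means numerical standard error, one common way to account for serial correlation in the draws; we swapped it in here, and the slides do not name it. A correlated AR(1) series with a known $\mathcal{N}(0, 1)$ stationary distribution stands in for real MCMC output.

```python
import numpy as np

def empirical_cdf(draws, t):
    """(1/M) * sum_i 1{x_i < t}: the share of draws below t."""
    return np.mean(np.asarray(draws) < t)

def kernel_density(draws, grid, bandwidth=None):
    """Gaussian kernel density estimate at the points in grid; the default
    bandwidth is Silverman's rule of thumb (an assumed choice)."""
    draws = np.asarray(draws, dtype=float)
    if bandwidth is None:
        bandwidth = 1.06 * draws.std() * len(draws) ** (-0.2)
    z = (np.asarray(grid, dtype=float)[:, None] - draws[None, :]) / bandwidth
    return np.exp(-0.5 * z * z).sum(axis=1) / (len(draws) * bandwidth * np.sqrt(2.0 * np.pi))

def batch_means_nse(values, n_batches=50):
    """Numerical standard error of mean(values) for serially correlated draws,
    using non-overlapping batch means."""
    values = np.asarray(values, dtype=float)
    size = len(values) // n_batches
    means = values[: size * n_batches].reshape(n_batches, size).mean(axis=1)
    return means.std(ddof=1) / np.sqrt(n_batches)

# Stand-in for MCMC output (an assumption for illustration): a correlated
# AR(1) chain whose stationary distribution is N(0, 1).
rng = np.random.default_rng(2)
rho, M = 0.5, 100_000
s = np.sqrt(1.0 - rho**2)
noise = rng.standard_normal(M)
x = np.empty(M)
x[0] = noise[0]
for j in range(1, M):
    x[j] = rho * x[j - 1] + s * noise[j]

print(empirical_cdf(x, 0.0))                # should be near 0.5
print(kernel_density(x, [0.0])[0])          # should be near 1/sqrt(2*pi)
print(batch_means_nse(x), 1 / np.sqrt(M))   # correlated NSE vs. naive i.i.d. value
```

Serial correlation makes the batch-means error larger than the naive $1/\sqrt{M}$. Reporting the naive figure would overstate the precision of the estimates.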
Rate of Convergence
At what speed does the chain converge? How long should the chain run? Three important things to do:
• Run a set of different chains with different initial values and compare the within-chain and between-chain variation.
• Check the serial correlation of the draws.
• Make the chain length an increasing function of the serial correlation of the draws.
• Run several different chains of a fixed length with random initial values and take the last value of each chain.
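The Gelman-Rubin potential scale reduction factor $\hat{R}$ formalizes the within- versus between-chain comparison. The sketch below uses that statistic, which is our choice of diagnostic, together with sample autocorrelations for the serial-correlation check. The $\mathcal{N}(0, 1)$ target, the starting values, and the helper names are all assumptions for illustration.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for an (m, n) array holding
    m chains (different initial values) of n draws each."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # average within-chain variance
    var_plus = (n - 1) / n * W + B / n        # pooled estimate of var(x)
    return np.sqrt(var_plus / W)

def autocorrelation(x, max_lag=20):
    """Sample autocorrelations at lags 1, ..., max_lag of a single chain."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def rw_chain(x0, n, scale, rng):
    """Random-walk M-H chain for a N(0, 1) target, started at x0."""
    x, out = x0, np.empty(n)
    for j in range(n):
        y = x + scale * rng.standard_normal()
        if rng.uniform() <= np.exp(min(0.0, 0.5 * (x * x - y * y))):
            x = y
        out[j] = x
    return out

rng = np.random.default_rng(3)
chains = np.array([rw_chain(x0, 5_000, 1.0, rng) for x0 in (-10.0, -3.0, 3.0, 10.0)])
print(gelman_rubin(chains[:, :20]))      # far above 1: chains have not mixed yet
print(gelman_rubin(chains[:, 1_000:]))   # near 1 after discarding a burn-in
print(autocorrelation(chains[0, 1_000:], max_lag=5))
```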
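More on Convergence
• Often, convergence takes longer than you think.
• Case of bimodal distributions: a chain can stay in one mode for a long time before visiting the other.
• Play it safe: just let the computer run a while longer.
• Use acceleration methods or Rao-Blackwellization, which reduces the variance of the estimator:
$$\operatorname{var}(\hat{h}) \leq \operatorname{var}(h)$$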
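Here is a sketch of Rao-Blackwellization in a setting where the variance reduction is easy to see: a two-block Gibbs sampler for a standard bivariate normal with correlation $\rho$. This is an assumed example. The goal is to estimate $E[x]$. Across repeated runs, the plain average of the $x_j$ is compared with the average of the conditional means $E[x \mid y_j] = \rho\,y_j$.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n, rng):
    """Two-block Gibbs sampler for a standard bivariate normal with correlation rho."""
    s = np.sqrt(1.0 - rho**2)
    x = y = 0.0
    xs, ys = np.empty(n), np.empty(n)
    for j in range(n):
        x = rho * y + s * rng.standard_normal()   # x | y ~ N(rho * y, 1 - rho^2)
        y = rho * x + s * rng.standard_normal()   # y | x ~ N(rho * x, 1 - rho^2)
        xs[j], ys[j] = x, y
    return xs, ys

rng = np.random.default_rng(4)
rho, n, reps = 0.5, 500, 300
naive, rao_blackwell = np.empty(reps), np.empty(reps)
for r in range(reps):
    xs, ys = gibbs_bivariate_normal(rho, n, rng)
    naive[r] = xs.mean()                  # plain average of h(x_j) = x_j
    rao_blackwell[r] = (rho * ys).mean()  # average of E[x | y_j] = rho * y_j
print(naive.var(), rao_blackwell.var())
```

In this example, both chains are AR(1) processes with coefficient $\rho^2$. The conditional-mean estimator's variance should therefore come out at roughly $\rho^2 = 1/4$ of the plain average's.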
One Chain versus Many Chains
• Should we use one long chain or many different chains?
• The answer is clear: only one long chain.
• Fortunately, the old approach of many short chains is disappearing.
• This does not mean that you should not do many runs while you are tuning your software!