Metropolis Method and Volume Estimation: Approximations Algorithms CS880, Study notes of Approximation Algorithms

These notes cover the Metropolis method for constructing Markov chains that sample from a target distribution, and its application to volume estimation. They review the properties of Markov chains, define the Metropolis filter, and describe volume estimation using a series of regions. The Metropolis method is shown to converge to the target distribution, and the volume of a convex shape is estimated using the ratios of adjacent regions.

CS880: Approximations Algorithms
Scribe: Dave Andrzejewski
Lecturer: Shuchi Chawla
Topic: Metropolis method, volume estimation
Date: 4/26/07


The previous lecture discussed some of the key concepts of Markov Chain Monte Carlo (MCMC) methods, including the stationary distribution π* and the mixing time τ_ε. This lecture introduces the Metropolis method for constructing Markov chains in order to sample from some distribution. The use of sampling methods for volume estimation is also introduced.

26.1 Metropolis method

26.1.1 MCMC review

Recall from last time the key properties of a random walk Markov chain.

  • Ω = the state space
  • n = |Ω|
  • P = the transition matrix, P_ij = Pr[i → j]
  • π* = the stationary distribution such that π*P = π*
  • τ_ε = the mixing time, after which the ℓ1-norm of the difference between the chain distribution and the stationary distribution is guaranteed to be < ε.
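The quantities in this list can be made concrete on a tiny chain. The following is a minimal sketch, assuming a hypothetical 3-state lazy walk on a path; the matrix `P`, the vector `pi_star`, and the helper names are illustrative and not from the lecture. It only measures convergence from a single starting state, whereas the mixing time proper is a worst case over starting states.

```python
def step(dist, P):
    """One step of the chain: returns the row vector dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def l1_distance(a, b):
    """l1-norm of the difference between two distributions."""
    return sum(abs(x - y) for x, y in zip(a, b))


def steps_until_close(P, start, target, eps, max_steps=10_000):
    """Smallest t such that start * P^t is within eps of target in l1-norm."""
    dist = list(start)
    for t in range(max_steps + 1):
        if l1_distance(dist, target) < eps:
            return t
        dist = step(dist, P)
    raise RuntimeError("did not get within eps of the target")


# Hypothetical example: lazy random walk on the path 0 - 1 - 2.
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
pi_star = [0.25, 0.5, 0.25]  # satisfies pi_star * P = pi_star

# Number of steps needed from state 0 to get within eps = 0.01.
steps_needed = steps_until_close(P, [1.0, 0.0, 0.0], pi_star, 0.01)
print(steps_needed)
```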

Also recall this important theorem concerning the existence and uniqueness of the stationary distribution π*.

Theorem 26.1.1 An aperiodic irreducible finite Markov chain is ergodic and has a unique stationary distribution.

We can easily guarantee the aperiodicity of our chain by simply adding self-loops to all vertices. This will increase the mixing time by no more than a factor of 2.
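A small example shows why self-loops matter. The sketch below assumes one common way of adding them, namely staying put with probability 1/2 (a "lazy" chain); the lecture does not fix the self-loop probability, so `make_lazy` and the two-state `flip` chain are illustrative assumptions.

```python
def step(dist, P):
    """One step of the chain: returns the row vector dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def make_lazy(P):
    """Assumed construction: stay put with probability 1/2, else follow P."""
    n = len(P)
    return [
        [0.5 * P[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# A periodic two-state chain: it deterministically flips between states.
flip = [[0.0, 1.0], [1.0, 0.0]]
lazy = make_lazy(flip)

# Without self-loops the distribution oscillates and never converges.
periodic_dist = [1.0, 0.0]
for _ in range(10):
    periodic_dist = step(periodic_dist, flip)

# With self-loops a single step already reaches the uniform distribution.
lazy_dist = step([1.0, 0.0], lazy)
print(periodic_dist, lazy_dist)
```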

26.1.2 Metropolis filter

But how do we actually construct a Markov chain with a stationary distribution equal to our target distribution? Also, we want this method to have a good (that is, small) mixing time. The Metropolis method allows us to achieve these goals by defining our Markov chain as a random walk over a suitably defined graph.

We define the approach as follows. Say we wish to sample values i ∈ Ω from a distribution Q(i). Then we define an undirected d-regular graph G on Ω, picking this graph in such a way that it has high conductance. Then from node v, pick the next node u uniformly from the d neighbors. Then:

  • If Q(u) ≥ Q(v), move to node u
  • Else move to node u with probability Q(u)/Q(v), stay with probability (1 − Q(u)/Q(v)).
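The two rules above can be sketched as a short simulation. This is only an illustration under stated assumptions: the graph is a hypothetical 5-cycle (so d = 2), the target weights are made up, and the function names are not from the lecture. The empirical visit frequencies should land close to Q.

```python
import random


def metropolis_step(v, neighbors, Q, rng):
    """Propose a uniform neighbor u of v, then apply the Metropolis filter."""
    u = rng.choice(neighbors[v])
    if Q[u] >= Q[v]:
        return u  # always move to a node that is at least as likely
    # Otherwise move with probability Q(u)/Q(v), else stay at v.
    return u if rng.random() < Q[u] / Q[v] else v


def sample_frequencies(neighbors, Q, num_steps, seed=0):
    """Run the walk from node 0 and return the fraction of time at each node."""
    rng = random.Random(seed)
    v = 0
    counts = [0] * len(Q)
    for _ in range(num_steps):
        v = metropolis_step(v, neighbors, Q, rng)
        counts[v] += 1
    return [c / num_steps for c in counts]


# Hypothetical example: a 5-cycle with target probabilities 1/15, ..., 5/15.
m = 5
neighbors = [[(i - 1) % m, (i + 1) % m] for i in range(m)]
weights = [1, 2, 3, 4, 5]
Q = [w / sum(weights) for w in weights]

freqs = sample_frequencies(neighbors, Q, num_steps=200_000)
for q, f in zip(Q, freqs):
    print(f"target {q:.3f}  empirical {f:.3f}")
```

Only ratios Q(u)/Q(v) are used, so the walk would behave identically if Q were known only up to a normalizing constant.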

First we examine the graph itself. Since it is connected and undirected, the walk is irreducible. Since all nodes have self-edges, it is aperiodic. Therefore this random walk is guaranteed to have a unique stationary distribution π*. Now we must show that this stationary distribution is equal to our target distribution Q.

Claim 26.1.2 π* = Q

Proof: Say that our initial π = Q, then take one step. Consider any node v, and calculate the probability of arriving at node v after this one step. If it is equal to Q(v), then we have shown that QP = Q, and therefore π* = Q.

We need to calculate the probability of starting at distribution Q, taking one step, and then ending up in state v. This can be decomposed into three cases: we move from a neighbor u into v where Q(u) ≥ Q(v), we move from a neighbor u into v where Q(u) < Q(v), or we are already in v and we choose a neighbor u such that Q(u) < Q(v) but we end up staying at v. Let n be the number of neighbors u such that Q(u) ≥ Q(v).

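\[
\begin{aligned}
Q'(v) &= \sum_{\substack{u \,:\, (u,v) \in G \\ Q(u) \ge Q(v)}} \frac{1}{d}\, Q(u)\, \frac{Q(v)}{Q(u)}
 \;+\; \sum_{\substack{u \,:\, (u,v) \in G \\ Q(u) < Q(v)}} \frac{1}{d}\, Q(u)
 \;+\; \sum_{\substack{u \,:\, (u,v) \in G \\ Q(u) < Q(v)}} \frac{1}{d}\, Q(v) \left(1 - \frac{Q(u)}{Q(v)}\right) && (26.1.1) \\
&= \frac{n}{d}\, Q(v)
 \;+\; \frac{1}{d} \sum_{\substack{u \,:\, (u,v) \in G \\ Q(u) < Q(v)}} Q(u)
 \;+\; \frac{d-n}{d}\, Q(v)
 \;-\; \frac{1}{d} \sum_{\substack{u \,:\, (u,v) \in G \\ Q(u) < Q(v)}} Q(u) && (26.1.2) \\
&= Q(v) && (26.1.3)
\end{aligned}
\]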
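The claim QP = Q can also be checked numerically on a small instance. The sketch below is an illustration under assumptions: it uses the complete graph on 4 nodes (a 3-regular graph) and made-up target probabilities, builds the Metropolis transition matrix with exact rational arithmetic, and takes one step from Q. The helper names are hypothetical.

```python
from fractions import Fraction


def metropolis_matrix(neighbors, Q):
    """Transition matrix of the Metropolis walk on a d-regular graph."""
    n = len(Q)
    P = [[Fraction(0)] * n for _ in range(n)]
    for v in range(n):
        d = len(neighbors[v])
        for u in neighbors[v]:
            # Accept with probability 1 if Q(u) >= Q(v), else Q(u)/Q(v).
            accept = min(Fraction(1), Q[u] / Q[v])
            P[v][u] += accept / d
            P[v][v] += (1 - accept) / d  # rejected proposals stay at v
    return P


def one_step(Q, P):
    """Distribution after one step when starting from Q: the row vector Q * P."""
    n = len(Q)
    return [sum(Q[u] * P[u][v] for u in range(n)) for v in range(n)]


# Hypothetical example: complete graph on 4 nodes, so d = 3.
neighbors = [[j for j in range(4) if j != i] for i in range(4)]
Q = [Fraction(w, 10) for w in (1, 2, 3, 4)]

P = metropolis_matrix(neighbors, Q)
Q_next = one_step(Q, P)
print(Q_next == Q)
```

Exact fractions are used so that the equality check is not blurred by floating-point rounding.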

This shows that a random walk using the Metropolis method is guaranteed to converge to our target distribution Q. It is worth noting that our scheme of uniformly choosing a neighbor is a special case of the general Metropolis-Hastings sampler [3]. In the more general case, a proposal distribution is used to select the next candidate state conditioned on the current state. This proposal distribution need not be uniform over neighbors, and in fact need not even be symmetric.
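For reference, the acceptance rule in the general Metropolis-Hastings setting can be written as follows. The notation is an assumption made here for illustration: T(v → u) denotes the probability that the proposal distribution suggests u when the current state is v.

```latex
\[
  \Pr[\text{accept the move } v \to u]
  \;=\; \min\!\left(1,\; \frac{Q(u)\, T(u \to v)}{Q(v)\, T(v \to u)}\right)
\]
```

When the proposal is uniform over the d neighbors in an undirected d-regular graph, T(u → v) = T(v → u) = 1/d, the ratio of proposal probabilities cancels, and the rule reduces to accepting with probability min(1, Q(u)/Q(v)), which is the filter described in Section 26.1.2.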

26.1.3 Volume estimation

An interesting application of sampling techniques is the problem of estimating the volume of a convex shape K ⊆ R^n using an inclusion oracle, which reveals whether a given point is contained in the shape or not. We are also given two balls, one completely enclosing K and one completely enclosed by K: B(0, r) ⊆ K ⊆ B(0, R). The technique that we use has interesting parallels to the concept of self-reducibility.

What is the probability that a uniformly chosen point in the larger ball will be in K? We can use the smaller ball to bound this probability as ≥ vol(B(0, r)) / vol(B(0, R)). However, for large n we will suffer the 'curse of dimensionality' [4], and this lower bound will be very small, in particular (r/R)^n.
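To get a feel for how quickly this bound collapses, the sketch below evaluates (r/R)^n for a few dimensions. The radii r = 1 and R = 2 are arbitrary illustrative choices, and `ball_ratio` is a hypothetical helper name.

```python
def ball_ratio(r, R, n):
    """vol(B(0, r)) / vol(B(0, R)) in n dimensions, which equals (r/R)^n."""
    return (r / R) ** n


# Illustrative radii: the enclosing ball has only twice the inner radius.
for n in (2, 10, 50, 100):
    p = ball_ratio(1.0, 2.0, n)
    # If the hit probability were exactly p, about 1/p uniform samples
    # from the larger ball would be needed to see a single hit.
    print(f"n = {n:3d}  ratio = {p:.3e}  expected samples per hit = {1 / p:.3e}")
```

Even with R only twice r, the number of samples needed grows exponentially in the dimension, which is why naive rejection sampling from the enclosing ball is not a usable estimator.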

Note that (1 + 1/n)^(n log(R/r)) K contains B(0, r), therefore e is indeed O(n log R/r).

To sample from K_i, we then simply sample uniformly from K and then rescale. But how do we sample from K itself? To approach this problem, we employ the MCMC methods we have been discussing.

We define our random walk, known as the Ball-walk, as follows. From any point u ∈ K, sample a point uniformly at random from the ball centered at u with radius δ, B(u, δ), and move to the new point if it is inside K. If the point is outside K, stay at u.
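The Ball-walk can be sketched in a few lines given a membership oracle. The code below is an illustration under assumptions: the oracle `in_cube` (the cube [-1, 1]^3), the step radius δ = 0.5, and all function names are made up for the example. A uniform point in a ball is drawn by normalizing a Gaussian vector and scaling the radius by U^(1/n).

```python
import math
import random


def sample_in_ball(center, delta, rng):
    """Uniform random point in the ball B(center, delta)."""
    n = len(center)
    direction = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in direction))
    radius = delta * rng.random() ** (1.0 / n)  # makes the point uniform in volume
    return [c + radius * x / norm for c, x in zip(center, direction)]


def ball_walk(in_K, start, delta, num_steps, seed=0):
    """Ball-walk using only the membership oracle in_K; returns all visited points."""
    rng = random.Random(seed)
    u = list(start)
    path = [u]
    for _ in range(num_steps):
        candidate = sample_in_ball(u, delta, rng)
        if in_K(candidate):
            u = candidate  # accept the move
        path.append(u)     # otherwise the walk stays at u
    return path


def in_cube(x):
    """Hypothetical membership oracle for the cube [-1, 1]^n."""
    return all(abs(c) <= 1.0 for c in x)


path = ball_walk(in_cube, start=[0.0, 0.0, 0.0], delta=0.5, num_steps=5000)
rejected = sum(1 for a, b in zip(path, path[1:]) if a is b)
print(f"rejected proposals: {rejected} of {len(path) - 1}")
```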

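Note that the graph defined by this rule allows us to reach any point from any other point, and also allows self-loops. Therefore the walk is irreducible and aperiodic, and must have a unique stationary distribution π*. The proposal is symmetric: the probability density of proposing v from u equals that of proposing u from v. The resulting Markov chain is therefore time-reversible with respect to the uniform distribution on K, so the stationary distribution is uniform.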

For the practicality of this scheme, it is important to choose a good value for δ in order to get good samples from K. Taken to the extreme, a huge δ value would result in constantly picking points outside K, and therefore remaining at the current point. Likewise, a very small δ would result in taking very small steps, making it very slow to explore all of K. Also, if something is known about the geometry of K, it may be helpful to rescale the proposal ball to an ellipse, for example. This is accomplished by putting the body in an "isotropic" position via an affine transformation, so as to remove all sharp corners.

Figure 26.1.2: Rescaling the proposal ball to an ellipse based on the geometry of K.

The first approach based on this technique was polynomial in n, but with an unfortunate order O(n^23) [1]. Newer approaches, dubbed 'hit and run', first choose a direction, and then sample uniformly from the line segment along that direction contained in K. This approach drastically improves mixing time, achieving Õ(n^4) [2].

Figure 26.1.3: The ’hit and run’ technique.
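A single hit-and-run step can be sketched with only the membership oracle. This is an illustrative sketch, not the algorithm of [2]: it assumes K is convex and contained in B(0, R), so every chord has length at most 2R, and it locates the chord endpoints approximately by bisection. The oracle `in_cube` and all function names are hypothetical.

```python
import math
import random


def random_direction(n, rng):
    """Uniform random unit vector in R^n (normalized Gaussian vector)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def chord_end(in_K, u, direction, t_far, iters=40):
    """Approximately the largest t in [0, t_far] with u + t*direction in K.

    Assumes K is convex and u is in K, so membership along the ray is
    monotone and bisection applies. The result never overshoots K.
    """
    lo, hi = 0.0, t_far
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        point = [a + mid * b for a, b in zip(u, direction)]
        if in_K(point):
            lo = mid
        else:
            hi = mid
    return lo


def hit_and_run_step(in_K, u, R, rng):
    """Pick a random direction, then a uniform point on the chord through u."""
    direction = random_direction(len(u), rng)
    backward = [-x for x in direction]
    t_max = chord_end(in_K, u, direction, 2.0 * R)
    t_min = -chord_end(in_K, u, backward, 2.0 * R)
    t = rng.uniform(t_min, t_max)
    return [a + t * b for a, b in zip(u, direction)]


def in_cube(x):
    """Hypothetical membership oracle for the cube [-1, 1]^n."""
    return all(abs(c) <= 1.0 for c in x)


rng = random.Random(0)
u = [0.0, 0.0, 0.0]
points = []
for _ in range(2000):
    # The cube [-1, 1]^3 is contained in the ball of radius sqrt(3).
    u = hit_and_run_step(in_cube, u, math.sqrt(3.0), rng)
    points.append(u)
print(points[-1])
```

Unlike the Ball-walk, no proposal is ever rejected here: every step moves to a new point of K, at the cost of extra oracle calls to find the chord.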

The inapproximability result is that estimating the volume within a constant factor requires Ω(n^2) time.

References

[1] Martin Dyer, Alan Frieze, Ravi Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. JACM, 1991.

[2] László Lovász, Santosh Vempala. Simulated Annealing in Convex Bodies and an O*(n^4) Volume Algorithm. FOCS, 2003.

[3] D. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.

[4] Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001.

[5] V. Vazirani. Approximation Algorithms. Springer, 2001.