







CS880: Approximation Algorithms
Scribe: Dave Andrzejewski
Lecturer: Shuchi Chawla
Topic: Metropolis method, volume estimation
Date: 4/26/
The previous lecture discussed some of the key concepts of Markov Chain Monte Carlo (MCMC) methods, including the stationary distribution π* and the mixing time τ_ε. This lecture introduces the Metropolis method for constructing Markov chains that sample from a given distribution. It also introduces the use of sampling methods for volume estimation.
Recall from last time the key properties of a random walk Markov chain.
Also recall this important theorem concerning the existence and uniqueness of the stationary distribution π*.
Theorem 26.1.1 An aperiodic, irreducible, finite Markov chain is ergodic and has a unique stationary distribution.
We can easily guarantee the aperiodicity of our chain by simply adding self-loops to all vertices. This will increase the mixing time by no more than a factor of 2.
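A small exact check of the self-loop trick. It assumes that "adding self-loops" means the lazy chain (I + P)/2, which stays put with probability 1/2 (one common choice, not necessarily the one intended here); the function names are illustrative. The simple random walk on a 4-cycle is irreducible but periodic, so its distribution oscillates forever, whereas the lazy walk converges to the uniform stationary distribution.

```python
from fractions import Fraction

def step(dist, P):
    """One step of the chain: the row vector dist times the matrix P."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def lazy(P):
    """Lazy version (I + P) / 2: stay put with probability 1/2, else follow P."""
    half = Fraction(1, 2)
    return [[half * p + (half if i == j else 0) for j, p in enumerate(row)]
            for i, row in enumerate(P)]

def run(P, steps, start=0):
    """Distribution after `steps` steps, starting deterministically at `start`."""
    dist = [Fraction(int(i == start)) for i in range(len(P))]
    for _ in range(steps):
        dist = step(dist, P)
    return dist

# Simple random walk on a 4-cycle: irreducible, but periodic because the cycle is bipartite.
P = [[Fraction(1, 2) if (i - j) % 4 in (1, 3) else Fraction(0) for j in range(4)]
     for i in range(4)]

print(run(P, 50))   # all mass on vertices {0, 2}
print(run(P, 51))   # all mass on vertices {1, 3}: the walk never settles
print([float(x) for x in run(lazy(P), 50)])   # close to the uniform distribution
```

The lazy chain's extra holding probability is what breaks the even/odd alternation; on this cycle it costs only a constant-factor slowdown.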
But how do we actually construct a Markov chain whose stationary distribution equals our target distribution? We also want the resulting chain to have a good (that is, small) mixing time. The Metropolis method allows us to achieve these goals by defining our Markov chain as a random walk over a suitably defined graph.
We define the approach as follows. Say we wish to sample values i ∈ Ω from a distribution Q(i). We define an undirected d-regular graph G on Ω, choosing it so that it has high conductance. Then, from the current node v, we pick a candidate node u uniformly from the d neighbors of v. If Q(u) ≥ Q(v), we move to u; otherwise, we move to u with probability Q(u)/Q(v) and stay at v with the remaining probability 1 − Q(u)/Q(v).
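A minimal simulation sketch of the walk just described. It assumes a 6-cycle with a self-loop at every vertex as the d-regular graph (so d = 3) and an unnormalized target proportional to 1, 2, ..., 6; `metropolis_walk` is a hypothetical helper name. Since the rule only uses ratios Q(u)/Q(v), Q need not be normalized. The long-run visit frequencies should approach the normalized target.

```python
import random

def metropolis_walk(neighbors, Q, start, steps, rng):
    """Run the Metropolis walk and return how often each state was occupied.

    neighbors[v] lists the d neighbors of v (a self-loop is listed as v itself);
    Q maps each state to a positive, possibly unnormalized, weight.
    """
    v = start
    counts = {x: 0 for x in neighbors}
    for _ in range(steps):
        u = rng.choice(neighbors[v])      # propose a uniformly random neighbor
        # Always accept an uphill move; accept a downhill move with probability Q(u)/Q(v).
        if Q[u] >= Q[v] or rng.random() < Q[u] / Q[v]:
            v = u
        counts[v] += 1                    # a rejected proposal counts as staying at v
    return counts

# Example: a 6-cycle with a self-loop at every vertex, so every node has d = 3 neighbors.
n = 6
neighbors = {v: [(v - 1) % n, v, (v + 1) % n] for v in range(n)}
Q = {v: v + 1 for v in range(n)}          # target proportional to 1, 2, ..., 6
counts = metropolis_walk(neighbors, Q, start=0, steps=200_000, rng=random.Random(0))
total = sum(Q.values())
for v in range(n):
    print(v, round(counts[v] / 200_000, 3), round(Q[v] / total, 3))
```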
First we examine the graph itself. Since G is connected and undirected, the walk is irreducible; since every node has a self-loop, it is aperiodic. Therefore this random walk is guaranteed to have a unique stationary distribution π*. It remains to show that this stationary distribution equals our target distribution Q.
Claim 26.1.2 π* = Q
Proof: Suppose the initial distribution is π = Q, and take one step. Consider any node v and compute the probability of arriving at v after this step. If it equals Q(v) for every v, then QP = Q, so Q is stationary and, by uniqueness, π* = Q.
We need to calculate the probability of starting at distribution Q, taking one step, and ending up in state v. This decomposes into three cases: we move into v from a neighbor u with Q(u) ≥ Q(v); we move into v from a neighbor u with Q(u) < Q(v); or we are already at v, propose a neighbor u with Q(u) < Q(v), and reject it, staying at v. Let n be the number of neighbors u of v with Q(u) ≥ Q(v); the self-loop at v is one of them, since Q(v) ≥ Q(v).
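$$
\begin{aligned}
Q'(v) &= \sum_{u \mid (u,v)\in G,\; Q(u)\ge Q(v)} \frac{1}{d}\,Q(u)\,\frac{Q(v)}{Q(u)}
 + \sum_{u \mid (u,v)\in G,\; Q(u)<Q(v)} \frac{1}{d}\,Q(u)
 + \sum_{u \mid (u,v)\in G,\; Q(u)<Q(v)} \frac{1}{d}\,Q(v)\left(1-\frac{Q(u)}{Q(v)}\right) \\
&= \frac{n}{d}\,Q(v) + \sum_{u \mid (u,v)\in G,\; Q(u)<Q(v)} \frac{1}{d}\,Q(u) + \frac{d-n}{d}\,Q(v) - \sum_{u \mid (u,v)\in G,\; Q(u)<Q(v)} \frac{1}{d}\,Q(u) && (26.1.2)\\
&= Q(v) && (26.1.3)
\end{aligned}
$$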
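The same calculation can be checked exactly with rational arithmetic. This sketch, with a hypothetical helper `metropolis_matrix` and arbitrary example weights on the 6-cycle with self-loops, builds the transition probabilities of the walk, including the rejection mass that stays at v, and confirms that QP = Q.

```python
from fractions import Fraction

def metropolis_matrix(neighbors, Q):
    """Exact transition probabilities P[v][u] of the Metropolis walk on a d-regular graph."""
    P = {v: {u: Fraction(0) for u in neighbors} for v in neighbors}
    for v, nbrs in neighbors.items():
        d = len(nbrs)
        for u in nbrs:
            accept = min(Fraction(1), Fraction(Q[u], Q[v]))
            P[v][u] += accept / d          # propose u (prob 1/d), then accept
            P[v][v] += (1 - accept) / d    # propose u, reject, stay at v
    return P

n = 6
neighbors = {v: [(v - 1) % n, v, (v + 1) % n] for v in range(n)}   # cycle plus self-loops
weights = {0: 3, 1: 1, 2: 4, 3: 1, 4: 5, 5: 9}                     # arbitrary example
Z = sum(weights.values())
Q = {v: Fraction(w, Z) for v, w in weights.items()}

P = metropolis_matrix(neighbors, Q)
QP = {v: sum(Q[u] * P[u][v] for u in neighbors) for v in neighbors}
print(QP == Q)   # True
```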
This shows that a random walk using the Metropolis method is guaranteed to converge to our target distribution Q. It is worth noting that our scheme of uniformly choosing a neighbor is a special case of the general Metropolis-Hastings sampler [3]. In the more general case, a proposal distribution is used to select the next candidate state conditioned on the current state. This proposal distribution need not be uniform over neighbors, and in fact need not even be symmetric.
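To illustrate the general case, here is a sketch of the Metropolis-Hastings transition matrix for an asymmetric proposal g(u | v), using the standard Hastings acceptance probability min(1, Q(u) g(v | u) / (Q(v) g(u | v))), which these notes do not spell out. The example proposal (step clockwise with probability 2/3 and counterclockwise with probability 1/3 on a 5-cycle) and the helper names are assumptions for illustration. With the correction, Q is stationary; dropping it and using the plain ratio Q(u)/Q(v) breaks stationarity for this proposal.

```python
from fractions import Fraction

def mh_matrix(states, g, Q, hastings=True):
    """Transition matrix for proposal g[v][u] = g(u | v), listing only positive entries."""
    P = {v: {u: Fraction(0) for u in states} for v in states}
    for v in states:
        for u, prop in g[v].items():
            ratio = Fraction(Q[u], Q[v])
            if hastings:
                ratio *= g[u].get(v, 0) / prop   # Hastings correction g(v|u) / g(u|v)
            accept = min(Fraction(1), ratio)
            P[v][u] += prop * accept
            P[v][v] += prop * (1 - accept)       # rejected proposals stay at v
    return P

states = list(range(5))
g = {v: {(v + 1) % 5: Fraction(2, 3), (v - 1) % 5: Fraction(1, 3)} for v in states}
Q = {v: v + 1 for v in states}                   # unnormalized target
Z = sum(Q.values())
pi = {v: Fraction(Q[v], Z) for v in states}

def stationary_check(P):
    """True exactly when pi P = pi."""
    return {u: sum(pi[v] * P[v][u] for v in states) for u in states} == pi

print(stationary_check(mh_matrix(states, g, Q)))                  # True
print(stationary_check(mh_matrix(states, g, Q, hastings=False)))  # False
```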
An interesting application of sampling techniques is the problem of estimating the volume of a convex shape K ⊆ ℝ^n using an inclusion oracle, which reveals whether a given point lies in the shape. We are also given two balls, one completely enclosing K and one completely enclosed by K: K ⊆ B(0, R) and K ⊇ B(0, r). The technique we use has interesting parallels to the concept of self-reducibility.
What is the probability that a uniformly chosen point in the larger ball lies in K? Since B(0, r) ⊆ K, we can bound this probability from below by vol(B(0, r))/vol(B(0, R)). However, for large n we suffer the 'curse of dimensionality' [4]: this lower bound equals (r/R)^n, which is exponentially small in n.
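A quick numerical illustration, assuming the hypothetical example K = [−1, 1]^n, for which B(0, 1) ⊆ K ⊆ B(0, √n), so r = 1 and R = √n. It estimates the hit probability by naive sampling from B(0, R) (uniform points come from the usual recipe: a Gaussian direction scaled to radius R·U^(1/n)) and compares it with the exact value and the (r/R)^n lower bound. Already at n = 20 the exact probability is a few in a million, so a few thousand samples typically see no hits at all.

```python
import math
import random

def uniform_in_ball(n, R, rng):
    """Uniform point in B(0, R) in R^n: Gaussian direction, radius R * U^(1/n)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in x))
    scale = R * rng.random() ** (1.0 / n) / norm
    return [t * scale for t in x]

def in_cube(p):
    """Inclusion oracle for the example body K = [-1, 1]^n."""
    return all(abs(t) <= 1.0 for t in p)

def naive_fraction(n, samples, rng):
    """Fraction of uniform points of B(0, sqrt(n)) that land in K."""
    R = math.sqrt(n)
    hits = sum(in_cube(uniform_in_ball(n, R, rng)) for _ in range(samples))
    return hits / samples

def ball_volume(n, radius):
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * radius ** n

rng = random.Random(1)
for n in (2, 5, 10, 20):
    exact = 2.0 ** n / ball_volume(n, math.sqrt(n))   # vol(K) / vol(B(0, R))
    lower = (1 / math.sqrt(n)) ** n                   # (r/R)^n with r = 1, R = sqrt(n)
    print(n, f"{naive_fraction(n, 10_000, rng):.2e}", f"{exact:.2e}", f"{lower:.2e}")
```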
Note that (1 + 1/n)^{n log(R/r)} K contains B(0, r); therefore e is indeed O(n log(R/r)).
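A sketch of the resulting multi-phase estimator. It assumes the regions are K_i = K ∩ B(0, (1 + 1/n)^i r), the standard choice for this method but an assumption here, so that K_0 = B(0, r) and the last region is K itself. Then vol(K) is vol(B(0, r)) times the product of the ratios vol(K_{i+1})/vol(K_i), each estimated from samples. As a stand-in sampler it uses rejection from a ball, which is only viable in low dimension; the hypothetical cube example has true volume 8.

```python
import math
import random

def uniform_in_ball(n, radius, rng):
    """Uniform point in B(0, radius) in R^n."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in x))
    scale = radius * rng.random() ** (1.0 / n) / norm
    return [t * scale for t in x]

def ball_volume(n, radius):
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * radius ** n

def estimate_volume(oracle, n, r, R, samples, rng):
    """Telescoping estimate vol(K) = vol(K_0) * prod_i vol(K_{i+1}) / vol(K_i).

    ASSUMES K_i = K ∩ B(0, r_i) with r_i = (1 + 1/n)^i * r, so K_0 = B(0, r)
    (because B(0, r) ⊆ K) and the last region equals K once r_i >= R.
    """
    radii = [r]
    while radii[-1] < R:
        radii.append(radii[-1] * (1 + 1 / n))
    estimate = ball_volume(n, r)
    for small, big in zip(radii, radii[1:]):
        hits = kept = 0
        while kept < samples:
            # Stand-in sampler for K_{i+1}: rejection from B(0, big). Only practical
            # in low dimension; the lecture uses a random walk instead.
            p = uniform_in_ball(n, big, rng)
            if not oracle(p):
                continue
            kept += 1
            hits += math.sqrt(sum(t * t for t in p)) <= small   # p also lies in K_i
        # Each ratio is at least about 1/e, so hits > 0 with overwhelming probability.
        estimate *= kept / hits        # estimates vol(K_{i+1}) / vol(K_i)
    return estimate, len(radii) - 1

def in_cube(p):
    """Hypothetical example body K = [-1, 1]^n."""
    return all(abs(t) <= 1.0 for t in p)

est, phases = estimate_volume(in_cube, 3, 1.0, math.sqrt(3), 20_000, random.Random(0))
print(phases, round(est, 2))   # 2 phases; the estimate should be close to 8
```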
To sample from K_i, we then simply sample uniformly from K and then rescale. But how do we sample from K itself? To approach this problem, we employ the MCMC methods we have been discussing.
We define our random walk, known as the Ball-walk, as follows. From any point u ∈ K, sample a point uniformly at random from the ball B(u, δ) centered at u with radius δ, and move to it if it lies inside K. If the point lies outside K, stay at u.
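A minimal sketch of the Ball-walk using only an inclusion oracle; the function names and the example body K = [−1, 1]^2 are hypothetical. Proposals are drawn uniformly from B(u, δ). If the stationary distribution is uniform on K, roughly a quarter of the visited points should fall in the positive quadrant.

```python
import math
import random

def uniform_in_ball(center, delta, rng):
    """Uniform point in B(center, delta): Gaussian direction, radius delta * U^(1/n)."""
    n = len(center)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in x))
    scale = delta * rng.random() ** (1.0 / n) / norm
    return [c + t * scale for c, t in zip(center, x)]

def ball_walk(oracle, start, delta, steps, rng):
    """Ball-walk in K using only the inclusion oracle; returns the visited points."""
    u = list(start)
    path = []
    for _ in range(steps):
        w = uniform_in_ball(u, delta, rng)
        if oracle(w):       # proposal lies in K: move there
            u = w
        path.append(u)      # proposal outside K: stay at u (a self-loop)
    return path

def in_square(p):
    """Hypothetical example body K = [-1, 1]^2."""
    return all(abs(t) <= 1.0 for t in p)

path = ball_walk(in_square, [0.0, 0.0], 0.5, 200_000, random.Random(0))
quadrant = sum(x > 0 and y > 0 for x, y in path) / len(path)
print(round(quadrant, 3))   # a uniform stationary distribution predicts about 0.25
```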
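Note that the graph defined by this rule allows us to reach any point from any other point, and also has self-loops. Therefore the chain is irreducible and aperiodic, and has a unique stationary distribution π*. Moreover, the chain is symmetric: w lies in B(u, δ) exactly when u lies in B(w, δ), so the density of moving from u to w equals that of moving from w to u. Hence the chain is time-reversible with respect to the uniform distribution on K, and therefore the stationary distribution is uniform.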
For this scheme to be practical, it is important to choose a good value of δ so that we get good samples from K. Taken to the extreme, a huge δ would almost always propose points outside K, so the walk would mostly remain at its current point. Conversely, a very small δ would result in very small steps, making it very slow to explore all of K. Also, if something is known about the geometry of K, it may help to reshape the proposal ball into an ellipsoid, for example. Equivalently, one can put the body in an “isotropic” position via an affine transformation, so that it is well-rounded rather than long and thin.
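To see the trade-off in δ, this sketch runs the Ball-walk in the unit ball of ℝ^10 (a hypothetical example body) for several values of δ and reports the acceptance rate and the average distance moved per step. Large δ drives acceptance toward zero; tiny δ is accepted almost always but barely moves. The specific δ values and step counts are arbitrary choices.

```python
import math
import random

def uniform_in_ball(center, delta, rng):
    """Uniform point in B(center, delta)."""
    n = len(center)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in x))
    scale = delta * rng.random() ** (1.0 / n) / norm
    return [c + t * scale for c, t in zip(center, x)]

def in_unit_ball(p):
    return sum(t * t for t in p) <= 1.0

def walk_stats(n, delta, steps, rng):
    """Acceptance rate and mean distance moved per step of the Ball-walk in B(0, 1) ⊆ R^n."""
    u = [0.0] * n
    accepted = 0
    moved = 0.0
    for _ in range(steps):
        w = uniform_in_ball(u, delta, rng)
        if in_unit_ball(w):
            moved += math.dist(u, w)
            accepted += 1
            u = w
    return accepted / steps, moved / steps

rng = random.Random(0)
stats = {delta: walk_stats(10, delta, 20_000, rng) for delta in (0.05, 0.3, 1.0, 3.0)}
for delta, (rate, move) in stats.items():
    print(f"delta={delta}: acceptance={rate:.3f}, mean move per step={move:.4f}")
```

The useful progress per step peaks at an intermediate δ, which is the balance the paragraph above describes.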
Figure 26.1.2: Rescaling the proposal ball to an ellipse based on the geometry of K.
The first algorithm based on this technique ran in time polynomial in n, but with an unfortunately high degree: O(n^23) [1]. Later algorithms use a walk dubbed 'hit and run', which first chooses a random direction and then samples uniformly from the segment of the line in that direction that lies inside K. Combined with other improvements, this leads to a volume algorithm running in Õ(n^4) time [2].
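A sketch of the hit-and-run walk given only an inclusion oracle and the enclosing radius R. The notes do not say how the chord is found; here both chord endpoints are located by bisection (an implementation assumption that relies on convexity), and the next point is drawn uniformly from the chord. The names and the square example are hypothetical.

```python
import math
import random

def chord_end(oracle, x, d, hi, tol):
    """Largest t in [0, hi] with x + t*d in K, found by bisection.

    Valid because K is convex and x is in K, so {t >= 0 : x + t*d in K} is an
    interval [0, t*]. Assumes x + hi*d lies outside K.
    """
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if oracle([xi + mid * di for xi, di in zip(x, d)]):
            lo = mid
        else:
            hi = mid
    return lo

def hit_and_run(oracle, start, R, steps, rng, tol=1e-7):
    """Hit-and-run in a convex body K ⊆ B(0, R), starting from a point of K."""
    n = len(start)
    x = list(start)
    path = []
    for _ in range(steps):
        d = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(t * t for t in d))
        d = [t / norm for t in d]                  # uniformly random direction
        # |x + 3R*d| >= 3R - R > R, so t = 3R is past the boundary of K.
        forward = chord_end(oracle, x, d, 3 * R, tol)
        backward = chord_end(oracle, x, [-t for t in d], 3 * R, tol)
        t = rng.uniform(-backward, forward)        # uniform point on the chord
        x = [xi + t * di for xi, di in zip(x, d)]
        path.append(x)
    return path

def in_square(p):
    """Hypothetical example body K = [-1, 1]^2, enclosed by B(0, sqrt(2))."""
    return all(abs(t) <= 1.0 for t in p)

path = hit_and_run(in_square, [0.0, 0.0], math.sqrt(2), 20_000, random.Random(0))
print(sum(x > 0 and y > 0 for x, y in path) / len(path))   # should be near 0.25
```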
Figure 26.1.3: The ’hit and run’ technique.
On the lower-bound side, any algorithm that estimates the volume within a constant factor needs Ω(n^2) time.
[1] Martin Dyer, Alan Frieze, Ravi Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. JACM 1991.
[2] Laszlo Lovasz, Santosh Vempala. Simulated Annealing in Convex Bodies and an O(n^4) Volume Algorithm. FOCS 2003.
[3] D. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
[4] Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001.
[5] V. Vazirani. Approximation Algorithms. Springer, 2001.