Metropolis Method and Volume Estimation: Approximations Algorithms CS880, Study notes of Approximation Algorithms

University of Wisconsin (UW) - Madison Approximation Algorithms

These notes describe the Metropolis method for constructing Markov chains to sample from a distribution, and its application to volume estimation. They cover the properties of Markov chains, the Metropolis filter, and volume estimation using a series of regions. The Metropolis method is shown to converge to the target distribution, and the volume of a convex shape is estimated using the ratios of adjacent regions.

Typology: Study notes

2011/2012

Uploaded on 02/15/2012


CS880: Approximations Algorithms

Scribe: Dave Andrzejewski Lecturer: Shuchi Chawla

Topic: Metropolis method, volume estimation Date: 4/26/07

The previous lecture discussed some of the key concepts of Markov Chain Monte Carlo (MCMC) methods, including the stationary distribution π* and the mixing time τ_ε. This lecture introduces the Metropolis method for constructing Markov chains in order to sample from some distribution. The use of sampling methods for volume estimation is also introduced.

26.1 Metropolis method

26.1.1 MCMC review

Recall from last time the key properties of a random walk Markov chain.

• Ω = the state space
• n = |Ω|
• P = the transition matrix, P_ij = Pr[i → j]
• π* = the stationary distribution, such that π*P = π*
• τ_ε = the mixing time, after which the ℓ1-norm of the difference between the chain distribution and the stationary distribution is guaranteed to be < ε.

Also recall this important theorem concerning the existence and uniqueness of the stationary distribution π*.

Theorem 26.1.1 An aperiodic irreducible finite Markov chain is ergodic and has a unique stationary distribution.

We can easily guarantee the aperiodicity of our chain by simply adding self-loops to all vertices. This will increase the mixing time by no more than a factor of 2.
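As a small illustration of the self-loop trick, the sketch below compares a periodic random walk on a 4-cycle with its "lazy" version, in which every vertex gets a self-loop taken with probability 1/2. The helper names (`step`, `make_lazy`, `evolve`) and the choice of the 4-cycle are assumptions made for this example, not part of the lecture.

```python
def step(dist, P):
    """One step of the chain: returns the row vector dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def make_lazy(P):
    """Add a self-loop of probability 1/2 at every state (hypothetical helper)."""
    n = len(P)
    return [[0.5 * P[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def evolve(dist, P, steps):
    """Apply `steps` transitions to the starting distribution."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist

# Simple random walk on a 4-cycle: bipartite, hence periodic with period 2.
P = [[0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0]]
start = [1.0, 0.0, 0.0, 0.0]

periodic = evolve(start, P, 51)        # mass keeps alternating between parities
lazy = evolve(start, make_lazy(P), 51)  # settles on the uniform distribution
print(periodic)
print(lazy)
```

The periodic walk never converges because after an odd number of steps all mass sits on the odd vertices, while the lazy walk approaches the uniform stationary distribution.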

26.1.2 Metropolis filter

But how do we actually construct a Markov chain with a stationary distribution equal to our target distribution? We also want this chain to have a good (that is, small) mixing time. The Metropolis method allows us to achieve these goals by defining our Markov chain as a random walk over a suitably defined graph.

We define the approach as follows. Say we wish to sample values i ∈ Ω from a distribution Q(i). Then we define an undirected d-regular graph G on Ω, picking this graph in such a way that it has high conductance. Then from node v, pick the next node u uniformly from the d neighbors. Then:


If Q(u) ≥ Q(v), move to node u.
Else move to node u with probability Q(u)/Q(v), and stay at v with probability 1 − Q(u)/Q(v).
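The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the state space is a 5-cycle (a 2-regular graph), the target weights `[1, 2, 4, 2, 1]` are made up, and the names `metropolis_step` and `visit_frequencies` are hypothetical. Since the rule only uses the ratio Q(u)/Q(v), the weights do not need to be normalized.

```python
import random

def metropolis_step(v, neighbors, Q, rng):
    """One Metropolis move from v: propose a uniform neighbor, then filter."""
    u = rng.choice(neighbors(v))
    if Q[u] >= Q[v]:
        return u                      # always accept moves to higher Q
    # Otherwise accept with probability Q(u)/Q(v), else stay at v.
    return u if rng.random() < Q[u] / Q[v] else v

def visit_frequencies(Q, steps, seed=0):
    """Run the walk on a cycle over len(Q) states and return visit frequencies."""
    rng = random.Random(seed)
    n = len(Q)
    neighbors = lambda v: [(v - 1) % n, (v + 1) % n]   # 2-regular cycle graph
    counts = [0] * n
    v = 0
    for _ in range(steps):
        v = metropolis_step(v, neighbors, Q, rng)
        counts[v] += 1
    return [c / steps for c in counts]

weights = [1, 2, 4, 2, 1]             # unnormalized target (assumed for the demo)
freqs = visit_frequencies(weights, steps=200_000)
target = [w / sum(weights) for w in weights]
print([round(f, 3) for f in freqs])
print(target)
```

Over a long run the visit frequencies should approach the normalized weights, which is what the claim below establishes.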

First we examine the graph itself. Since it is connected and undirected, it is irreducible. Since all nodes have self-edges, it is aperiodic. Therefore this random walk is guaranteed to have a unique stationary distribution π*. Now we must show that this stationary distribution is equal to our target distribution Q.

Claim 26.1.2 π* = Q

Proof: Say that our initial π = Q, then take one step. Consider any node v, and calculate the probability of arriving at node v after this one step. If it is equal to Q(v), then we have shown that QP = Q, and therefore π* = Q.

We need to calculate the probability of starting at distribution Q, taking one step, and then ending up in state v. This can be decomposed into three cases: we move from a neighbor u into v where Q(u) ≥ Q(v), we move from a neighbor u into v where Q(u) < Q(v), or we are already in v and we choose a neighbor u such that Q(u) < Q(v) but we end up staying at v. Let n be the number of neighbors u such that Q(u) ≥ Q(v).

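$$
\begin{aligned}
Q'(v) &= \sum_{u \,:\, (u,v)\in G,\; Q(u)\ge Q(v)} \frac{1}{d}\,Q(u)\,\frac{Q(v)}{Q(u)}
 \;+\; \sum_{u \,:\, (u,v)\in G,\; Q(u)<Q(v)} \frac{1}{d}\,Q(u)
 \;+\; \sum_{u \,:\, (u,v)\in G,\; Q(u)<Q(v)} \frac{1}{d}\,Q(v)\left(1-\frac{Q(u)}{Q(v)}\right) \\
&= \frac{n}{d}\,Q(v)
 \;+\; \frac{1}{d}\sum_{u \,:\, (u,v)\in G,\; Q(u)<Q(v)} Q(u)
 \;+\; \frac{d-n}{d}\,Q(v)
 \;-\; \frac{1}{d}\sum_{u \,:\, (u,v)\in G,\; Q(u)<Q(v)} Q(u) && (26.1.2)\\
&= Q(v) && (26.1.3)
\end{aligned}
$$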
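The identity QP = Q can also be checked numerically on a small instance. The sketch below builds the Metropolis transition matrix with exact rational arithmetic and applies one step to Q. The graph (a 5-cycle), the weights, and the helper names `metropolis_matrix` and `one_step` are assumptions chosen for illustration.

```python
from fractions import Fraction

def metropolis_matrix(adj, Q):
    """Transition matrix of the Metropolis walk on a regular graph (exact)."""
    n = len(adj)
    P = [[Fraction(0)] * n for _ in range(n)]
    for v in range(n):
        d = len(adj[v])
        for u in adj[v]:
            accept = min(Fraction(1), Q[u] / Q[v])   # Metropolis filter
            P[v][u] += Fraction(1, d) * accept       # proposal accepted
            P[v][v] += Fraction(1, d) * (1 - accept)  # proposal rejected: stay
    return P

def one_step(dist, P):
    """Distribution after one step: the row vector dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# 5-cycle (2-regular) with an assumed target distribution.
adj = [[(v - 1) % 5, (v + 1) % 5] for v in range(5)]
weights = [1, 2, 4, 2, 1]
Q = [Fraction(w, sum(weights)) for w in weights]

P = metropolis_matrix(adj, Q)
print(one_step(Q, P) == Q)   # Q is left unchanged by one step of the chain
```

Using `Fraction` avoids floating-point tolerance questions, so the equality test is exact for this instance.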

This shows that a random walk using the Metropolis method is guaranteed to converge to our target distribution Q. It is worth noting that our scheme of uniformly choosing a neighbor is a special case of the general Metropolis-Hastings sampler [3]. In the more general case, a proposal distribution is used to select the next candidate state conditioned on the current state. This proposal distribution need not be uniform over neighbors, and in fact need not even be symmetric.

26.1.3 Volume estimation

An interesting application of sampling techniques is the problem of estimating the volume of a convex shape K ⊆ R^n using an inclusion oracle, which reveals whether a given point is contained in the shape or not. We are also given two balls, one completely enclosing K and one completely enclosed by K. Call these K ⊆ B(0, R) and K ⊇ B(0, r). The technique we use has interesting parallels to the concept of self-reducibility.

What is the probability that a uniformly chosen point in the larger ball will be in K? We can use the smaller ball to bound this probability as ≥ vol(B(0, r)) / vol(B(0, R)). However, for large n we will suffer the 'curse of dimensionality' [4], and this lower bound will be very small, in particular (r/R)^n.
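To get a feel for how quickly this bound shrinks, the following sketch evaluates (r/R)^n for a few dimensions, together with the corresponding number of uniform samples from the larger ball that the bound allows before one lands in the smaller ball. The function name and the sample values r = 1, R = 2 are assumptions for the example.

```python
def hit_probability_lower_bound(r, R, n):
    """vol(B(0, r)) / vol(B(0, R)) in R^n, which equals (r/R)^n."""
    return (r / R) ** n

r, R = 1.0, 2.0   # assumed radii for illustration
for n in (2, 10, 50, 100):
    p = hit_probability_lower_bound(r, R, n)
    # 1/p is the expected number of draws if the hit probability were exactly p.
    print(f"n={n:4d}  bound={p:.3e}  samples needed at this rate ~ {1 / p:.3e}")
```

Even with R only twice r, the bound at n = 100 is around 10^-30, so naive rejection sampling from the enclosing ball is hopeless in high dimension.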

Note that (1 + 1/n)^(n log R/r) K contains B(0, r), therefore e is indeed O(n log R/r).

To sample from K_i, we then simply sample uniformly from K and then re-scale. But how to sample from K itself? To approach this problem, we employ the MCMC methods we have been discussing.

We define our random walk, known as the Ball-walk, as follows. From any point u ∈ K, sample a point randomly from the ball centered at u with radius δ, B(u, δ), and move to the new point if it is inside K. If the point is outside K, stay at u.
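A minimal sketch of the Ball-walk, assuming K is given only through a membership oracle `in_K`. The way a uniform point in B(u, δ) is drawn here (a Gaussian direction scaled by δ·U^(1/n)) is a standard technique but is not stated in the notes, and the names `random_point_in_ball` and `ball_walk`, as well as the square used as K, are assumptions for the demo.

```python
import math
import random

def random_point_in_ball(center, delta, rng):
    """Uniform sample from the ball B(center, delta) in R^n."""
    n = len(center)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]          # random direction
    norm = math.sqrt(sum(x * x for x in g))
    radius = delta * rng.random() ** (1.0 / n)           # uniform in volume
    return [c + radius * x / norm for c, x in zip(center, g)]

def ball_walk(in_K, start, delta, steps, rng):
    """Ball-walk: propose a point in B(u, delta); move only if it lies in K."""
    u = list(start)
    path = []
    for _ in range(steps):
        candidate = random_point_in_ball(u, delta, rng)
        if in_K(candidate):
            u = candidate        # accept the move
        path.append(u)           # otherwise stay at u
    return path

# Assumed convex body for illustration: the square [-1, 1]^2.
in_square = lambda p: all(-1.0 <= x <= 1.0 for x in p)
path = ball_walk(in_square, [0.0, 0.0], delta=1.0, steps=20_000,
                 rng=random.Random(0))
mean_x = sum(p[0] for p in path) / len(path)
print(round(mean_x, 3))   # should be near 0 for a roughly uniform sample
```

The walk never leaves K by construction, and rejected proposals show up as repeated points in `path`.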

Note that the graph defined by this rule allows us to reach any point from any other point, and also allows self-loops. Therefore it is irreducible and aperiodic, and must have a unique stationary distribution π*. The resulting Markov chain is time-reversible. Therefore, the stationary distribution is uniform.

For the practicality of this scheme, it is important to choose a good value for δ in order to get good samples from K. Taken to the extreme, a huge δ value would result in constantly picking points outside K, and therefore remaining at the current point. Likewise, a very small δ would result in taking very small steps, making it very slow to explore all of K. Also, if something is known about the geometry of K, it may be helpful to rescale the proposal ball to an ellipse, for example. This is accomplished by putting the body in an "isotropic" position via an affine transformation, so as to remove all sharp corners.

Figure 26.1.2: Rescaling the proposal ball to an ellipse based on the geometry of K.

The first approach based on this technique was polynomial in n, but with an unfortunate order O(n^23) [1]. Newer approaches, dubbed 'hit and run', first choose a direction, and then sample uniformly from the line segment along that direction contained in K. This approach drastically improves mixing time, achieving Õ(n^4) [2].
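A rough sketch of a single 'hit and run' step follows. The notes do not say how the chord through the current point is found; here it is located by bisection against the membership oracle, using the enclosing radius R to bound the search, which is an assumption of this example. The names `chord_end` and `hit_and_run_step` and the unit disk used as K are likewise hypothetical.

```python
import math
import random

def chord_end(in_K, x, direction, R, tol=1e-6):
    """Approximate the largest t >= 0 with x + t*direction in K, by bisection.

    Relies on K being convex and contained in B(0, R), so any point at
    distance 2R from x along a unit direction is not in the interior of K.
    """
    lo, hi = 0.0, 2.0 * R
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if in_K([xi + mid * di for xi, di in zip(x, direction)]):
            lo = mid
        else:
            hi = mid
    return lo

def hit_and_run_step(in_K, x, R, rng):
    """Pick a uniform direction, then a uniform point on the chord through x."""
    g = [rng.gauss(0.0, 1.0) for _ in x]
    norm = math.sqrt(sum(v * v for v in g))
    d = [v / norm for v in g]
    t_plus = chord_end(in_K, x, d, R)
    t_minus = chord_end(in_K, x, [-v for v in d], R)
    t = rng.uniform(-t_minus, t_plus)
    return [xi + t * di for xi, di in zip(x, d)]

# Assumed convex body for illustration: the unit disk, so R = 1.
in_disk = lambda p: p[0] ** 2 + p[1] ** 2 <= 1.0
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(1000):
    x = hit_and_run_step(in_disk, x, 1.0, rng)
print(x)
```

Because `chord_end` returns the lower end of the bisection interval, every sampled point is one the oracle has confirmed to be inside K, so the walk never steps outside the body.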

Figure 26.1.3: The ’hit and run’ technique.

The inapproximability result is that one cannot estimate volume within a constant factor in Ω(n^2) time.

References

[1] Martin Dyer, Alan Frieze, Ravi Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. JACM, 1991.

[2] László Lovász, Santosh Vempala. Simulated Annealing in Convex Bodies and an O(n^4) Volume Algorithm. FOCS, 2003.

[3] D. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.

[4] Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001.

[5] V. Vazirani. Approximation Algorithms. Springer, 2001.
