

These notes address the problem of computing expectations of functions with respect to a probabilistic model with unknowns, in the context of Bayesian inference. They cover several methods for approximating the resulting integrals: summation, uniform sampling, importance sampling, and rejection sampling. The focus is on the case where the variable of interest, denoted θ, is univariate and bounded, with some discussion of the challenges of extending these methods to higher dimensions and non-discrete variables.


Machine Learning (CS 5350/CS 6350) 03 Apr 2007
The general problem we face in Bayesian inference is to compute the expectation of some function with respect to a probabilistic model with unknowns.
In the simplest case, we want the expectation of a single variable. In more complex cases, we want an expectation of a complex function of all variables.
Suppose p(θ) is our distribution of interest (some θ will be known, some not). We want:
$$Z = \mathbb{E}_{\theta \sim p}\left[ f(\theta) \right] = \int d\theta \, p(\theta) \, f(\theta)$$
For some cases this integral will be available in closed form (e.g., HMMs). For many (most?) cases, however, it will not.
Let’s say that θ is discrete and univariate. Then we can compute the expectation by just summing over all possible values. Obviously, though, this won’t scale well to high-dimensional or non-discrete variables. But let’s see what happens if we try...
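For the discrete, univariate case, the expectation is a plain weighted sum. The sketch below illustrates this; the four-valued distribution and the function f(θ) = θ² are made-up examples, not taken from these notes.

```python
# Hypothetical discrete distribution over theta in {0, 1, 2, 3} (illustrative only).
p = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}


def f(theta):
    """Illustrative function whose expectation we want."""
    return theta ** 2


def expectation(p, f):
    """Exact expectation: sum of p(theta) * f(theta) over every possible value."""
    return sum(prob * f(theta) for theta, prob in p.items())


# 0.1*0 + 0.2*1 + 0.3*4 + 0.4*9 = 5.0 (up to floating-point rounding)
Z = expectation(p, f)
print(Z)
```

The cost is one term per possible value of θ, which is why this approach breaks down once θ has many dimensions or is continuous.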
Integration by Summation
Suppose θ is univariate, bounded, and continuous. Wlog, θ ∈ [0, 1]. If we remember how we first learned integration, we can break [0, 1] into R equally-sized rectangles, each of width 1/R. Then, we have:
$$Z \approx \frac{1}{R} \sum_{i=1}^{R} p(i/R) \, f(i/R)$$
As R → ∞, this approximation of Z becomes increasingly accurate.
One way of thinking about this is that we have a set S containing R-many equally spaced points, and the integral is approximated by:
$$Z \approx \frac{1}{|S|} \sum_{\theta \in S} p(\theta) \, f(\theta)$$
Unfortunately, if θ is D-dimensional, then we need to sum over R^D values of θ.
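The summation approach on [0, 1] can be sketched in a few lines. As an assumption for illustration only, p is taken to be the Beta(2, 5) density, p(θ) = 30 θ (1 − θ)^4, and f(θ) = θ, so the exact answer is the Beta mean 2/7; neither choice comes from these notes.

```python
def p(theta):
    """Assumed example density: Beta(2, 5) on [0, 1], i.e. 30 * theta * (1 - theta)^4."""
    return 30.0 * theta * (1.0 - theta) ** 4


def f(theta):
    """Assumed example function; E[f] under Beta(2, 5) is 2/7."""
    return theta


def riemann_estimate(p, f, R):
    """Approximate Z = integral of p(theta) f(theta) over [0, 1] with R rectangles of width 1/R."""
    return sum(p(i / R) * f(i / R) for i in range(1, R + 1)) / R


exact = 2.0 / 7.0
for R in (10, 100, 1000):
    estimate = riemann_estimate(p, f, R)
    print(R, estimate, abs(estimate - exact))
```

In one dimension this converges quickly, but the same grid in D dimensions needs R^D evaluations: with R = 100 and D = 10 that is already 10^20 terms.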
Uniform Sampling
Instead of spacing the points θ ∈ S evenly, let’s place them randomly. This is the idea of “Monte Carlo” integration, which essentially means “randomized” integration. Uniform sampling is the simplest case. Let S be a set of θ values drawn uniformly at random from [0, 1]. Then, we still have:
$$Z \approx \frac{1}{|S|} \sum_{\theta \in S} p(\theta) \, f(\theta)$$
This scales better computationally, but the number of samples required to guarantee a close approximation is still huge.
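A minimal sketch of uniform Monte Carlo integration on [0, 1] follows. As an illustrative assumption, p is the Beta(2, 5) density and f(θ) = θ, so the exact value is 2/7; the fixed seed is only there to make the run repeatable.

```python
import random


def p(theta):
    """Assumed example density: Beta(2, 5) on [0, 1]."""
    return 30.0 * theta * (1.0 - theta) ** 4


def f(theta):
    """Assumed example function; E[f] under Beta(2, 5) is 2/7."""
    return theta


def uniform_mc_estimate(p, f, n_samples, rng):
    """Average p(theta) * f(theta) over points drawn uniformly from [0, 1]."""
    total = 0.0
    for _ in range(n_samples):
        theta = rng.random()  # uniform draw on [0, 1)
        total += p(theta) * f(theta)
    return total / n_samples


rng = random.Random(0)
estimate = uniform_mc_estimate(p, f, 100_000, rng)
print(estimate, "exact:", 2.0 / 7.0)
```

Many of the uniform draws land where p(θ) f(θ) is close to zero and contribute little, which is one way to see why so many samples are needed.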
It’s worth thinking about how hard this problem is. Think of a boat on a lake. We want to estimate the volume of the lake, but cannot see the bottom. We can drive the boat to any position in the lake and drop an anchor, thereby measuring the depth there. How can we approximate the volume? Uniform sampling says to drive randomly around the lake, dropping anchor at the flip of a coin. But there are many cases in which we can do better.
Importance Sampling
Here we use prior knowledge in the form of a helper distribution q that we expect to be “similar” to p and from which we can sample. It must cover the “support” of p, i.e., q(θ) > 0 wherever p(θ) > 0. Then, we compute:
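$$
\begin{aligned}
Z &= \mathbb{E}_{\theta \sim p}\left[ f(\theta) \right] \\
  &= \int d\theta \, p(\theta) \, f(\theta) \\
  &= \int d\theta \, q(\theta) \, \frac{p(\theta)}{q(\theta)} \, f(\theta) \\
  &= \mathbb{E}_{\theta \sim q}\left[ \frac{p(\theta)}{q(\theta)} \, f(\theta) \right]
\end{aligned}
$$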
So instead of computing an expectation with respect to p, we compute it with respect to q, and weight each sample θ by the ratio p(θ)/q(θ).
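The reweighting can be sketched as follows. The choices here are assumptions for illustration: p is the Beta(2, 5) density, the helper q is Beta(2, 4) (similar in shape, and easy to sample with the standard library's `random.betavariate`), and f(θ) = θ, so the exact value is 2/7.

```python
import random


def p(theta):
    """Assumed target density: Beta(2, 5) on [0, 1]."""
    return 30.0 * theta * (1.0 - theta) ** 4


def q(theta):
    """Assumed helper density: Beta(2, 4) on [0, 1], positive wherever p is."""
    return 20.0 * theta * (1.0 - theta) ** 3


def f(theta):
    """Assumed example function; E[f] under Beta(2, 5) is 2/7."""
    return theta


def importance_estimate(p, q, f, n_samples, rng):
    """Draw theta from q and average the weighted values (p/q) * f."""
    total = 0.0
    for _ in range(n_samples):
        theta = rng.betavariate(2.0, 4.0)  # sample from q
        weight = p(theta) / q(theta)       # importance weight
        total += weight * f(theta)
    return total / n_samples


rng = random.Random(0)
estimate = importance_estimate(p, q, f, 100_000, rng)
print(estimate, "exact:", 2.0 / 7.0)
```

With this particular q the weight simplifies to 1.5 (1 − θ), which is bounded; a q that is much smaller than p in some region would instead produce occasional very large weights and a noisy estimate.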
Rejection Sampling
The idea in rejection sampling is similar to importance sampling. Let q be a proposal distribution that satisfies p(x) ≤ M q(x) for all x, for some constant M < ∞. Now, draw points from q and accept each one with probability p(x)/[M q(x)]. The accepted points are distributed according to p, so expectations are computed by averaging over the accepted points only.
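A minimal sketch of the accept/reject loop is given below. The setup is assumed for illustration: p is the Beta(2, 5) density, the proposal q is uniform on [0, 1] (so q(x) = 1), and M = 2.5, which bounds p because its maximum, at x = 0.2, is about 2.46. The function f(x) = x has exact expectation 2/7.

```python
import random


def p(x):
    """Assumed target density: Beta(2, 5) on [0, 1]; its maximum is about 2.46."""
    return 30.0 * x * (1.0 - x) ** 4


def f(x):
    """Assumed example function; E[f] under Beta(2, 5) is 2/7."""
    return x


M = 2.5  # assumed bound: p(x) <= M * q(x) with q(x) = 1 on [0, 1]


def rejection_sample(p, M, n_proposals, rng):
    """Propose from Uniform(0, 1) and keep each point with probability p(x) / (M * q(x))."""
    accepted = []
    for _ in range(n_proposals):
        x = rng.random()           # proposal drawn from q
        u = rng.random()           # uniform draw used for the accept test
        if u < p(x) / (M * 1.0):   # q(x) = 1 for the uniform proposal
            accepted.append(x)
    return accepted


n_proposals = 100_000
rng = random.Random(0)
samples = rejection_sample(p, M, n_proposals, rng)
estimate = sum(f(x) for x in samples) / len(samples)
acceptance_rate = len(samples) / n_proposals
print(estimate, "exact:", 2.0 / 7.0, "acceptance rate:", acceptance_rate)
```

When p and q are both normalized, the expected acceptance rate is 1/M (0.4 here), so a loose bound M wastes most of the proposals.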