Bayesian Inference and Machine Learning: Computing Expectations with Unknowns - Prof. Haro, Study notes of Computer Science

These notes address the problem of computing expectations of functions with respect to a probabilistic model with unknowns, in the context of Bayesian inference. They cover several methods for approximating the resulting integrals: summation, uniform sampling, importance sampling, and rejection sampling. The focus is on the case where the variable of interest, denoted θ, is univariate and bounded, but the notes also touch on the challenges of extending these methods to higher dimensions and non-discrete variables.


Uploaded on 08/31/2009



Machine Learning (CS 5350/CS 6350) 03 Apr 2007

Bayesian inference

The general problem we face in Bayesian inference is to compute the expectation of some function with respect to a probabilistic model with unknowns.

In the simplest case, we want the expectation of a single variable. In more complex cases, we want an expectation of a complex function of all variables.

Suppose p(θ) is our distribution of interest (some θ will be known, some not). We want:

$$Z = \mathbb{E}_{\theta \sim p}\left[ f(\theta) \right] = \int d\theta \, p(\theta) f(\theta)$$

For some cases this integral will be available in closed form (e.g., HMMs). For many (most?) cases, however, it will not.

Let’s say that θ is discrete and univariate. Then we can compute the expectation by just summing over all possible values. Obviously, though, this won’t scale well to high-dimensional or non-discrete variables. But let’s see what happens if we try...
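As a minimal sketch of the discrete case, the expectation is a finite sum of p(θ) f(θ) over the possible values of θ. The fair six-sided die and the choice f(θ) = θ² are illustrative assumptions, not taken from the notes, and `expectation` is a hypothetical helper name.

```python
from fractions import Fraction


def expectation(p, f):
    """Sum p(theta) * f(theta) over every value theta in the support of p."""
    return sum(prob * f(theta) for theta, prob in p.items())


# Assumed example: theta is the outcome of a fair six-sided die.
p = {theta: Fraction(1, 6) for theta in range(1, 7)}

# Assumed function of interest: f(theta) = theta^2.
Z = expectation(p, lambda theta: theta ** 2)
print(Z)  # 91/6
```

The sum has one term per value of θ, which is why this direct approach stops being practical once θ ranges over many dimensions or a continuum.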

Integration by Summation

Suppose θ is univariate, bounded continuous. Wlog, θ ∈ [0, 1]. If we remember how we first learned integration, we can break [0, 1] into R equally-sized rectangles. Then, we have:

$$Z \approx \sum_{i=0}^{R-1} \frac{1}{R} \, p(i/R) \, f(i/R)$$

As R → ∞, this approximation of Z becomes increasingly accurate.

One way of thinking about this is that we have a set S containing R-many equally spaced points, and the integral is approximated by:

$$Z \approx \frac{1}{|S|} \sum_{\theta \in S} p(\theta) f(\theta)$$

Unfortunately, if θ is D-dimensional, then we need to sum R^D values of θ.
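The rectangle sum can be sketched in a few lines. The density p (a Beta(2, 5) density on [0, 1]) and the function f(θ) = θ are assumptions chosen for illustration because the exact answer, 2/7, is known; the helper names `integrate_by_summation` and `grid_points` are hypothetical.

```python
def p(theta):
    # Assumed example density: Beta(2, 5) on [0, 1], i.e. 30 * theta * (1 - theta)^4.
    return 30.0 * theta * (1.0 - theta) ** 4


def f(theta):
    # Assumed function of interest; with this choice Z is the mean of p, which is 2/7.
    return theta


def integrate_by_summation(p, f, R):
    """Rectangle sum: sum_{i=0}^{R-1} (1/R) * p(i/R) * f(i/R)."""
    return sum(p(i / R) * f(i / R) for i in range(R)) / R


def grid_points(R, D):
    """Number of grid points when each of D dimensions is split into R cells."""
    return R ** D


for R in (10, 100, 1000):
    print(R, integrate_by_summation(p, f, R))

# The grid grows exponentially with the dimension D.
for D in (1, 2, 5, 10):
    print(D, grid_points(100, D))
```

In one dimension the estimate settles near 2/7 ≈ 0.2857 quickly, while the second loop shows how fast the number of terms grows once D increases.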

Uniform Sampling

Instead of spacing θ ∈ S evenly, let’s space them randomly. This is the idea of “Monte Carlo” integration, which essentially means “randomized” integration. Uniform sampling is the simplest case. Let S be a set of θ values drawn uniformly at random from [0, 1]. Then, we still have:

$$Z \approx \frac{1}{|S|} \sum_{\theta \in S} p(\theta) f(\theta)$$


This scales better computationally, but still the number of samples required to guarantee that we get a close approximation is huge.
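A minimal sketch of uniform sampling on [0, 1] follows. As an illustrative assumption, p is a Beta(2, 5) density and f(θ) = θ, so the exact value of Z is 2/7; the function name `uniform_sampling_estimate` and the seed are hypothetical choices.

```python
import random


def p(theta):
    # Assumed example density: Beta(2, 5) on [0, 1].
    return 30.0 * theta * (1.0 - theta) ** 4


def f(theta):
    # Assumed function of interest; the exact answer is then 2/7.
    return theta


def uniform_sampling_estimate(p, f, n_samples, rng):
    """Average p(theta) * f(theta) over theta drawn uniformly from [0, 1)."""
    total = 0.0
    for _ in range(n_samples):
        theta = rng.random()
        total += p(theta) * f(theta)
    return total / n_samples


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (100, 10_000, 100_000):
    print(n, uniform_sampling_estimate(p, f, n, rng))
```

The estimates fluctuate around 2/7 ≈ 0.2857, and the fluctuation shrinks only like one over the square root of the number of samples, which is one way to see why many samples are needed for a close approximation.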

It’s worth thinking about how hard this problem is. Think of a boat on a lake. We want to estimate the volume of the lake, but cannot see the bottom. We can drive the boat to any position in the lake and drop an anchor, thereby measuring the depth there. How can we approximate the volume? Uniform sampling says to drive randomly around the lake, dropping anchor at the flip of a coin. But there are many cases in which we can do better.

Importance Sampling

Here we use prior knowledge in the form of a helper distribution q that we expect to be “similar” to p and from which we can sample. It must cover the “support” of p (i.e., q(θ) must be nonzero wherever p(θ) is nonzero). Then, we compute:

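$$
\begin{aligned}
Z &= \mathbb{E}_{\theta \sim p}\left[ f(\theta) \right] \\
  &= \int d\theta \, p(\theta) f(\theta) \\
  &= \int d\theta \, q(\theta) \frac{p(\theta)}{q(\theta)} f(\theta) \\
  &= \mathbb{E}_{\theta \sim q}\left[ \frac{p(\theta)}{q(\theta)} f(\theta) \right]
\end{aligned}
$$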

So instead of computing an expectation with respect to p, we compute one with respect to q, and we weight each sample θ by the ratio p(θ)/q(θ).
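The weighted estimate can be sketched as follows. Everything concrete here is an assumption for illustration: p is a Beta(2, 5) density, f(θ) = θ (so Z = 2/7), and the helper distribution is q(θ) = 2(1 − θ) on [0, 1), which is sampled by inverting its CDF. The names `sample_q` and `importance_sampling_estimate` are hypothetical.

```python
import math
import random


def p(theta):
    # Assumed target density: Beta(2, 5) on [0, 1].
    return 30.0 * theta * (1.0 - theta) ** 4


def f(theta):
    # Assumed function of interest; the exact answer is then 2/7.
    return theta


def q(theta):
    # Assumed helper density: q(theta) = 2 * (1 - theta), positive on [0, 1).
    return 2.0 * (1.0 - theta)


def sample_q(rng):
    # Inverse-CDF sampling: Q(theta) = 1 - (1 - theta)^2, so theta = 1 - sqrt(1 - u).
    u = rng.random()  # u in [0, 1), hence theta in [0, 1) and q(theta) > 0
    return 1.0 - math.sqrt(1.0 - u)


def importance_sampling_estimate(p, q, f, n_samples, rng):
    """Average the weighted values (p(theta) / q(theta)) * f(theta) for theta ~ q."""
    total = 0.0
    for _ in range(n_samples):
        theta = sample_q(rng)
        weight = p(theta) / q(theta)
        total += weight * f(theta)
    return total / n_samples


rng = random.Random(0)  # fixed seed so the run is repeatable
print(importance_sampling_estimate(p, q, f, 100_000, rng))
```

Because this q puts more mass where p is large than a uniform distribution does, the weighted terms vary less than the terms in the uniform-sampling sum, so the estimate of 2/7 ≈ 0.2857 is typically tighter for the same number of samples.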

Rejection Sampling

The idea in rejection sampling is similar to importance sampling. Let q be a proposal distribution that satisfies p(x) ≤ M q(x) for all x, for some constant M < ∞. Now, draw points from q and accept each one with probability p(x)/[M q(x)]. Compute expectations only over the accepted points.
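The accept/reject loop can be sketched like this. The concrete choices are assumptions for illustration: p is a Beta(2, 5) density on [0, 1], the proposal q is uniform on [0, 1], f(x) = x (so the exact expectation is 2/7), and M = 2.5, which bounds p because p peaks at x = 0.2 with p(0.2) = 2.4576. The name `rejection_sample` is hypothetical.

```python
import random


def p(x):
    # Assumed target density: Beta(2, 5) on [0, 1].
    return 30.0 * x * (1.0 - x) ** 4


def q(x):
    # Assumed proposal density: uniform on [0, 1].
    return 1.0


def f(x):
    # Assumed function of interest; the exact answer is then 2/7.
    return x


M = 2.5  # assumed bound: max of p is 2.4576 at x = 0.2, so p(x) <= M * q(x)


def rejection_sample(n_proposals, rng):
    """Draw n_proposals points from q and keep each with probability p(x) / (M * q(x))."""
    accepted = []
    for _ in range(n_proposals):
        x = rng.random()  # proposal drawn from q
        if rng.random() < p(x) / (M * q(x)):
            accepted.append(x)
    return accepted


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = rejection_sample(100_000, rng)
estimate = sum(f(x) for x in samples) / len(samples)
print(len(samples) / 100_000, estimate)
```

The accepted points are distributed according to p, so a plain unweighted average over them estimates the expectation. Since both densities are normalized, the expected acceptance rate is 1/M (0.4 here), so a loose bound M wastes proposals.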