Lecture Notes on Statistical Pattern Recognition | CMSC 828 | Study notes Computer Science

Belief Net

Intuitively, one problem with the simple version of relaxation labeling that I’ve sketched

is seen if we have two lines, A and B, that support each other. At one iteration, A

supports B. At the next, B supports A. When this happens, A is getting support for its

being figure that really originates in A itself. It would be better if we could tell the

difference between B seeming to be figure because of A’s support and because of support from

another source. Belief nets formalize probabilistic support. Belief propagation is an

algorithm that performs inference according to this probabilistic support.

******** This part omitted in ‘06

First, a belief net is a graphical representation of a set of random variables that makes

explicit some properties about causality and conditional independence. As an example,

suppose we have a house alarm. It can be set off by a burglar or an earthquake. If it goes

off, a neighbor, Sally or John, might call you. The arrows show causality. Conditional

independence is also indicated; given that the alarm goes off, Sally and John calling are

independent events. They’re not independent if we don’t know whether the alarm went

off. The parent nodes have a different structure; the earthquake and burglary are

independent if we don’t know whether the alarm went off, but are not independent if we

do know. That’s because if the alarm goes off, knowing there was an earthquake affects

my assessment of whether there was a burglary. To make this precise, we have to know

the conditional probability of every variable, based on its causes.
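To make the independence claims above concrete, here is a minimal sketch that writes the joint distribution of the alarm net as a product of one conditional probability per variable and answers queries by brute-force enumeration. The numbers in the tables are made up purely for illustration, and the names `joint` and `prob` are my own; this is not belief propagation, just a direct check of the definitions.

```python
from itertools import product

# Hypothetical probabilities, chosen only for illustration (not from the notes).
P_B = 0.01                       # P(burglary)
P_E = 0.02                       # P(earthquake)
P_A = {                          # P(alarm | burglary, earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_S = {True: 0.7, False: 0.01}   # P(Sally calls | alarm)
P_J = {True: 0.9, False: 0.05}   # P(John calls | alarm)

NAMES = ("b", "e", "a", "s", "j")


def bern(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p


def joint(b, e, a, s, j):
    """Joint probability: each variable conditioned only on its causes."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_S[a], s) * bern(P_J[a], j))


def prob(query, evidence=None):
    """P(query | evidence), summing the joint over all 32 states."""
    evidence = evidence or {}
    num = den = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        state = dict(zip(NAMES, values))
        if any(state[k] != v for k, v in evidence.items()):
            continue  # state contradicts the evidence
        p = joint(**state)
        den += p
        if all(state[k] == v for k, v in query.items()):
            num += p
    return num / den


if __name__ == "__main__":
    # Burglary and earthquake are independent until the alarm is observed.
    print(prob({"b": True}), prob({"b": True}, {"e": True}))
    # Given the alarm, learning of an earthquake lowers belief in a burglary.
    print(prob({"b": True}, {"a": True}),
          prob({"b": True}, {"a": True, "e": True}))
```

With these assumed tables, the first pair of numbers agrees and the second pair does not, which is the "explaining away" effect described above. Enumeration costs time exponential in the number of variables, which is why a smarter algorithm is wanted for larger nets.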

We’re going to talk about one simple algorithm for performing inference in these nets,

that of belief propagation. Inference means that we are given some knowledge about the

value of some variables, and want to determine the value of others. For example, in

vision, we might know that some positions and orientations in the image have a contour

going through them, and want to know whether others do. Though this algorithm is

simple, it’s very elegant and useful.

We now look at how one would really perform correct probabilistic inference in an

image. We first look at the 1D case, which is substantially easier than the 2D case.

Suppose we have an image that we think is piecewise constant, corrupted by noise. Then

the noise-free intensity at each pixel depends on the observed intensity at that pixel, and

also on the rest of the image. We make a Markov assumption that the true intensity of a

pixel only directly depends on its observed intensity, and the true intensity of its

neighbors, and is conditionally independent of everything else. This is what we would

get if we generate an image by deciding, independently at each pixel, whether to copy the

previous pixel or jump to a new value, and then corrupting this with independent noise.
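The generative process just described can be sketched in a few lines. This is only an illustration under assumptions of my own: new values are drawn uniformly from [0, 1], the noise is Gaussian, and the function name and the parameters `p_jump` and `noise_sigma` are hypothetical rather than anything fixed by the notes.

```python
import random


def sample_piecewise_constant(n, p_jump=0.1, noise_sigma=0.05, seed=None):
    """Sample a 1D piecewise-constant signal and a noisy observation of it.

    Assumed model (for illustration only): each pixel independently either
    copies the previous true intensity or, with probability p_jump, jumps to
    a new value drawn uniformly from [0, 1]. Independent Gaussian noise with
    standard deviation noise_sigma is then added at every pixel.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)

    # True intensities: a Markov chain, each value depends only on the last.
    true_vals = [rng.random()]
    for _ in range(n - 1):
        if rng.random() < p_jump:
            true_vals.append(rng.random())    # jump to a new value
        else:
            true_vals.append(true_vals[-1])   # copy the previous pixel

    # Observed intensities: true value plus independent noise per pixel.
    observed = [x + rng.gauss(0.0, noise_sigma) for x in true_vals]
    return true_vals, observed


if __name__ == "__main__":
    true_vals, observed = sample_piecewise_constant(20, seed=0)
    for x, y in zip(true_vals, observed):
        print(f"{x:.3f}  {y:.3f}")
```

Because each true intensity is generated from the previous one alone, and each observation from its own true intensity alone, a signal sampled this way satisfies the Markov assumption by construction.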

We describe this with a chain of variables that represent the true intensities

X1 -> X2 -> … -> Xn
