Probabilistic Programming and Bayesian Networks, Quizzes of Computer Programming

The document introduces probabilistic programming and its application in Bayesian networks. It discusses examples of probabilistic programs such as object tracking, document classification, and social network analysis. The document also explains the concept of generative models and their advantages over discriminative models. It highlights the limitations of Markov models and introduces hidden Markov models. The document concludes with an example of latent Dirichlet allocation for topic modeling.

Typology: Quizzes

2022/2023

Available from 03/29/2023

ClemBSC

Bayesian networks: probabilistic programming

  • In this module, I will talk about probabilistic programming, a new way to think about defining Bayesian networks through the lens of writing programs, which really highlights the generative process aspect of Bayesian networks.
  • Recall that a Bayesian network is given by (i) a set of random variables, (ii) directed edges between those variables capturing qualitative dependencies, (iii) local conditional distributions of each variable given its parents, which capture these dependencies quantitatively, and (iv) a joint distribution which is produced by multiplying all the local conditional distributions together. The joint distribution is your probabilistic database, on which you can answer all sorts of questions using probabilistic inference.
  • There is another way of writing down Bayesian networks other than graphically or mathematically, and that is as a probabilistic program.
  • Let’s go through the alarm example. We can sample B and E independently from a Bernoulli distribution with a fixed parameter, which produces 1 (true) with probability equal to that parameter. Then we just set A = B ∨ E.
  • In general, a probabilistic program is a randomized program that invokes a random number generator. Executing this program assigns values to a collection of random variables X_1, ..., X_n; that is, it generates an assignment.
  • We then define the probability of an assignment under the joint distribution to be exactly the probability that the program generates that assignment.
  • While you can run the probabilistic program to generate samples, it’s important to think about it as a mathematical construct that is used to define a joint distribution.
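
To make the alarm example concrete, here is a minimal Python sketch of the program. The name `EPSILON` and its value 0.05 are assumptions for illustration only, since no number is given for the Bernoulli parameter. The sketch shows both views: running the program to draw samples, and reading it as a definition of the joint distribution.

```python
import random
from collections import Counter

EPSILON = 0.05  # assumed Bernoulli parameter; illustrative value only


def alarm_program(rng):
    """One run of the program: returns an assignment (B, E, A)."""
    b = rng.random() < EPSILON  # B ~ Bernoulli(EPSILON)
    e = rng.random() < EPSILON  # E ~ Bernoulli(EPSILON)
    a = b or e                  # A = B OR E (deterministic)
    return (b, e, a)


def exact_joint(b, e, a):
    """Probability that the program generates exactly (b, e, a)."""
    if a != (b or e):
        return 0.0  # the program can never produce this assignment
    p_b = EPSILON if b else 1 - EPSILON
    p_e = EPSILON if e else 1 - EPSILON
    return p_b * p_e


def estimate_joint(num_samples, seed=0):
    """Approximate the joint distribution by running the program many times."""
    rng = random.Random(seed)
    counts = Counter(alarm_program(rng) for _ in range(num_samples))
    return {assignment: c / num_samples for assignment, c in counts.items()}
```

With enough samples, the frequencies from `estimate_joint` should approach the values of `exact_joint`, which is the sense in which the program defines the distribution.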

Probabilistic program: example

Probabilistic program: object tracking

X_0 = (0, 0)

For each time step i = 1, ..., n:
    if Bernoulli(α):
        X_i = X_{i-1} + (1, 0)   [go right]
    else:
        X_i = X_{i-1} + (0, 1)   [go down]

[Figure: Markov chain X_1 → X_2 → X_3 → X_4 → X_5]
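
The object tracking program translates almost line by line into Python. This is only a sketch: the function name `sample_trajectory` is my own, and `alpha` is left as an argument because no value is fixed for it.

```python
import random


def sample_trajectory(n, alpha, rng):
    """Run the object tracking program once; returns [X_0, X_1, ..., X_n]."""
    x = (0, 0)                        # X_0 = (0, 0)
    trajectory = [x]
    for _ in range(n):                # for each time step i = 1, ..., n
        if rng.random() < alpha:      # Bernoulli(alpha)
            x = (x[0] + 1, x[1])      # go right
        else:
            x = (x[0], x[1] + 1)      # go down
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    print(sample_trajectory(5, 0.5, random.Random(0)))
```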

CS221 4

Probabilistic inference: example

Question: what are possible trajectories given evidence X_10 = (8, 2)?
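
One simple way to answer this kind of question is rejection sampling: run the program many times and keep only the runs that agree with the evidence. This is not necessarily what the demo does internally; it is a sketch under the assumptions α = 0.5 and n = 10, with hypothetical function names.

```python
import random
from collections import Counter


def sample_trajectory(n, alpha, rng):
    """Run the object tracking program once; returns (X_0, ..., X_n)."""
    x = (0, 0)
    trajectory = [x]
    for _ in range(n):
        x = (x[0] + 1, x[1]) if rng.random() < alpha else (x[0], x[1] + 1)
        trajectory.append(x)
    return tuple(trajectory)


def condition_on_evidence(step, position, n=10, alpha=0.5,
                          num_samples=20000, seed=0):
    """Estimate the distribution over trajectories given X_step = position."""
    rng = random.Random(seed)
    accepted = Counter()
    for _ in range(num_samples):
        trajectory = sample_trajectory(n, alpha, rng)
        if trajectory[step] == position:  # keep only runs matching the evidence
            accepted[trajectory] += 1
    total = sum(accepted.values())
    if total == 0:
        return {}  # evidence never observed in the samples
    return {t: c / total for t, c in accepted.items()}
```

Calling `condition_on_evidence(10, (8, 2))` returns an empirical distribution over trajectories that all pass through (8, 2) at time step 10. Rejection sampling wastes most samples when the evidence is unlikely, so it is a teaching device rather than a practical inference algorithm.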

  • Having used the program to define a joint distribution, we can now answer questions about that distribution.
  • For example, suppose that we observe evidence X_10 = (8, 2). What is the distribution over the other variables?
  • In the demo, we condition on the evidence and observe the distribution over all trajectories, which are constrained to go through (8, 2) at time step 10.
  • Now I’m going to quickly go through a set of examples of Bayesian networks or probabilistic programs and talk about the applications they are used for.
  • A natural language sentence can be viewed as a sequence of words, and a language model assigns a probability to each sentence, which measures the "goodness" of that sentence.
  • Markov models and higher-order Markov models (called n-gram models in NLP) were the dominant paradigm for language modeling before deep learning, and for a while they outperformed neural language models because they were computationally much easier to scale up.
  • While they could be used to generate text unconditionally, they were often used in the context of a speech recognition or machine translation system to score the fluency of the output.
  • A Markov model generates each word given the previous word according to some local conditional distribution p(X_i | X_{i-1}), which we’re not specifying right now.
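
As a sketch of a Markov model used as a language model, the Python below both generates a sentence and scores one. The transition table `TRANSITIONS` and the boundary markers `<s>` and `</s>` are made up for illustration; a real n-gram model would estimate these probabilities from a corpus.

```python
import random

# Hypothetical toy table for p(X_i | X_{i-1}).
# "<s>" marks the start of a sentence and "</s>" marks the end.
TRANSITIONS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sleeps": 0.7, "runs": 0.3},
    "dog": {"sleeps": 0.2, "runs": 0.8},
    "sleeps": {"</s>": 1.0},
    "runs": {"</s>": 1.0},
}


def sample_sentence(rng):
    """Generate words one at a time, each given only the previous word."""
    words, prev = [], "<s>"
    while True:
        choices = TRANSITIONS[prev]
        nxt = rng.choices(list(choices), weights=list(choices.values()))[0]
        if nxt == "</s>":
            return words
        words.append(nxt)
        prev = nxt


def sentence_probability(words):
    """Score a sentence: the product of the local conditional probabilities."""
    prob, prev = 1.0, "<s>"
    for word in list(words) + ["</s>"]:
        prob *= TRANSITIONS.get(prev, {}).get(word, 0.0)
        prev = word
    return prob
```

The scoring function is what a speech recognition or translation system would call to compare the fluency of candidate outputs.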

Application: object tracking

Probabilistic program: hidden Markov model (HMM)

For each time step t = 1, ..., T:
    Generate object location H_t ∼ p(H_t | H_{t-1})
    Generate sensor reading E_t ∼ p(E_t | H_t)

[Figure: chain H_1 → H_2 → H_3 → H_4 → H_5, with each H_t generating a sensor reading E_t. Example values: locations (3,1), (3,2); sensor readings 4, 5]

Inference: given sensor readings, where is the object?
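
Here is a small sketch of an HMM and of one standard way to answer the inference question, the forward (filtering) algorithm. Everything concrete is an assumption made for illustration: locations are cells 0 to 4 on a line rather than 2D points, the object starts uniformly at random, and the transition and sensor noise levels (0.5 and 0.6) are arbitrary.

```python
import random

POSITIONS = range(5)                               # assumed 1D grid of locations
INITIAL = {h: 1 / len(POSITIONS) for h in POSITIONS}  # assumed uniform start


def neighbors_dist(center, p_center):
    """Put p_center on `center`; split the rest over adjacent in-range cells."""
    adjacent = [p for p in (center - 1, center + 1) if p in POSITIONS]
    dist = {center: p_center}
    for p in adjacent:
        dist[p] = (1 - p_center) / len(adjacent)
    return dist


def transition(h_prev):
    return neighbors_dist(h_prev, 0.5)  # p(H_t | H_{t-1}), assumed


def emission(h):
    return neighbors_dist(h, 0.6)       # p(E_t | H_t), assumed


def draw(dist, rng):
    return rng.choices(list(dist), weights=list(dist.values()))[0]


def sample_hmm(T, rng):
    """Run the generative program: returns (locations, sensor readings)."""
    hs, es = [], []
    for t in range(T):
        h = draw(INITIAL if t == 0 else transition(hs[-1]), rng)
        hs.append(h)
        es.append(draw(emission(h), rng))
    return hs, es


def filter_positions(readings):
    """Forward algorithm: returns p(H_t | E_1, ..., E_t) for each t.

    Assumes the readings are possible under the model (nonzero probability).
    """
    belief, results = dict(INITIAL), []
    for t, e in enumerate(readings):
        if t > 0:  # predict: push the belief through the transition model
            predicted = {h: 0.0 for h in POSITIONS}
            for h_prev, p in belief.items():
                for h, q in transition(h_prev).items():
                    predicted[h] += p * q
            belief = predicted
        # update: weight by how well each location explains the reading
        weighted = {h: p * emission(h).get(e, 0.0) for h, p in belief.items()}
        total = sum(weighted.values())
        belief = {h: w / total for h, w in weighted.items()}
        results.append(belief)
    return results
```

For example, `filter_positions([2, 2, 3])` returns three distributions over locations, one per time step, each conditioned on the readings seen so far.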

Application: multiple object tracking

Probabilistic program: factorial HMM

For each time step t = 1, ..., T:
    For each object o ∈ {a, b}:
        Generate location H_t^o ∼ p(H_t^o | H_{t-1}^o)
    Generate sensor reading E_t ∼ p(E_t | H_t^a, H_t^b)

[Figure: two chains H_1^a → H_2^a → H_3^a → H_4^a and H_1^b → H_2^b → H_3^b → H_4^b, with each sensor reading E_t depending on both H_t^a and H_t^b]
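
The factorial HMM program can be sketched in Python as follows. The motion model (move by -1, 0, or +1 on a line), the starting positions, and the choice of sensor reading as the unordered set of the two locations are all assumptions made for illustration.

```python
import random


def step(position, rng):
    """Hypothetical motion model: move -1, 0, or +1 with equal probability."""
    return position + rng.choice([-1, 0, 1])


def sample_factorial_hmm(T, rng, start=None):
    """Returns (history of {object: location}, list of sensor readings)."""
    locations = dict(start or {"a": 0, "b": 5})  # assumed starting positions
    history, readings = [], []
    for _ in range(T):                            # for each time step t
        # each object moves independently of the other
        locations = {o: step(locations[o], rng) for o in ("a", "b")}
        history.append(dict(locations))
        # one reading per time step: the set of occupied locations,
        # which hides which object is at which location
        readings.append(frozenset(locations.values()))
    return history, readings
```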

  • An extension of an HMM, called a factorial HMM, can be used to track multiple objects.
  • We assume that each object moves independently according to a Markov model, but that we get one sensor reading which is some noisy aggregated function of the true positions.
  • For example, E_t could be the set {H_t^a, H_t^b}, which reveals where the objects are, but doesn’t say which object is responsible for which element in the set.
  • Naive Bayes is a very simple model which is often used for classification. For document classification, we generate a label and all the words in the document given that label.
  • Note that the words are all generated independently, which is not a very realistic model of language, but naive Bayes models are surprisingly effective for tasks such as document classification.
  • These types of models are traditionally called generative models as opposed to discriminative models for classification. Rather than thinking about how you take the input and produce the output label (e.g., using a neural network), you go the other way around: think about how the input is generated from the output (which is usually the purer, more structured form of the input).
  • One advantage of using naive Bayes for classification is that "training" is extremely easy and fast and just requires counting (as opposed to performing gradient descent).
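
To illustrate why naive Bayes training is just counting, here is a minimal sketch for document classification. The function names and the add-one (Laplace) smoothing are my own choices for illustration, not something the module specifies.

```python
import math
from collections import Counter, defaultdict


def train(documents):
    """documents: list of (label, list_of_words). Training is just counting."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for label, words in documents:
        label_counts[label] += 1          # counts for p(label)
        word_counts[label].update(words)  # counts for p(word | label)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(model, words):
    """Return the label maximizing log p(label) + sum of log p(word | label)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            # add-one smoothing (an assumption) avoids log(0) for unseen words
            numerator = word_counts[label][word] + 1
            score += math.log(numerator / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A usage example with made-up training data:

```python
data = [
    ("travel", ["beach", "hotel", "flight"]),
    ("travel", ["beach", "Europe"]),
    ("sports", ["game", "score", "team"]),
    ("sports", ["team", "win"]),
]
model = train(data)
print(classify(model, ["beach", "flight"]))
```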

Application: topic modeling

Probabilistic program: latent Dirichlet allocation

Generate a distribution over topics α ∈ R^K

For each position i = 1, ..., L:
    Generate a topic Z_i ∼ p(Z_i | α)
    Generate a word W_i ∼ p(W_i | Z_i)

[Figure: topics Z_1, Z_2, ..., Z_L, each generating a word W_1, W_2, ..., W_L. Example: α = {travel: 0.8, Europe: 0.2}; topics Z = travel, Europe; words W = beach, Euro]

Inference: given a text document, what topics is it about?
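
The generative side of latent Dirichlet allocation can be sketched in a few lines of Python. The per-topic word distributions in `TOPIC_WORDS` are invented for illustration, and drawing α from a symmetric Dirichlet (by normalizing gamma draws) is an assumption about how the topic distribution is generated.

```python
import random

# Hypothetical p(W_i | Z_i): one word distribution per topic.
TOPIC_WORDS = {
    "travel": {"beach": 0.5, "flight": 0.3, "hotel": 0.2},
    "Europe": {"Euro": 0.6, "Paris": 0.4},
}


def sample_dirichlet(concentration, rng):
    """Draw from a Dirichlet by normalizing independent gamma draws."""
    draws = [rng.gammavariate(c, 1.0) for c in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(length, rng):
    """Returns (alpha, topics Z_1..Z_L, words W_1..W_L)."""
    topics = list(TOPIC_WORDS)
    # generate a distribution over topics (assumed symmetric Dirichlet)
    alpha = dict(zip(topics, sample_dirichlet([1.0] * len(topics), rng)))
    zs, ws = [], []
    for _ in range(length):  # for each position i = 1, ..., L
        z = rng.choices(topics, weights=[alpha[t] for t in topics])[0]
        word_dist = TOPIC_WORDS[z]
        w = rng.choices(list(word_dist), weights=list(word_dist.values()))[0]
        zs.append(z)
        ws.append(w)
    return alpha, zs, ws
```

Inference runs this story in reverse: only the words are observed, and the goal is to recover α and the topics Z_i. That step is not shown here.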

Application: medical diagnosis

Probabilistic program: diseases and symptoms

For each disease i = 1, ..., m:
    Generate activity of disease D_i ∼ p(D_i)

For each symptom j = 1, ..., n:
    Generate activity of symptom S_j ∼ p(S_j | D_{1:m})

[Figure: Bayesian network with diseases (Pneumonia, Cold, Malaria) as parents of symptoms (Fever, Cough, Vomit)]

Inference: If a patient has some symptoms, what diseases do they have?

  • We already saw a special case of this model. In general, we would like to diagnose many diseases and might have measured many symptoms and vitals.
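
As a sketch of how inference in the diseases and symptoms model could work, the Python below enumerates all disease assignments and computes the posterior probability of each disease given fully observed symptoms. All numbers are invented, and the noisy-OR form chosen for p(S_j | D_{1:m}) is an assumption; the model itself leaves that distribution unspecified.

```python
import itertools

DISEASES = ["Pneumonia", "Cold", "Malaria"]
SYMPTOMS = ["Fever", "Cough", "Vomit"]

# Hypothetical priors p(D_i = 1).
PRIOR = {"Pneumonia": 0.01, "Cold": 0.2, "Malaria": 0.001}
# Hypothetical noisy-OR strengths: chance that a disease alone triggers a symptom.
CAUSE = {
    "Pneumonia": {"Fever": 0.8, "Cough": 0.9, "Vomit": 0.1},
    "Cold": {"Fever": 0.3, "Cough": 0.7, "Vomit": 0.05},
    "Malaria": {"Fever": 0.9, "Cough": 0.1, "Vomit": 0.6},
}
LEAK = 0.01  # assumed chance that a symptom appears with no modeled cause


def p_symptom_on(symptom, active_diseases):
    """Noisy-OR: the symptom is off only if every possible cause fails."""
    p_off = 1 - LEAK
    for disease in active_diseases:
        p_off *= 1 - CAUSE[disease][symptom]
    return 1 - p_off


def joint(diseases, symptoms):
    """Product of all local conditional distributions for one assignment."""
    p = 1.0
    for d in DISEASES:
        p *= PRIOR[d] if diseases[d] else 1 - PRIOR[d]
    active = [d for d in DISEASES if diseases[d]]
    for s in SYMPTOMS:
        on = p_symptom_on(s, active)
        p *= on if symptoms[s] else 1 - on
    return p


def posterior(symptoms):
    """p(D_i = 1 | symptoms) by enumerating all 2^m disease assignments."""
    marginals = {d: 0.0 for d in DISEASES}
    evidence = 0.0
    for values in itertools.product([False, True], repeat=len(DISEASES)):
        diseases = dict(zip(DISEASES, values))
        p = joint(diseases, symptoms)
        evidence += p
        for d in DISEASES:
            if diseases[d]:
                marginals[d] += p
    return {d: m / evidence for d, m in marginals.items()}
```

For example, `posterior({"Fever": True, "Cough": True, "Vomit": False})` returns one probability per disease. Enumeration costs 2^m joint evaluations, so it only works for a handful of diseases; with many diseases and symptoms, approximate inference is needed.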