A Short Course on Graphical Models
1. Introduction to Probability Theory
Mark Paskin
mark@paskin.org

Reasoning under uncertainty

In many settings, we must try to understand what is going on in a system when we have imperfect or incomplete information.

Two reasons why we might reason under uncertainty:

  1. laziness (modeling every detail of a complex system is costly)
  2. ignorance (we may not completely understand the system)

Example: deploy a network of smoke sensors to detect fires in a building. Our model will reflect both laziness and ignorance:

  1. We are too lazy to model what, besides fire, can trigger the sensors;
  2. We are too ignorant to model how fire creates smoke, what density of smoke is required to trigger the sensors, etc.

The only prerequisite: Set Theory

[Figure: six set diagrams, each showing two sets A and B.]

For simplicity, we will work (mostly) with finite sets. The extension to countably infinite sets is not difficult. The extension to uncountably infinite sets requires Measure Theory.

Probability spaces

A probability space represents our uncertainty regarding an experiment. It has two parts:

  1. the sample space $\Omega$, which is a set of outcomes; and
  2. the probability measure $P : 2^{\Omega} \to \mathbb{R}$, which is a real function of the subsets of $\Omega$.

[Figure: P maps an event A to the number P(A).]

A set of outcomes $A \subseteq \Omega$ is called an event. $P(A)$ represents how likely it is that the experiment’s actual outcome will be a member of $A$.

The three axioms of Probability Theory

  1. $P(A) \geq 0$ for all events $A$
  2. $P(\Omega) = 1$
  3. $P(A \cup B) = P(A) + P(B)$ for disjoint events $A$ and $B$

[Figure: disjoint events A and B on a scale from 0 to 1, with P(A) + P(B) = P(A ∪ B).]
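
As a minimal sketch of the three axioms on a finite sample space, the Python below stores a weight for each outcome and checks the axioms by brute force over every event. The fair six-sided die and the names `OMEGA`, `WEIGHTS`, `prob`, and `events` are illustrative assumptions, not part of the original notes.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical example: a fair six-sided die.
OMEGA = frozenset(range(1, 7))
WEIGHTS = {outcome: Fraction(1, 6) for outcome in OMEGA}

def prob(event):
    """P(event): total weight of the outcomes in the event."""
    return sum((WEIGHTS[o] for o in event), Fraction(0))

def events(omega):
    """Every subset of omega, i.e. every event."""
    items = sorted(omega)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

ALL_EVENTS = events(OMEGA)

# Axiom 1: non-negativity.
axiom1 = all(prob(a) >= 0 for a in ALL_EVENTS)
# Axiom 2: the whole sample space has probability one.
axiom2 = prob(OMEGA) == 1
# Axiom 3: additivity for disjoint events.
axiom3 = all(
    prob(a | b) == prob(a) + prob(b)
    for a in ALL_EVENTS
    for b in ALL_EVENTS
    if not (a & b)
)
print(axiom1, axiom2, axiom3)
```

Exact fractions are used instead of floats so that the equality in the third axiom can be tested without rounding error.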

Some simple consequences of the axioms

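  1. $P(A) = 1 - P(\Omega \setminus A)$
  2. $P(\emptyset) = 0$
  3. If $A \subseteq B$ then $P(A) \leq P(B)$
  4. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  5. $P(A \cup B) \leq P(A) + P(B)$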
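
These consequences can be spot-checked numerically. The sketch below assumes a uniform measure on a fair die and two hand-picked events; `OMEGA`, `prob`, `A`, and `B` are hypothetical names chosen for illustration, and a check on one example is of course not a proof.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # hypothetical fair die

def prob(event):
    """Uniform measure: every outcome has weight 1/6."""
    return Fraction(len(event), len(OMEGA))

A = frozenset({2, 4, 6})   # "the roll is even"
B = frozenset({4, 5, 6})   # "the roll is at least four"

checks = {
    "complement": prob(A) == 1 - prob(OMEGA - A),
    "empty set": prob(frozenset()) == 0,
    "monotonicity": A <= (A | B) and prob(A) <= prob(A | B),
    "inclusion-exclusion": prob(A | B) == prob(A) + prob(B) - prob(A & B),
    "union bound": prob(A | B) <= prob(A) + prob(B),
}
for name, holds in checks.items():
    print(name, holds)
```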

Conditional probability

Conditional probability allows us to reason with partial information.

When $P(B) > 0$, the conditional probability of $A$ given $B$ is defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

This is the probability that $A$ occurs, given we have observed $B$, i.e., that we know the experiment’s actual outcome will be in $B$. It is the fraction of probability mass in $B$ that also belongs to $A$.

$P(A)$ is called the a priori (or prior) probability of $A$ and $P(A \mid B)$ is called the a posteriori probability of $A$ given $B$.

[Figure: events A and B inside Ω, with P(A ∩ B) / P(B) = P(A | B).]
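
The definition translates directly into code for a finite sample space. This is a minimal sketch, assuming a uniform measure on a fair die; the names `prob`, `cond_prob`, `even`, and `at_least_four` are hypothetical.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # hypothetical fair die

def prob(event):
    """Uniform measure: every outcome has weight 1/6."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(a, b):
    """P(a | b) = P(a ∩ b) / P(b); undefined when P(b) = 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)

even = frozenset({2, 4, 6})
at_least_four = frozenset({4, 5, 6})

prior = prob(even)                          # a priori probability: 1/2
posterior = cond_prob(even, at_least_four)  # a posteriori probability: 2/3
print(prior, posterior)
```

Observing that the roll is at least four raises the probability of an even roll from 1/2 to 2/3, because two of the three remaining outcomes are even.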

Example of conditional probability

If $P$ is defined by

[Table: joint probabilities of fire / no fire against smoke / no smoke.]

then

$$P(\text{fire} \mid \text{smoke}) = \frac{P(\text{fire} \cap \text{smoke})}{P(\text{fire} \cap \text{smoke}) + P(\text{no fire} \cap \text{smoke})}$$

The chain rule

Apply the product rule, $P(A \cap B) = P(A \mid B)\, P(B)$, repeatedly:

$$P\left(\bigcap_{i=1}^{k} A_i\right) = P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1 \cap A_2) \cdots P\left(A_k \mid \bigcap_{i=1}^{k-1} A_i\right)$$

The chain rule will become important later when we discuss conditional independence in Bayesian networks.
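
The chain rule can be checked on a small example by computing the probability of an intersection two ways. This sketch assumes a uniform measure on a fair die and three hand-picked events; all names are hypothetical, and every conditioning event must have positive probability for the product to be defined.

```python
from fractions import Fraction
from functools import reduce

OMEGA = frozenset(range(1, 7))  # hypothetical fair die

def prob(event):
    """Uniform measure: every outcome has weight 1/6."""
    return Fraction(len(event), len(OMEGA))

def cond_prob(a, b):
    """P(a | b) = P(a ∩ b) / P(b), assuming P(b) > 0."""
    return prob(a & b) / prob(b)

def chain_rule(events):
    """P(A_1) P(A_2 | A_1) ... P(A_k | A_1 ∩ ... ∩ A_{k-1})."""
    product = Fraction(1)
    seen = OMEGA  # intersection of the events processed so far
    for a in events:
        product *= cond_prob(a, seen)  # first step: P(a | OMEGA) = P(a)
        seen = seen & a
    return product

events = [
    frozenset({1, 2, 3, 4}),
    frozenset({2, 3, 4, 5}),
    frozenset({3, 4, 5, 6}),
]
direct = prob(reduce(frozenset.intersection, events))
via_chain = chain_rule(events)
print(direct, via_chain)
```

Here the factors are 2/3, 3/4, and 2/3, whose product 1/3 matches the probability of the intersection {3, 4} computed directly.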

Bayes’ rule

Use the product rule both ways with $P(A \cap B)$ and divide by $P(B)$:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Bayes’ rule translates causal knowledge into diagnostic knowledge. For example, if $A$ is the event that a patient has a disease, and $B$ is the event that she displays a symptom, then $P(B \mid A)$ describes a causal relationship, and $P(A \mid B)$ describes a diagnostic one (that is usually hard to assess). If $P(B \mid A)$, $P(A)$ and $P(B)$ can be assessed easily, then we get $P(A \mid B)$ for free.
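
A minimal sketch of the disease and symptom example follows. The three input probabilities are hypothetical numbers chosen only for illustration; they do not come from the notes, and `bayes` is an illustrative helper name.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """P(A | B) = P(B | A) P(A) / P(B), defined when P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

# Hypothetical inputs (assumptions, not data from the notes).
p_disease = Fraction(1, 100)               # P(A): prior probability of disease
p_symptom_given_disease = Fraction(9, 10)  # P(B | A): causal knowledge
p_symptom = Fraction(1, 10)                # P(B): how common the symptom is

# Diagnostic knowledge obtained from the causal quantities.
p_disease_given_symptom = bayes(p_symptom_given_disease, p_disease, p_symptom)
print(p_disease_given_symptom)
```

With these assumed inputs, observing the symptom raises the probability of disease from 1/100 to 9/100.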

Examples of random variables

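Let’s say our experiment is to draw a card from a deck:

$$\Omega = \{A\heartsuit, 2\heartsuit, \ldots, K\heartsuit, A\diamondsuit, 2\diamondsuit, \ldots, K\diamondsuit, A\clubsuit, 2\clubsuit, \ldots, K\clubsuit, A\spadesuit, 2\spadesuit, \ldots, K\spadesuit\}$$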

  random variable                                                        example event

  $H(\omega)$ = true if $\omega$ is a $\heartsuit$, false otherwise       $H = \text{true}$
  $N(\omega)$ = $n$ if $\omega$ is the number $n$, … otherwise            … $< N <$ …
  $F(\omega)$ = … if $\omega$ is a face card, … otherwise                 $F =$ …
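
Random variables are just functions on the sample space, and an event such as "H = true" is the set of outcomes where the statement holds. The sketch below makes that concrete under several assumptions that are not taken from the notes: a fair draw (uniform measure), `N` returning 0 for cards that are not numbers, `F` coded as 0 or 1, and the particular bounds used in the last event.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["♥", "♦", "♣", "♠"]
OMEGA = [(rank, suit) for suit, rank in product(SUITS, RANKS)]  # 52 outcomes

def prob(event):
    """Uniform measure over the deck (assumes a fair draw)."""
    return Fraction(len(event), len(OMEGA))

def H(card):
    """True if the card is a heart, False otherwise."""
    return card[1] == "♥"

def N(card):
    """The card's number; returning 0 for A, J, Q, K is an assumed convention."""
    return int(card[0]) if card[0].isdigit() else 0

def F(card):
    """1 for a face card (J, Q, K), 0 otherwise; the 0/1 coding is assumed."""
    return 1 if card[0] in {"J", "Q", "K"} else 0

def event(predicate):
    """The set of outcomes on which a statement about random variables holds."""
    return [w for w in OMEGA if predicate(w)]

p_heart = prob(event(lambda w: H(w)))        # event "H = true"
p_face = prob(event(lambda w: F(w) == 1))    # event "F = 1"
p_mid = prob(event(lambda w: 2 < N(w) < 6))  # hypothetical event "2 < N < 6"
print(p_heart, p_face, p_mid)
```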

Densities

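Let $X : \Omega \to \Xi$ be a finite random variable. The function $p_X : \Xi \to \mathbb{R}$ is the density of $X$ if for all $x \in \Xi$:

$$p_X(x) = P(\{\omega : X(\omega) = x\})$$

When $\Xi$ is infinite, $p_X : \Xi \to \mathbb{R}$ is the density of $X$ if for all $\xi \subseteq \Xi$:

$$P(\{\omega : X(\omega) \in \xi\}) = \int_{\xi} p_X(x)\, dx$$

Note that $\int_{\Xi} p_X(x)\, dx = 1$ for a valid density.

[Figure: X maps an outcome ω in Ω to the value x = X(ω) in Ξ; p_X maps x to p_X(x).]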
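
For the finite case, the density can be computed by pushing each outcome's weight through the random variable. This is a minimal sketch; the two-dice experiment and the names `density`, `weights`, and `p_sum` are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def density(omega_weights, X):
    """p_X(x) = P({ω : X(ω) = x}) for a finite sample space."""
    p = defaultdict(Fraction)  # missing values start at Fraction(0)
    for outcome, weight in omega_weights.items():
        p[X(outcome)] += weight
    return dict(p)

# Hypothetical experiment: roll two fair dice; X is their sum.
weights = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}
p_sum = density(weights, lambda w: w[0] + w[1])

print(p_sum[7])             # six of the 36 outcomes sum to 7
print(sum(p_sum.values()))  # a valid density sums to 1
```

Once `p_sum` is computed, questions about the sum can be answered without referring back to the 36 underlying outcomes.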

Random variables and densities are a layer of abstraction

We usually work with a set of random variables and a joint density; the probability space is implicit.

[Figure: random variables X and Y map an outcome ω in Ω to values x and y; plot of the joint density p_XY(x, y).]

Marginal densities

Given the joint density $p_{XY}(x, y)$ for $X : \Omega \to \Xi$ and $Y : \Omega \to \Upsilon$, we can compute the marginal density of $X$ by

$$p_X(x) = \sum_{y \in \Upsilon} p_{XY}(x, y)$$

when $\Upsilon$ is finite, or by

$$p_X(x) = \int_{\Upsilon} p_{XY}(x, y)\, dy$$

when $\Upsilon$ is infinite.

This process of summing over the unwanted variables is called marginalization.
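
In the finite case, marginalization is a sum over the unwanted coordinate of a joint table. The sketch below assumes a small hypothetical joint density stored as a dictionary keyed by `(x, y)`; the values and the name `marginal_x` are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction

def marginal_x(p_xy):
    """p_X(x) = sum over y of p_XY(x, y), for finite Υ."""
    p_x = defaultdict(Fraction)  # missing values start at Fraction(0)
    for (x, _y), p in p_xy.items():
        p_x[x] += p
    return dict(p_x)

# Hypothetical joint density over X in {0, 1} and Y in {"a", "b", "c"}.
p_xy = {
    (0, "a"): Fraction(1, 10), (0, "b"): Fraction(2, 10), (0, "c"): Fraction(1, 10),
    (1, "a"): Fraction(3, 10), (1, "b"): Fraction(1, 10), (1, "c"): Fraction(2, 10),
}

p_x = marginal_x(p_xy)
print(p_x[0], p_x[1])
```

The marginal of `Y` follows the same pattern with the roles of the two coordinates swapped.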