Bayesian Learning Cont'd - Lecture Slides | CS 591, Assignments of Programming Languages

Material Type: Assignment; Class: ST: Prog Analy &Mechanization; Subject: Computer Science; University: University of New Mexico; Term: Unknown 1989;

Uploaded on 07/23/2009

Bayesian Learning, Cont’d

Administrivia

Various homework bugs:

Due: Oct 12 (Tues) not 9 (Sat)

Problem 3 should read:

$$f(X, C) = f(C) \prod_{i=1}^{d} f(x_i \mid C) = f(C) \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_{i,c}} \, e^{-\frac{(x_i - \mu_{i,c})^2}{2\sigma_{i,c}^2}}$$

(duh)

(some) info on naive Bayes in Sec. 4.3 of text

5 minutes of math...

Joint probabilities

Given d different random vars, $X_1, X_2, \ldots, X_d$

The “joint” probability of them taking on the simultaneous values $X_1 = v_1, X_2 = v_2, \ldots, X_d = v_d$ given by

$$\Pr[X_1 = v_1, X_2 = v_2, \ldots, X_d = v_d]$$

Or, for shorthand, $\Pr[v_1, v_2, \ldots, v_d]$

Closely related to the “joint PDF” $f(X_1, \ldots, X_d)$

5 minutes of math...

Independence:

Two random variables are statistically independent iff:

$$f(X_1, X_2) = f(X_1) f(X_2)$$

Or, equivalently (usually for discrete RVs):

$$\Pr[X_1, X_2] = \Pr[X_1] \Pr[X_2]$$

For multivariate RVs:

$$f(X_1, X_2, \cdots, X_d) = f(X_1) f(X_2) \cdots f(X_d) = \prod_{i=1}^{d} f(X_i)$$
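The discrete form of the independence test can be checked directly on a small probability table. A minimal sketch, assuming the joint distribution is given as a dict from value pairs to probabilities; the function name `is_independent` and the two coin examples are made up for illustration:

```python
from itertools import product


def is_independent(joint, tol=1e-9):
    """Check Pr[X1, X2] == Pr[X1] * Pr[X2] for every pair of values.

    joint: dict mapping (v1, v2) -> Pr[X1 = v1, X2 = v2] (hypothetical format).
    """
    # Marginals: sum the joint over the other variable.
    p1, p2 = {}, {}
    for (v1, v2), p in joint.items():
        p1[v1] = p1.get(v1, 0.0) + p
        p2[v2] = p2.get(v2, 0.0) + p
    # Pairs missing from the table are treated as probability 0.
    return all(
        abs(joint.get((v1, v2), 0.0) - p1[v1] * p2[v2]) <= tol
        for v1, v2 in product(p1, p2)
    )


# Two fair coins flipped separately: every pair has probability 1/4.
fair_coins = {(a, b): 0.25 for a in "HT" for b in "HT"}
# Second "coin" just copies the first: only HH and TT can occur.
copied_coin = {("H", "H"): 0.5, ("T", "T"): 0.5}

print(is_independent(fair_coins))
print(is_independent(copied_coin))
```

The copied coin fails the test because Pr[H, H] is 0.5 while the product of the marginals is 0.25.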

Parameterizing PDFs

Given training data, $[X, Y]$, w/ discrete labels Y

Break data out into sets $[X_{Y=a}, a]$, $[X_{Y=b}, b]$, etc.

Want to come up with models, $f(X_{Y=a} \mid Y = a)$, $f(X_{Y=b} \mid Y = b)$, etc.

Suppose the individual f()s are Gaussian, need the params μ and σ

How do you get the params?

Now, what if the f()s are something really funky you’ve never seen before in your life, with parameters $\Theta = [\theta_1, \theta_2, \ldots, \theta_{193}]$
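For the Gaussian case, one common answer to "how do you get the params?" is the per-class sample mean and standard deviation. A minimal sketch, assuming a single real-valued feature and the 1/n form of the variance; the helper name `fit_class_gaussians` and the toy data are illustrative, not from the slides:

```python
import math
from collections import defaultdict


def fit_class_gaussians(xs, ys):
    """Split the data by label, then estimate (mu, sigma) for each label."""
    # Break the data out into one set per label.
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)

    params = {}
    for label, values in groups.items():
        n = len(values)
        mu = sum(values) / n
        # 1/n variance (an assumption here; 1/(n-1) is the other usual choice).
        var = sum((v - mu) ** 2 for v in values) / n
        params[label] = (mu, math.sqrt(var))
    return params


# Toy data: one feature, two labels.
xs = [1.0, 2.0, 3.0, 10.0, 12.0]
ys = ["a", "a", "a", "b", "b"]
params = fit_class_gaussians(xs, ys)
for label, (mu, sigma) in sorted(params.items()):
    print(label, mu, sigma)
```

This only works because the Gaussian's params have a known closed form; it says nothing yet about the "really funky" case.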

Maximum likelihood

Principle of maximum likelihood:

Pick the parameters that make the data as probable (or, in general “likely”) as possible

Regard the probability function as a func of two variables: data and parameters:

$$L(\Theta, X) \equiv f(X) \text{ under params } \Theta$$

Function L is the “likelihood function”

Want to pick the $\Theta$ that maximizes L

[Figure: Exponential as a fn of x (axes: x, f(x)); the likelihood regarded as a fn of X]

[Figure: Exponential as a fn of τ (axes: τ, L(τ)); the likelihood regarded as a fn of τ]
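The "likelihood as a fn of τ" view can be tried numerically: hold the data fixed, sweep τ, and keep the value that makes the data most likely. A rough sketch, assuming the exponential is parameterized as f(x) = (1/τ) e^(-x/τ) (the slides do not show the formula), that the data points are independent so their densities multiply, and a made-up three-point data set:

```python
import math


def exp_pdf(x, tau):
    """Exponential density with scale tau (assumed parameterization)."""
    return math.exp(-x / tau) / tau


def likelihood(tau, data):
    """L(tau) with the data held fixed: product of the per-point densities."""
    result = 1.0
    for x in data:
        result *= exp_pdf(x, tau)
    return result


data = [1.0, 2.0, 3.0]                      # hypothetical observations
taus = [0.5 + 0.1 * k for k in range(96)]   # grid over roughly 0.5 .. 10
best_tau = max(taus, key=lambda t: likelihood(t, data))
print(best_tau)
```

A grid sweep is crude; it is only meant to show that the same function f can be read either as a curve over x or as a curve over τ.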

IID Samples

In supervised learning, we usually assume that data points are sampled independently and from the same distribution

IID assumption: data are independent and identically distributed

joint PDF can be written as product of individual (marginal) PDFs:

$$f(X) = f(X_1) f(X_2) \cdots f(X_N) = \prod_{i=1}^{N} f(X_i)$$
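Combined with the likelihood definition, the IID factorization is what makes the log trick useful: the product becomes a sum. A short worked step, assuming the same symbols as above (data points $X_1, \ldots, X_N$, params $\Theta$):

```latex
L(\Theta, X) = \prod_{i=1}^{N} f(X_i) \ \text{under params } \Theta
\quad\Longrightarrow\quad
\log L(\Theta, X) = \sum_{i=1}^{N} \log f(X_i) \ \text{under params } \Theta
```

Because log is monotonically increasing, the $\Theta$ that maximizes $\log L$ is the same $\Theta$ that maximizes $L$.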

Exercise

Find the maximum likelihood estimator of μ for the univariate Gaussian:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

Find the maximum likelihood estimator of β for the degenerate gamma distribution:

$$f(x) = \frac{1}{2\beta^3} \, x^2 e^{-\frac{x}{\beta}}$$

Hint: consider the log of the likelihood fns in both cases
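Once you have a closed-form candidate for the Gaussian case, a numerical check can catch algebra slips. A minimal sketch, assuming IID data, a fixed σ = 1, and a simple ternary search over μ (valid here because the Gaussian log-likelihood has a single peak in μ); the function names and the four data points are made up:

```python
import math


def gaussian_log_likelihood(mu, data, sigma=1.0):
    """Sum of log f(x) over the data for a univariate Gaussian."""
    n = len(data)
    return (-n * math.log(math.sqrt(2.0 * math.pi) * sigma)
            - sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2))


def argmax_unimodal(fn, lo, hi, iters=200):
    """Ternary search for the maximizer of a single-peaked function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fn(m1) < fn(m2):
            lo = m1   # the peak cannot be left of m1
        else:
            hi = m2   # the peak cannot be right of m2
    return (lo + hi) / 2.0


data = [2.1, 1.9, 3.2, 2.8]   # hypothetical sample
mu_hat = argmax_unimodal(
    lambda mu: gaussian_log_likelihood(mu, data), min(data), max(data)
)
print(mu_hat)
```

Plug the same data into whatever formula you derived by hand and compare it with `mu_hat`.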

Putting the parts together

[Diagram: the complete training data $[X, Y]$ is split by label into $[X_{Y=a}, a]$, $[X_{Y=b}, b]$, $[X_{Y=c}, c]$]

5 minutes of math...

Conditional probabilities

Suppose you have a joint PDF, $f(H, W)$

Now you get to see one of the values, e.g., H = “183cm”

What’s your probability estimate of W, given this new knowledge?

$$f(W \mid H) = \frac{f(H, W)}{f(H)} = \frac{f(H, W)}{\int_{w} f(H, W)\, dw}$$
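The same recipe works on a discretized joint table, with the integral over w replaced by a sum. A small sketch, assuming H and W have been binned into a few values; the table entries and the helper name `conditional_given_h` are invented for illustration:

```python
def conditional_given_h(joint, h):
    """Return Pr[W = w | H = h] for each w, from a discrete joint table.

    joint: dict mapping (h, w) -> Pr[H = h, W = w] (hypothetical format).
    """
    # Keep only the entries consistent with the observed H.
    row = {w: p for (hh, w), p in joint.items() if hh == h}
    # Discrete analogue of the integral of f(H, W) over w, i.e. f(H).
    marginal_h = sum(row.values())
    if marginal_h == 0:
        raise ValueError("observed H has zero probability under the joint")
    return {w: p / marginal_h for w, p in row.items()}


# Toy joint over height (cm) and weight (kg); entries sum to 1.
joint = {
    (170, 60): 0.20, (170, 80): 0.10,
    (183, 60): 0.10, (183, 80): 0.40, (183, 100): 0.20,
}
cond = conditional_given_h(joint, 183)
for w, p in sorted(cond.items()):
    print(w, round(p, 3))
```

Dividing by the marginal is just renormalization: the surviving row of the table is rescaled so it sums to 1.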