TTIC 31020: Introduction to Machine Learning
Instructor: Greg Shakhnarovich
TTI–Chicago
November 1, 2010
General idea: assume (pretend?) p(x | y) comes from a certain parametric class, p(x | y; θ_y)
Estimate θ̂_y from data in each class
Under this estimate, select the class with highest p(x_0 | y; θ̂_y)
Example: Gaussian model
Assume x is represented by m features φ_1(x), …, φ_m(x), independent given the class:
p(x | c) = p(φ_1(x), …, φ_m(x) | c) = ∏_{j=1}^m p(φ_j(x) | c).
Under this assumption, the Bayes (optimal) classifier is
h*(x) = sign( ∑_{j=1}^m log [ p(φ_j(x) | +1) / p(φ_j(x) | −1) ] ).
For Gaussian models, equivalent to assuming diagonal covariance
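For concreteness, here is a minimal sketch of such a naive Bayes classifier with per-feature Gaussians (diagonal covariance). It assumes NumPy; the function names, the small variance floor, and the added log-prior term (which vanishes for equal class priors) are my own choices, not part of the slides.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Per-class, per-feature Gaussian fits: rows of X are examples, y is +1/-1."""
    params = {}
    for c in (+1, -1):
        Xc = X[y == c]
        # mean, variance (with a small floor for stability), and class prior
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, len(Xc) / len(X))
    return params

def predict_gaussian_nb(params, x):
    """Sign of the summed per-feature log-likelihood ratio (plus log-prior ratio)."""
    def score(c):
        mu, var, prior = params[c]
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        return log_lik.sum() + np.log(prior)
    return np.sign(score(+1) - score(-1))
```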
Semi-parametric models
the EM algorithm
So far, we have assumed that each class has a single coherent model.
What if the examples (within the same class) are from a number of distinct “types”?
Images of the same person under different conditions: with/without glasses, different expressions, different views.
Images of the same category but different sorts of objects: chairs with/without armrests.
Multiple topics within the same document.
Different ways of pronouncing the same phonemes.
Assumptions:
A mixture model:
p(x; π) = ∑_{c=1}^k p(y = c) p(x | y = c).
π_c := p(y = c) are the mixing probabilities.
We need to parametrize the component densities p(x | y = c).
Suppose that the parameters of the c-th component are θ_c. Then we can denote θ = [θ_1, …, θ_k] and write
p(x; θ, π) = ∑_{c=1}^k π_c · p(x; θ_c).
Any valid setting of θ and π, subject to ∑_{c=1}^k π_c = 1, produces a valid pdf. Example: mixture of Gaussians.
[Figure: a Gaussian density × 0.7 plus another Gaussian density × 0.3 gives a two-component mixture density.]
The generative process with a k-component mixture (graphical model: π → y → x ← θ):
p(x, y; θ, π) = p(y; π) · p(x | y; θ_y)
Any data point x_i could have been generated in k ways.
If the c-th component is a Gaussian, p(x | y = c) = N(x; μ_c, Σ_c), then
p(x; θ, π) = ∑_{c=1}^k π_c · N(x; μ_c, Σ_c),
where θ = [μ_1, …, μ_k, Σ_1, …, Σ_k].
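As a quick illustration (not from the slides), a mixture density of this form can be evaluated directly; the sketch below assumes SciPy, and the helper name is mine.

```python
from scipy.stats import multivariate_normal

def mixture_pdf(x, pis, mus, Sigmas):
    """p(x; theta, pi) = sum_c pi_c * N(x; mu_c, Sigma_c)."""
    return sum(pi_c * multivariate_normal.pdf(x, mean=mu_c, cov=Sigma_c)
               for pi_c, mu_c, Sigma_c in zip(pis, mus, Sigmas))
```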
The graphical model: [π → y → x, with x also depending on μ_{1,…,k}, Σ_{1,…,k}]
Idea: estimate the set of parameters that maximizes the likelihood of the observed data.
The log-likelihood of π, θ:
log p(X; π, θ) = ∑_{i=1}^N log ∑_{c=1}^k π_c N(x_i; μ_c, Σ_c).
No closed-form solution because of the sum inside log.
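To make the shape of this objective concrete, here is a sketch (assuming NumPy/SciPy; the function name is mine) that evaluates it, using logsumexp for the sum inside the log to avoid underflow.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pis, mus, Sigmas):
    """sum_i log sum_c pi_c N(x_i; mu_c, Sigma_c), for an (N, d) data matrix X."""
    # log_pc[i, c] = log pi_c + log N(x_i; mu_c, Sigma_c)
    log_pc = np.column_stack([
        np.log(pi_c) + multivariate_normal.logpdf(X, mean=mu_c, cov=Sigma_c)
        for pi_c, mu_c, Sigma_c in zip(pis, mus, Sigmas)
    ])
    return logsumexp(log_pc, axis=1).sum()
```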
Suppose that we do observe y_i ∈ {1, …, k} for each i = 1, …, N.
Let us introduce a set of binary indicator variables z_i = [z_i1, …, z_ik] where
z_ic = 1 if y_i = c, and 0 otherwise.
The count of examples from the c-th component:
N_c = ∑_{i=1}^N z_ic.
If we know z_i, the ML estimates of the Gaussian components, just like in the class-conditional model, are
π̂_c = N_c / N,
μ̂_c = (1/N_c) ∑_{i=1}^N z_ic x_i,
Σ̂_c = (1/N_c) ∑_{i=1}^N z_ic (x_i − μ̂_c)(x_i − μ̂_c)^T.
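These estimates are easy to compute once the indicators are in hand; a small sketch assuming NumPy, with Z holding the one-hot indicators z_ic (function and variable names are mine):

```python
import numpy as np

def mle_given_assignments(X, Z):
    """X: (N, d) data; Z: (N, k) one-hot indicators with Z[i, c] = z_ic."""
    N, d = X.shape
    Nc = Z.sum(axis=0)                       # N_c = sum_i z_ic
    pis = Nc / N                             # pi_hat_c = N_c / N
    mus = (Z.T @ X) / Nc[:, None]            # mu_hat_c = (1/N_c) sum_i z_ic x_i
    Sigmas = [(Z[:, c, None] * (X - mus[c])).T @ (X - mus[c]) / Nc[c]
              for c in range(Z.shape[1])]    # Sigma_hat_c
    return pis, mus, Sigmas
```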
When we don’t know y, we face a credit assignment problem: which component is responsible for x_i?
Suppose for a moment that we do know the component parameters θ = [μ_1, …, μ_k, Σ_1, …, Σ_k] and mixing probabilities π = [π_1, …, π_k].
Then, we can compute the posterior of each label using Bayes’ theorem:
γ_ic = p̂(y = c | x_i; θ, π) = [π_c · p(x_i; μ_c, Σ_c)] / [∑_{l=1}^k π_l · p(x_i; μ_l, Σ_l)].
We will call γ_ic the responsibility of the c-th component for x_i.
∑_{c=1}^k γ_ic = 1 for each i.
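In code, this posterior is just a normalization over components; a sketch assuming NumPy/SciPy (the function name is mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pis, mus, Sigmas):
    """gamma[i, c] = pi_c N(x_i; mu_c, Sigma_c) / sum_l pi_l N(x_i; mu_l, Sigma_l)."""
    unnorm = np.column_stack([
        pi_c * multivariate_normal.pdf(X, mean=mu_c, cov=Sigma_c)
        for pi_c, mu_c, Sigma_c in zip(pis, mus, Sigmas)
    ])
    return unnorm / unnorm.sum(axis=1, keepdims=True)  # each row sums to 1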
The “complete data” likelihood (when z are known):
p(X, Z; π, θ) ∝ ∏_{i=1}^N ∏_{c=1}^k (π_c N(x_i; μ_c, Σ_c))^{z_ic}.
and the log:
log p(X, Z; π, θ) = const + ∑_{i=1}^N ∑_{c=1}^k z_ic (log π_c + log N(x_i; μ_c, Σ_c)).
We can’t compute it, but we can take the expectation w.r.t. the posterior of z, which is just γ_ic:
E_{z_ic ∼ γ_ic}[log p(x_i, z_ic; π, θ)].
Expectation of z_ic:
E_{z_ic ∼ γ_ic}[z_ic] = ∑_{z ∈ {0,1}} z · p(z_ic = z) = γ_ic.
The expected likelihood of the data:
E_{z_ic ∼ γ_ic}[log p(X, Z; π, θ)] = const + ∑_{i=1}^N ∑_{c=1}^k γ_ic (log π_c + log N(x_i; μ_c, Σ_c)).
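For reference, a sketch (assuming NumPy/SciPy; the function name is mine) of evaluating this expected complete-data log-likelihood, up to the constant:

```python
import numpy as np
from scipy.stats import multivariate_normal

def expected_complete_log_likelihood(X, gammas, pis, mus, Sigmas):
    """sum_i sum_c gamma_ic (log pi_c + log N(x_i; mu_c, Sigma_c)), up to a constant."""
    total = 0.0
    for c, (pi_c, mu_c, Sigma_c) in enumerate(zip(pis, mus, Sigmas)):
        log_term = np.log(pi_c) + multivariate_normal.logpdf(X, mean=mu_c, cov=Sigma_c)
        total += (gammas[:, c] * log_term).sum()
    return total
```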
We can find π, θ that maximize this expected likelihood by setting derivatives to zero and, for π, using Lagrange multipliers to enforce ∑_c π_c = 1.
π̂_c = (1/N) ∑_{i=1}^N γ_ic,
μ̂_c = (1 / ∑_{i=1}^N γ_ic) ∑_{i=1}^N γ_ic x_i,
Σ̂_c = (1 / ∑_{i=1}^N γ_ic) ∑_{i=1}^N γ_ic (x_i − μ̂_c)(x_i − μ̂_c)^T.
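These updates mirror the known-assignment estimates, with soft responsibilities in place of hard counts; a sketch assuming NumPy (function and variable names are mine):

```python
import numpy as np

def m_step(X, gammas):
    """X: (N, d) data; gammas: (N, k) responsibilities from the E-step."""
    N, d = X.shape
    Nc = gammas.sum(axis=0)                      # effective counts sum_i gamma_ic
    pis = Nc / N                                 # pi_hat_c
    mus = (gammas.T @ X) / Nc[:, None]           # mu_hat_c
    Sigmas = [(gammas[:, c, None] * (X - mus[c])).T @ (X - mus[c]) / Nc[c]
              for c in range(gammas.shape[1])]   # Sigma_hat_c
    return pis, mus, Sigmas
```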
If we know the parameters and indicators (assignments), we are done.
If we know the indicators but not the parameters, we can do ML estimation of the parameters – and we are done.
If we know the parameters but not the indicators, we can compute the posteriors of indicators;
But in reality we know neither the parameters nor the indicators.
Start with a guess of θ, π.
Iterate between:
E-step: compute the expected assignments, i.e. calculate γ_ic, using the current estimates of θ, π.
M-step: maximize the expected likelihood under the current γ_ic.
Repeat until convergence.
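Putting the two steps together, here is a minimal end-to-end sketch (assuming NumPy/SciPy; the initialization and the fixed iteration count are simplistic choices of mine, not prescribed by the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iters=100, seed=0):
    """Fit a k-component Gaussian mixture to the (N, d) data matrix X with EM."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Crude initial guess: uniform weights, random data points as means, shared covariance.
    pis = np.full(k, 1.0 / k)
    mus = X[rng.choice(N, size=k, replace=False)].astype(float)
    Sigmas = [np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)]
    for _ in range(n_iters):
        # E-step: responsibilities gamma[i, c] under the current parameters.
        gammas = np.column_stack([
            pis[c] * multivariate_normal.pdf(X, mean=mus[c], cov=Sigmas[c])
            for c in range(k)
        ])
        gammas /= gammas.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments.
        Nc = gammas.sum(axis=0)
        pis = Nc / N
        mus = (gammas.T @ X) / Nc[:, None]
        Sigmas = [(gammas[:, c, None] * (X - mus[c])).T @ (X - mus[c]) / Nc[c]
                  for c in range(k)]
    return pis, mus, Sigmas, gammas
```

In practice one would monitor the log-likelihood from the earlier sketch and stop when it plateaus, rather than running a fixed number of iterations.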
Colors represent γ_ic after the E-step.
[Figure: scatter plots of the data after the 1st, 2nd, 3rd, 4th, and 7th EM iterations, with points colored by their responsibilities γ_ic.]