Statistical Machine Learning: Mixture of Gaussians, EM Algorithm, and HMM - Prof. Yuan Qi, Study notes of Computer Science

This lecture covers K-medoids, mixtures of Gaussians, the expectation maximization (EM) algorithm, hidden Markov models (HMMs), and Kalman filtering. It explains the K-medoids algorithm, mixtures of Gaussians, and the EM algorithm for learning the parameters of a Gaussian mixture model. It also discusses HMMs, the forward-backward algorithm, the Viterbi algorithm, and Kalman filtering for inference in linear Gaussian systems.

CS 59000 Statistical Machine Learning
Lecture 24

Outline
• Review of K-medoids, mixtures of Gaussians, expectation maximization (EM), alternative view of EM
• Hidden Markov models, forward-backward algorithm, EM for learning HMM parameters, Viterbi algorithm, linear state space models, Kalman filtering and smoothing

Conditional Probability
\gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}
Responsibility that component k takes for explaining the observation x.

Maximum Likelihood
Maximize the log likelihood function
\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^N \ln \left\{ \sum_{k=1}^K \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}
[Figure: graphical representation of a Gaussian mixture model for a set of N i.i.d. data points {x_n}, with corresponding latent variables {z_n}, where n = 1, ..., N.]

Severe Overfitting by Maximum Likelihood
When a cluster has only one data point, its variance goes to 0.

Maximum Likelihood Conditions (3)
Lagrange function:
\ln p(X \mid \pi, \mu, \Sigma) + \lambda \left( \sum_{k=1}^K \pi_k - 1 \right)
Setting its derivative to zero and using the normalization constraint, we obtain:
\pi_k = \frac{N_k}{N}, \quad \text{where } N_k = \sum_{n=1}^N \gamma(z_{nk})

Expectation Maximization for Mixtures of Gaussians
Although the previous conditions do not give closed-form solutions, we can use them to construct iterative updates:
E step: compute the responsibilities \gamma(z_{nk}).
M step: compute new means \mu_k, covariances \Sigma_k, and mixing coefficients \pi_k.
Loop over E and M steps until the log likelihood \ln p(X \mid \mu, \Sigma, \pi) stops increasing.

General EM Algorithm
Given a joint distribution p(X, Z \mid \theta) over observed variables X and latent variables Z, governed by parameters \theta, the goal is to maximize the likelihood function p(X \mid \theta) with respect to \theta.
1. Choose an initial setting for the parameters \theta^{old}.
2. E step: evaluate p(Z \mid X, \theta^{old}).
3. M step: evaluate \theta^{new} given by
\theta^{new} = \arg\max_{\theta} Q(\theta, \theta^{old})
where
Q(\theta, \theta^{old}) = \sum_Z p(Z \mid X, \theta^{old}) \ln p(X, Z \mid \theta).
4. Check for convergence of either the log likelihood or the parameter values. If the convergence criterion is not satisfied, let \theta^{old} \leftarrow \theta^{new} and return to step 2.

Lower Bound Perspective of EM
• Expectation step: maximize the functional lower bound \mathcal{L}(q, \theta) over the distribution q(Z).
• Maximization step: maximize the lower bound over the parameters \theta.

Illustration of EM Updates
[Figure: the log likelihood \ln p(X \mid \theta) as a function of \theta, showing the update from \theta^{old} to \theta^{new}.]

Sequential Data
[Figure: spectrogram (frequency in Hz versus time in seconds) and waveform of the spoken words "Bayes' Theorem".]
There are temporal dependences between data points.

State Space Models
p(x_1, \ldots, x_N, z_1, \ldots, z_N) = p(z_1) \left[ \prod_{n=2}^N p(z_n \mid z_{n-1}) \right] \prod_{n=1}^N p(x_n \mid z_n)
Important graphical models for many dynamic models, including hidden Markov models (HMMs) and linear dynamical systems.
Question: what order should the Markov assumption have?

Hidden Markov Models
Many applications, e.g., speech recognition, natural language processing, handwriting recognition, bio-sequence analysis.

From Mixture Models to HMMs
By turning a mixture model into a dynamic model, we obtain the HMM. We model the dependence between two consecutive latent variables by a transition probability:
p(z_n \mid z_{n-1}, A) = \prod_{k=1}^K \prod_{j=1}^K A_{jk}^{\, z_{n-1,j} \, z_{nk}}

Inference: Forward-backward Algorithm
Goal: compute marginals for the latent variables.
Forward-backward algorithm: exact inference as a special case of the sum-product algorithm on the HMM.
[Figure: factor graph representation, grouping the emission density and transition probability into one factor at a time.]
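To make the E and M updates above concrete, here is a minimal NumPy/SciPy sketch of EM for a Gaussian mixture. It is not part of the original notes: the function name em_gmm, the initialization scheme, and the small ridge added to each covariance (to avoid the collapsing-variance problem described under "Severe Overfitting") are illustrative choices.

```python
# Minimal sketch of EM for a Gaussian mixture (illustrative; names and initialization are my own).
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, tol=1e-6, seed=0):
    """Fit a K-component Gaussian mixture to data X (N x D) by EM."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initialization: K random data points as means, identity covariances, uniform weights.
    mu = X[rng.choice(N, K, replace=False)]
    Sigma = np.stack([np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf

    for _ in range(n_iters):
        # E step: responsibilities gamma(z_nk) proportional to pi_k * N(x_n | mu_k, Sigma_k).
        dens = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                         for k in range(K)], axis=1)            # shape (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M step: re-estimate means, covariances, and mixing coefficients.
        Nk = gamma.sum(axis=0)                                   # effective counts N_k
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
            Sigma[k] += 1e-6 * np.eye(D)                         # ridge to guard against collapsing variances
        pi = Nk / N

        # Convergence check: log likelihood ln p(X | mu, Sigma, pi) at the E-step parameters.
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, Sigma, gamma
```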
Forward-backward Algorithm as Message Passing Method (1)
h(z_1) = p(z_1) \, p(x_1 \mid z_1)
f_n(z_{n-1}, z_n) = p(z_n \mid z_{n-1}) \, p(x_n \mid z_n)
Forward messages:
\mu_{z_{n-1} \to f_n}(z_{n-1}) = \mu_{f_{n-1} \to z_{n-1}}(z_{n-1})
\mu_{f_n \to z_n}(z_n) = \sum_{z_{n-1}} f_n(z_{n-1}, z_n) \, \mu_{z_{n-1} \to f_n}(z_{n-1}) = \sum_{z_{n-1}} f_n(z_{n-1}, z_n) \, \mu_{f_{n-1} \to z_{n-1}}(z_{n-1})
\alpha(z_n) = \mu_{f_n \to z_n}(z_n)

Forward-backward Algorithm as Message Passing Method (2)
Backward messages (Q: how to compute them?):
\beta(z_n) = \mu_{f_{n+1} \to z_n}(z_n)
The messages actually involve X:
p(z_n, X) = \mu_{f_n \to z_n}(z_n) \, \mu_{f_{n+1} \to z_n}(z_n) = \alpha(z_n) \, \beta(z_n)
\gamma(z_n) = p(z_n \mid X) = \frac{p(X \mid z_n) \, p(z_n)}{p(X)} = \frac{\alpha(z_n) \, \beta(z_n)}{p(X)}
Similarly, we can compute the following (Q: why?):
\xi(z_{n-1}, z_n) = p(z_{n-1}, z_n \mid X) = \frac{\alpha(z_{n-1}) \, p(x_n \mid z_n) \, p(z_n \mid z_{n-1}) \, \beta(z_n)}{p(X)}

Maximum Likelihood Estimation for HMM
Goal: maximize
p(X \mid \theta) = \sum_Z p(X, Z \mid \theta)
Looks familiar? Remember EM for mixtures of Gaussians... Indeed the updates are similar.

EM for HMM
E step:
\gamma(z_n) = p(z_n \mid X, \theta^{old})
\xi(z_{n-1}, z_n) = p(z_{n-1}, z_n \mid X, \theta^{old})
Computed with the forward-backward / sum-product algorithm.
M step:
\pi_k = \frac{\gamma(z_{1k})}{\sum_{j=1}^K \gamma(z_{1j})}
A_{jk} = \frac{\sum_{n=2}^N \xi(z_{n-1,j}, z_{nk})}{\sum_{l=1}^K \sum_{n=2}^N \xi(z_{n-1,j}, z_{nl})}
\mu_k = \frac{\sum_{n=1}^N \gamma(z_{nk}) \, x_n}{\sum_{n=1}^N \gamma(z_{nk})}
\Sigma_k = \frac{\sum_{n=1}^N \gamma(z_{nk}) \, (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_{n=1}^N \gamma(z_{nk})}

Linear Dynamical Systems
p(z_n \mid z_{n-1}) = \mathcal{N}(z_n \mid A z_{n-1}, \Gamma)
p(x_n \mid z_n) = \mathcal{N}(x_n \mid C z_n, \Sigma)
p(z_1) = \mathcal{N}(z_1 \mid \mu_0, V_0)
Equivalently, we have
z_n = A z_{n-1} + w_n
x_n = C z_n + v_n
z_1 = \mu_0 + u
where w \sim \mathcal{N}(w \mid 0, \Gamma), v \sim \mathcal{N}(v \mid 0, \Sigma), u \sim \mathcal{N}(u \mid 0, V_0).

Extension of HMM and LDS
[Figure: graphical model with two chains of latent variables z_n^{(1)}, z_n^{(2)} and observations x_n.]
Discrete latent variables: factorized HMMs.
Continuous latent variables: switching Kalman filter models.
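As a concrete companion to the message-passing view above, here is a minimal NumPy sketch of the alpha-beta (forward-backward) recursion for an HMM with discrete emissions. The function name, the matrix layout (A[j, k] = p(z_n = k | z_{n-1} = j), B[k, m] = p(x_n = m | z_n = k)), and the per-step scaling for numerical stability are my own choices, not anything specified in the notes; the returned gamma and xi are the posteriors \gamma(z_n) and \xi(z_{n-1}, z_n) needed in the E step of EM for HMM.

```python
# Minimal sketch of the scaled forward-backward recursion for a discrete-emission HMM.
import numpy as np

def forward_backward(pi, A, B, obs):
    """pi: (K,) initial distribution; A: (K, K) transitions; B: (K, M) emissions;
    obs: length-N sequence of observed symbol indices."""
    N, K = len(obs), len(pi)
    alpha = np.zeros((N, K))
    beta = np.zeros((N, K))
    c = np.zeros(N)                       # scaling factors; sum of log c gives log p(X)

    # Forward pass: alpha(z_n) proportional to p(x_1..x_n, z_n).
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ A) * B[:, obs[n]]
        c[n] = alpha[n].sum()
        alpha[n] /= c[n]

    # Backward pass: beta(z_n) proportional to p(x_{n+1}..x_N | z_n).
    beta[-1] = 1.0
    for n in range(N - 2, -1, -1):
        beta[n] = A @ (B[:, obs[n + 1]] * beta[n + 1]) / c[n + 1]

    # Posterior marginals gamma(z_n) = p(z_n | X) and pairwise marginals xi(z_{n-1}, z_n).
    gamma = alpha * beta
    xi = np.zeros((N - 1, K, K))
    for n in range(1, N):
        xi[n - 1] = (alpha[n - 1][:, None] * A * (B[:, obs[n]] * beta[n])[None, :]) / c[n]
    log_likelihood = np.log(c).sum()
    return gamma, xi, log_likelihood
```

Two connections worth noting: replacing the sums in the forward pass with maximizations (and keeping back-pointers) gives the Viterbi algorithm for the most probable state sequence, and running the analogous forward recursion on the linear-Gaussian state space model above yields Kalman filtering, with the backward pass giving Kalman smoothing.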