CS 59000 Statistical Machine Learning
Mixture of Gaussians, EM Algorithm, and HMM (Prof. Yuan Qi)
Lecture 24
Outline
• Review of K-medoids, Mixture of Gaussians, Expectation Maximization (EM), alternative view of EM
• Hidden Markov Models, forward-backward algorithm, EM for learning HMM parameters, Viterbi algorithm, linear state space models, Kalman filtering and smoothing
Conditional Probability
$$\gamma(z_k) \equiv p(z_k = 1 \mid \mathbf{x}) = \frac{\pi_k \,\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k)}{\sum_{j=1}^{K} \pi_j \,\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_j,\boldsymbol{\Sigma}_j)}$$
Responsibility that component k takes for
explaining the observation.
Maximum Likelihood
Maximize the log likelihood function
$$\ln p(\mathbf{X}\mid\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln\left\{ \sum_{k=1}^{K} \pi_k \,\mathcal{N}(\mathbf{x}_n\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k) \right\}$$
Graphical representation of a Gaussian mixture model
for a set of N i.i.d. data points $\{\mathbf{x}_n\}$, with corresponding
latent points $\{\mathbf{z}_n\}$, where $n = 1, \ldots, N$.
Severe Overfitting by Maximum Likelihood
[Figure: density $p(x)$ versus $x$ illustrating a collapsing mixture component]
When a cluster has only one data point, its variance
goes to 0.
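To see why (a standard argument; the isotropic form below is my addition for concreteness), suppose component $k$ places its mean exactly on a data point and has covariance $\sigma_k^2\mathbf{I}$. Its likelihood contribution is then unbounded:
$$\mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k = \mathbf{x}_n,\ \sigma_k^2\mathbf{I}) = \frac{1}{(2\pi\sigma_k^2)^{D/2}} \longrightarrow \infty \quad \text{as } \sigma_k \to 0,$$
so the log likelihood can be driven arbitrarily high by collapsing one component onto a single point.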
Maximum Likelihood Conditions (3)
Lagrange function:
$$\ln p(\mathbf{X}\mid\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Sigma}) + \lambda\left(\sum_{k=1}^{K} \pi_k - 1\right)$$
Setting its derivative to zero and using the
normalization constraint, we obtain:
$$\pi_k = \frac{N_k}{N}, \qquad \text{where } N_k = \sum_{n=1}^{N}\gamma(z_{nk}).$$
Expectation Maximization for Mixtures of Gaussians
Although the previous conditions do not give a
closed-form solution, we can use them to
construct iterative updates:
E step: Compute responsibilities $\gamma(z_{nk})$.
M step: Compute new means $\boldsymbol{\mu}_k$, covariances $\boldsymbol{\Sigma}_k$,
and mixing coefficients $\pi_k$.
Loop over E and M steps until the log
likelihood $\ln p(\mathbf{X}\mid\boldsymbol{\mu},\boldsymbol{\Sigma},\boldsymbol{\pi})$ stops increasing.
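As a concrete illustration, here is a minimal NumPy/SciPy sketch of these updates (the function name, initialization scheme, and stopping tolerance are my own choices, not from the lecture):

```python
# Minimal sketch of EM for a Gaussian mixture (hypothetical helper).
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, tol=1e-6):
    N, D = X.shape
    rng = np.random.default_rng(0)
    # Initialize: means from random data points, identity covariances, uniform weights.
    mu = X[rng.choice(N, K, replace=False)]
    Sigma = np.array([np.eye(D) for _ in range(K)])
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iters):
        # E step: responsibilities gamma[n, k] = p(z_nk = 1 | x_n).
        dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                                for k in range(K)])
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: re-estimate means, covariances, and mixing coefficients.
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        pi = Nk / N
        # Stop when the log likelihood stops increasing (within tol).
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, Sigma, gamma
```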
General EM Algorithm
Given a joint distribution $p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})$ over observed variables $\mathbf{X}$ and latent variables $\mathbf{Z}$, governed by parameters $\boldsymbol{\theta}$, the goal is to maximize the likelihood function $p(\mathbf{X}\mid\boldsymbol{\theta})$ with respect to $\boldsymbol{\theta}$.
1. Choose an initial setting for the parameters $\boldsymbol{\theta}^{\text{old}}$.
2. E step: Evaluate $p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta}^{\text{old}})$.
3. M step: Evaluate $\boldsymbol{\theta}^{\text{new}}$ given by
$$\boldsymbol{\theta}^{\text{new}} = \arg\max_{\boldsymbol{\theta}} Q(\boldsymbol{\theta},\boldsymbol{\theta}^{\text{old}})$$
where
$$Q(\boldsymbol{\theta},\boldsymbol{\theta}^{\text{old}}) = \sum_{\mathbf{Z}} p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta}^{\text{old}})\,\ln p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta}).$$
4. Check for convergence of either the log likelihood or the parameter values.
If the convergence criterion is not satisfied, then let
$$\boldsymbol{\theta}^{\text{old}} \leftarrow \boldsymbol{\theta}^{\text{new}}$$
and return to step 2.
Lower Bound Perspective of EM
* Expectation Step:
Maximizing the functional lower bound $\mathcal{L}(q,\boldsymbol{\theta})$ over
the distribution $q(\mathbf{Z})$.
* Maximization Step:
Maximizing the lower bound over the parameters $\boldsymbol{\theta}$.
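For reference, the standard decomposition behind this view (not written out on the slide) is
$$\ln p(\mathbf{X}\mid\boldsymbol{\theta}) = \mathcal{L}(q,\boldsymbol{\theta}) + \mathrm{KL}(q\,\|\,p), \qquad \mathcal{L}(q,\boldsymbol{\theta}) = \sum_{\mathbf{Z}} q(\mathbf{Z})\,\ln\frac{p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})}{q(\mathbf{Z})},$$
$$\mathrm{KL}(q\,\|\,p) = -\sum_{\mathbf{Z}} q(\mathbf{Z})\,\ln\frac{p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta})}{q(\mathbf{Z})}.$$
The E step sets $q(\mathbf{Z}) = p(\mathbf{Z}\mid\mathbf{X},\boldsymbol{\theta}^{\text{old}})$, driving the KL term to zero so the bound touches the log likelihood; the M step then maximizes $\mathcal{L}(q,\boldsymbol{\theta})$ over $\boldsymbol{\theta}$.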
Illustration of EM Updates
[Figure: the log likelihood $\ln p(\mathbf{X}\mid\boldsymbol{\theta})$ and its lower bounds at $\boldsymbol{\theta}^{\text{old}}$ and $\boldsymbol{\theta}^{\text{new}}$]
Sequential Data
[Figure: spectrogram (frequency in Hz versus time in sec) and waveform (amplitude versus time) of the spoken words "Bayes' Theorem"]
There are temporal dependencies between data points.
State Space Models
$$p(\mathbf{x}_1,\ldots,\mathbf{x}_N,\mathbf{z}_1,\ldots,\mathbf{z}_N) = p(\mathbf{z}_1)\left[\prod_{n=2}^{N} p(\mathbf{z}_n\mid\mathbf{z}_{n-1})\right]\prod_{n=1}^{N} p(\mathbf{x}_n\mid\mathbf{z}_n)$$
Important graphical models for many dynamic
models, including Hidden Markov Models (HMMs)
and linear dynamical systems.
Question: what order to choose for the Markov assumption?
Hidden Markov Models
Many applications, e.g., speech recognition,
natural language processing, handwriting
recognition, bio-sequence analysis
From Mixture Models to HMMs
By turning a mixture model into a dynamic model,
we obtain the HMM.
We model the dependence between two consecutive
latent variables by a transition probability:
$$p(\mathbf{z}_n\mid\mathbf{z}_{n-1},\mathbf{A}) = \prod_{k=1}^{K}\prod_{j=1}^{K} A_{jk}^{\,z_{n-1,j}\,z_{nk}}$$
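For intuition, a hypothetical $K = 2$ example (the numbers are mine, not from the slides): each row of $\mathbf{A}$ is the distribution over the next state given the current one, so each row sums to one,
$$\mathbf{A} = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}, \qquad A_{jk} = p(z_{nk} = 1 \mid z_{n-1,j} = 1), \qquad \sum_{k} A_{jk} = 1.$$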
Inference: Forward-backward Algorithm
Goal: compute marginals for latent variables.
Forward-backward algorithm: exact inference
as a special case of the sum-product algorithm
on the HMM.
Factor graph representation (grouping the
emission density and transition probability
into one factor per time step):
[Figure: factor graph of the HMM with factors $h$ and $f_n$ linking consecutive latent variables]
Forward-backward Algorithm as Message Passing Method (1)
[Figure: fragment of the factor graph with factor $f_n$ between $\mathbf{z}_{n-1}$ and $\mathbf{z}_n$]
$$h(\mathbf{z}_1) = p(\mathbf{z}_1)\,p(\mathbf{x}_1\mid\mathbf{z}_1)$$
$$f_n(\mathbf{z}_{n-1},\mathbf{z}_n) = p(\mathbf{z}_n\mid\mathbf{z}_{n-1})\,p(\mathbf{x}_n\mid\mathbf{z}_n)$$
Forward messages:
$$\mu_{\mathbf{z}_{n-1}\to f_n}(\mathbf{z}_{n-1}) = \mu_{f_{n-1}\to\mathbf{z}_{n-1}}(\mathbf{z}_{n-1})$$
$$\mu_{f_n\to\mathbf{z}_n}(\mathbf{z}_n) = \sum_{\mathbf{z}_{n-1}} f_n(\mathbf{z}_{n-1},\mathbf{z}_n)\,\mu_{\mathbf{z}_{n-1}\to f_n}(\mathbf{z}_{n-1}) = \sum_{\mathbf{z}_{n-1}} f_n(\mathbf{z}_{n-1},\mathbf{z}_n)\,\mu_{f_{n-1}\to\mathbf{z}_{n-1}}(\mathbf{z}_{n-1})$$
$$\alpha(\mathbf{z}_n) \equiv \mu_{f_n\to\mathbf{z}_n}(\mathbf{z}_n)$$
Forward-backward Algorithm as Message Passing Method (2)
[Figure: fragment of the HMM factor graph used for the backward recursion]
Backward messages (Q: how do we compute them?):
$$\beta(\mathbf{z}_n) \equiv \mu_{f_{n+1}\to\mathbf{z}_n}(\mathbf{z}_n)$$
The messages actually involve $\mathbf{X}$:
$$p(\mathbf{z}_n,\mathbf{X}) = \mu_{f_n\to\mathbf{z}_n}(\mathbf{z}_n)\,\mu_{f_{n+1}\to\mathbf{z}_n}(\mathbf{z}_n) = \alpha(\mathbf{z}_n)\,\beta(\mathbf{z}_n)$$
$$\gamma(\mathbf{z}_n) = p(\mathbf{z}_n\mid\mathbf{X}) = \frac{p(\mathbf{X}\mid\mathbf{z}_n)\,p(\mathbf{z}_n)}{p(\mathbf{X})} = \frac{\alpha(\mathbf{z}_n)\,\beta(\mathbf{z}_n)}{p(\mathbf{X})}$$
Similarly, we can compute the following (Q: why?):
$$\xi(\mathbf{z}_{n-1},\mathbf{z}_n) = p(\mathbf{z}_{n-1},\mathbf{z}_n\mid\mathbf{X}) = \frac{\alpha(\mathbf{z}_{n-1})\,p(\mathbf{x}_n\mid\mathbf{z}_n)\,p(\mathbf{z}_n\mid\mathbf{z}_{n-1})\,\beta(\mathbf{z}_n)}{p(\mathbf{X})}$$
Maximum Likelihood Estimation for HMM
Goal: maximize
$$p(\mathbf{X}\mid\boldsymbol{\theta}) = \sum_{\mathbf{Z}} p(\mathbf{X},\mathbf{Z}\mid\boldsymbol{\theta})$$
Looks familiar? Remember EM for mixture of
Gaussians... Indeed the updates are similar.
EM for HMM
E step:
$$\gamma(\mathbf{z}_n) = p(\mathbf{z}_n\mid\mathbf{X},\boldsymbol{\theta}^{\text{old}})$$
$$\xi(\mathbf{z}_{n-1},\mathbf{z}_n) = p(\mathbf{z}_{n-1},\mathbf{z}_n\mid\mathbf{X},\boldsymbol{\theta}^{\text{old}})$$
Computed from the forward-backward/sum-product
algorithm.
M step (for Gaussian emission densities):
$$\boldsymbol{\mu}_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\,\mathbf{x}_n}{\sum_{n=1}^{N} \gamma(z_{nk})}$$
$$A_{jk} = \frac{\sum_{n=2}^{N} \xi(z_{n-1,j}, z_{nk})}{\sum_{l=1}^{K}\sum_{n=2}^{N} \xi(z_{n-1,j}, z_{nl})}$$
$$\boldsymbol{\Sigma}_k = \frac{\sum_{n=1}^{N} \gamma(z_{nk})\,(\mathbf{x}_n-\boldsymbol{\mu}_k)(\mathbf{x}_n-\boldsymbol{\mu}_k)^{\mathrm{T}}}{\sum_{n=1}^{N} \gamma(z_{nk})}$$
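A minimal M-step sketch matching these Gaussian-emission updates, reusing `gamma` (N × K) and `xi` (N−1 × K × K) as produced by a forward-backward pass (names and array layout are my assumptions, not from the slides):

```python
# Hypothetical Baum-Welch M step with Gaussian emissions.
import numpy as np

def hmm_m_step(X, gamma, xi):
    """X: (N, D) observations; gamma[n, k] = p(z_nk=1 | X); xi[m, j, k] = p(z_{m,j}=1, z_{m+1,k}=1 | X)."""
    Nk = gamma.sum(axis=0)                             # sum_n gamma(z_nk)
    pi = gamma[0] / gamma[0].sum()                     # initial-state probabilities
    A = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]  # A_jk with rows normalized
    mu = (gamma.T @ X) / Nk[:, None]                   # emission means
    K, D = mu.shape
    Sigma = np.zeros((K, D, D))
    for k in range(K):
        diff = X - mu[k]
        Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]  # emission covariances
    return pi, A, mu, Sigma
```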
Linear Dynamical Systems
$$p(\mathbf{z}_n\mid\mathbf{z}_{n-1}) = \mathcal{N}(\mathbf{z}_n\mid\mathbf{A}\mathbf{z}_{n-1},\boldsymbol{\Gamma})$$
$$p(\mathbf{x}_n\mid\mathbf{z}_n) = \mathcal{N}(\mathbf{x}_n\mid\mathbf{C}\mathbf{z}_n,\boldsymbol{\Sigma})$$
$$p(\mathbf{z}_1) = \mathcal{N}(\mathbf{z}_1\mid\boldsymbol{\mu}_0,\mathbf{V}_0)$$
Equivalently, we have
$$\mathbf{z}_n = \mathbf{A}\mathbf{z}_{n-1} + \mathbf{w}_n$$
$$\mathbf{x}_n = \mathbf{C}\mathbf{z}_n + \mathbf{v}_n$$
$$\mathbf{z}_1 = \boldsymbol{\mu}_0 + \mathbf{u}$$
where
$$\mathbf{w} \sim \mathcal{N}(\mathbf{w}\mid\mathbf{0},\boldsymbol{\Gamma}), \qquad \mathbf{v} \sim \mathcal{N}(\mathbf{v}\mid\mathbf{0},\boldsymbol{\Sigma}), \qquad \mathbf{u} \sim \mathcal{N}(\mathbf{u}\mid\mathbf{0},\mathbf{V}_0).$$
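A minimal Kalman filtering sketch for this linear dynamical system, using the same symbols ($\mathbf{A}$, $\boldsymbol{\Gamma}$, $\mathbf{C}$, $\boldsymbol{\Sigma}$, $\boldsymbol{\mu}_0$, $\mathbf{V}_0$); the function itself is my own illustration, not code from the lecture:

```python
# Kalman filter: filtered means and covariances of z_n given x_1..x_n.
import numpy as np

def kalman_filter(X, A, Gamma, C, Sigma, mu0, V0):
    """X: (N, d_obs) observations; returns (N, d) filtered means and (N, d, d) covariances."""
    N, d = len(X), len(mu0)
    mus, Vs = np.zeros((N, d)), np.zeros((N, d, d))
    for n in range(N):
        if n == 0:
            pred_mu, pred_V = mu0, V0                  # prior on z_1
        else:
            pred_mu = A @ mus[n - 1]                   # predict p(z_n | x_1..x_{n-1})
            pred_V = A @ Vs[n - 1] @ A.T + Gamma
        S = C @ pred_V @ C.T + Sigma                   # innovation covariance
        K = pred_V @ C.T @ np.linalg.inv(S)            # Kalman gain
        mus[n] = pred_mu + K @ (X[n] - C @ pred_mu)    # update with observation x_n
        Vs[n] = (np.eye(d) - K @ C) @ pred_V
    return mus, Vs
```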
Extension of HMM and LDS
[Figure: graphical model with two chains of latent variables $\mathbf{z}_n^{(1)}$, $\mathbf{z}_n^{(2)}$ and observations $\mathbf{x}_{n-1}$, $\mathbf{x}_n$, $\mathbf{x}_{n+1}$]
Discrete latent variables: factorial HMMs
Continuous latent variables: switching Kalman filter models