






An introduction to hidden Markov models (HMMs), a probabilistic model used to analyze sequences of observations where the underlying states generating the observations are not directly observable. It covers the basics of Markov chains, the foundation of HMMs, and discusses the problems of probability calculation, state estimation, and model learning. It also includes practical applications and examples of HMMs in various fields.
Typology: Study notes







w(t) = (P(q_t = S_1), P(q_t = S_2), …, P(q_t = S_N))
Problem 1: Probability of the observations given the model: P(O | \lambda). Basic idea: we can compute this with dynamic programming, by induction on time. Suppose we know, for every state i, the probability of producing the first t symbols and winding up in state i at time t. We then use those values to compute the same quantity for t+1. The key point is that to compute it for time t+1 we only need to know it for time t. In particular, it doesn't matter what states we were in at times < t, only what state we were in at t. Specifically (abbreviating \alpha as \a), we define: \a_t(i) = P(O_1, …, O_t, q_t = S_i | \lambda)
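The induction for \a_t(i) can be sketched in a few lines of Python. This is only an illustration under assumptions: the parameter names pi (start probabilities), A (transition probabilities) and B (emission probabilities) follow common HMM notation and do not appear in these notes, and the two-state model is made up for the example.

```python
def forward(pi, A, B, obs):
    """Return (alpha, likelihood) for a sequence of symbol indices.

    Assumed names, not taken from the notes:
      pi[i]   : probability of starting in state i
      A[i][j] : probability of moving from state i to state j
      B[i][k] : probability that state i emits symbol k
    alpha[t][i] plays the role of \\a_{t+1}(i) (Python indexes from 0).
    """
    n = len(pi)
    # Base case: start in state i and emit the first symbol there.
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        prev = alpha[-1]
        # Induction step: only the values for the previous time are needed.
        alpha.append([
            sum(prev[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
            for j in range(n)
        ])
    # P(O | lambda) is the sum over all possible final states.
    return alpha, sum(alpha[-1])


# Hypothetical two-state, two-symbol model, for illustration only.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
alpha, likelihood = forward(pi, A, B, [0, 1, 0])
print(likelihood)
```

Each time step costs on the order of N^2 operations for N states, instead of summing over every possible state sequence.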
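Problem 2: Most likely sequence of internal states given the model and the observations. This is the same computation as in Problem 1, except that we take a maximum over the previous states instead of a sum, and keep backward pointers so that the best sequence can be read off at the end.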
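A minimal sketch of this max-plus-backward-pointers idea (usually called the Viterbi algorithm) is given here. The names pi, A, B and delta are assumptions borrowed from common HMM notation, and the model is a made-up two-state example, not one from these notes.

```python
def viterbi(pi, A, B, obs):
    """Return (path, probability) of the most likely state sequence.

    Assumed names: pi[i] start probabilities, A[i][j] transition
    probabilities, B[i][k] emission probabilities, obs symbol indices.
    """
    n = len(pi)
    # delta[i]: probability of the best path that ends in state i so far.
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []  # back[t-1][j]: best predecessor of state j at time t
    for t in range(1, len(obs)):
        pointers, new_delta = [], []
        for j in range(n):
            # A maximum replaces the sum used for P(O | lambda).
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][obs[t]])
        back.append(pointers)
        delta = new_delta
    # Follow the backward pointers from the best final state.
    state = max(range(n), key=lambda i: delta[i])
    best_prob = delta[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_prob


# Hypothetical two-state, two-symbol model, for illustration only.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
path, best_prob = viterbi(pi, A, B, [0, 1, 0])
print(path, best_prob)
```

For long sequences the products underflow, so a practical version would typically work with log probabilities; that refinement is left out to keep the sketch short.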
Problem 3: Estimating a model, given a sequence (or many of them). Formally: argmax_\lambda P(O | \lambda). Note that we are assuming we know the number of states (more states would always allow a fit at least as good). We approach this with an iterative algorithm: we assume some starting point, then improve it. Given the current model and the observations, we estimate the sample probability corresponding to every parameter. Then we adjust the model to use each sample probability as the true one. For example, we estimate the probability that we begin in state 1, given the model AND the observations. Then we use this as the new prior probability that we will start in state 1. We do this for every aspect of the model. The key to computing these sample probabilities is to figure out the probability that we are in state i at time t, and the probability that we move from state i at time t to state j at time t+1. It is important to note that we can't just use \a to determine this. This is because the probability that we are in state i at time t doesn't depend only on the probability of emitting the first t symbols and winding up in state i at time t. It also depends on the chances that we will continue on from state i to emit all the rest of the symbols. For example, it is possible that from state i you almost never go to a state that will emit the next symbol you need.
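One re-estimation step can be sketched as follows, assuming a single observation sequence. The names beta (probability of emitting the remaining symbols from state i), gamma (probability of being in state i at time t) and xi (probability of moving from i at t to j at t+1), as well as pi, A and B, are assumptions taken from common HMM notation rather than from these notes; the starting model and the observations are made up.

```python
def forward(pi, A, B, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for t in range(1, len(obs)):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(n))
                      * B[j][obs[t]] for j in range(n)])
    return alpha


def backward(A, B, obs):
    n, T = len(A), len(obs)
    # beta[t][i]: probability of emitting the symbols after time t from i.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return beta


def reestimate(pi, A, B, obs):
    """One update step; also returns P(O | lambda) of the input model."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    likelihood = sum(alpha[-1])
    # In state i at time t, given the model AND all the observations.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # Moving from state i at time t to state j at time t+1.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
            / likelihood for j in range(n)] for i in range(n)]
          for t in range(T - 1)]
    new_pi = gamma[0][:]  # new prior: probability of starting in each state
    new_A = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k)
              / sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, likelihood


# Hypothetical starting model and observations, for illustration only.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1, 0, 0, 1, 1, 0, 1]
history = []
for _ in range(5):
    pi, A, B, likelihood = reestimate(pi, A, B, obs)
    history.append(likelihood)
print(history)
```

The recorded likelihoods should never decrease from one iteration to the next, which is the sense in which each step improves the model. The procedure only finds a local optimum, so the result depends on the starting point.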
[Figure: graph with nodes X1, X2, …]
This graph represents the relationship that, given V2, V1 is conditionally independent of V3 and V4. That is, P(V1 | V2, V3, V4) = P(V1 | V2).
[Figure: graph with nodes q1, q2, q3, … and V1, V2, V3, …]
We perform inference using knowledge of the conditional probabilities (the model) and of the values of some variables (V1, V2, …).
directed edges, and general DAGs.
information from forward and backward directions (or maybe more than two independent directions).