

A problem set for CS446: Pattern Recognition and Machine Learning, handed out in Fall 2008. It includes two main problems. The first concerns tree-dependent distributions and asks the reader to show that the choice of root node used to direct the tree does not matter. The second asks for the derivation of an expectation-maximization (EM) algorithm to estimate the unknown parameters of a given distribution. The problem set is due on December 5, 2008.


CS446: Pattern Recognition and Machine Learning Fall 2008
Handed Out: November 20, 2008 Due: December 5, 2008
(a) State exactly what is meant by the statement "the two directed trees obtained from T are the same."
(b) Show that no matter which node in T is chosen as the root for the "direction" stage, the resulting directed trees are all the same (based on your definition above).
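A numerical sanity check can make the claim in (b) concrete before attempting the proof. The sketch below is not part of the original hand-out. It assumes one possible reading of "the same", namely that two directed trees are the same when they define the same joint distribution, and it assumes the conditionals on each directed edge are taken from the marginals of a single underlying joint. The example tree, the random joint, and all function names are hypothetical.

```python
import itertools
import numpy as np

def single_marginal(joint, i):
    """P(x_i): sum the joint over every axis except i."""
    other = tuple(a for a in range(joint.ndim) if a != i)
    return joint.sum(axis=other)

def pair_marginal(joint, i, j):
    """P(x_i, x_j) as an array indexed [x_i, x_j]."""
    other = tuple(a for a in range(joint.ndim) if a not in (i, j))
    m = joint.sum(axis=other)  # remaining axes are in increasing variable order
    return m if i < j else m.T

def orient(edges, n, root):
    """Direct the undirected tree away from `root`; returns child -> parent."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = {root: None}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                stack.append(v)
    return parent

def tree_distribution(joint, edges, root):
    """P(x_root) * prod over directed edges of P(x_child | x_parent)."""
    n = joint.ndim
    parent = orient(edges, n, root)
    result = np.zeros_like(joint)
    for x in itertools.product(range(2), repeat=n):
        prob = single_marginal(joint, root)[x[root]]
        for child, par in parent.items():
            if par is None:
                continue
            pair = pair_marginal(joint, par, child)
            prob *= pair[x[par], x[child]] / single_marginal(joint, par)[x[par]]
        result[x] = prob
    return result

# Hypothetical example: 4 binary variables, tree with edges 0-1, 1-2, 1-3.
rng = np.random.default_rng(0)
joint = rng.random((2, 2, 2, 2))
joint /= joint.sum()
edges = [(0, 1), (1, 2), (1, 3)]
dists = [tree_distribution(joint, edges, r) for r in range(4)]
same = all(np.allclose(dists[0], d) for d in dists)
print("all roots give the same distribution:", same)
```

The check works because each rooted product rearranges into the product of the edge marginals divided by the node marginals raised to (degree minus one), an expression that never mentions the root. A numerical check of this kind illustrates the claim but does not replace the proof asked for in (b).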
(a) Express $P(x^{(j)})$ first in terms of conditional probabilities and then in terms of the unknown parameters $p$, $\alpha_i$, $\beta_i$.
(b) Let $q_j^y = P(Y = y \mid x^{(j)})$, i.e., the probability that the data point $x^{(j)}$ has $y$ as the value of its hidden variable $Y$. Express $q_j^1$ and $q_j^2$ in terms of the unknown parameters.
(c) Derive an expression for the expected log likelihood (LL) of the entire data set $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$ and its associated $y$ settings given new parameter estimates $\tilde{p}$, $\tilde{\alpha}_i$, $\tilde{\beta}_i$.
(d) Maximize the LL and determine the update rules for the parameters according to the EM algorithm.
(e) Examine the update rules and try to explain them in English. Describe in English how you would run the algorithm: initialization, iteration, termination. Which equations would you use at which steps of the algorithm?
(f) Assume that your task is to predict the value of $Y$ given an assignment to all $n$ variables and that you have the parameters of the model. Show how to use these parameters to predict $Y$.
(g) Show that the decision surface for this prediction is a linear function of the $x_i$'s.
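The steps in (b) through (g) can be sketched in code, with the caveat that the model definition is not included in the text above. The sketch below therefore assumes a common setup: the hidden variable Y takes two values, the n observed variables are binary and conditionally independent given Y, p is the probability of the first value of Y, alpha[i] is the probability that X_i = 1 given the first value, and beta[i] is the probability that X_i = 1 given the second value. If the hand-out defines the parameters differently, the formulas change accordingly. All function names are hypothetical.

```python
import numpy as np

def class_log_joint(X, p, alpha, beta):
    """log P(x, Y=first value) and log P(x, Y=second value) for each row of X."""
    log1 = np.log(p) + X @ np.log(alpha) + (1 - X) @ np.log(1 - alpha)
    log2 = np.log(1 - p) + X @ np.log(beta) + (1 - X) @ np.log(1 - beta)
    return log1, log2

def e_step(X, p, alpha, beta):
    """Posterior of the first value of Y for each data point (part b)."""
    log1, log2 = class_log_joint(X, p, alpha, beta)
    return 1.0 / (1.0 + np.exp(log2 - log1))

def m_step(X, q1):
    """Weighted-frequency updates under the assumed model (part d)."""
    q2 = 1.0 - q1
    p = q1.mean()
    alpha = (q1 @ X) / q1.sum()
    beta = (q2 @ X) / q2.sum()
    return p, alpha, beta

def log_likelihood(X, p, alpha, beta):
    """Observed-data log likelihood, used here to monitor convergence."""
    log1, log2 = class_log_joint(X, p, alpha, beta)
    return np.logaddexp(log1, log2).sum()

def run_em(X, n_iter=50, seed=0, eps=1e-6):
    """Initialization, iteration, termination after a fixed number of steps (part e)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    p = 0.5
    alpha = rng.uniform(0.25, 0.75, n)
    beta = rng.uniform(0.25, 0.75, n)
    history = []
    for _ in range(n_iter):
        q1 = e_step(X, p, alpha, beta)
        p, alpha, beta = m_step(X, q1)
        # Keep parameters strictly inside (0, 1) so the logs stay finite.
        p = float(np.clip(p, eps, 1 - eps))
        alpha = np.clip(alpha, eps, 1 - eps)
        beta = np.clip(beta, eps, 1 - eps)
        history.append(log_likelihood(X, p, alpha, beta))
    return p, alpha, beta, history

def linear_log_odds(p, alpha, beta):
    """Weights w and bias b with log-odds(x) = w @ x + b (parts f and g)."""
    w = np.log(alpha / beta) - np.log((1 - alpha) / (1 - beta))
    b = np.log(p / (1 - p)) + np.log((1 - alpha) / (1 - beta)).sum()
    return w, b
```

Under these assumptions, predicting the first value of Y whenever `w @ x + b > 0` is the same as choosing the value with the larger posterior, which is one way to see why the decision surface is linear in the x_i. A fixed iteration count is used for termination only to keep the sketch short; stopping when the log likelihood changes by less than a tolerance is a common alternative.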