









A midterm exam in Machine Learning from Carnegie Mellon University. The exam consists of 5 questions with a total score of 100. The topics covered include Short questions, MLE/MAP, Bayes Nets, EM, and Regression. The exam is open book and open notes, and no computers or internet access is allowed. The exam duration is 80 minutes. The document also includes personal information and instructions for the exam.
Typology: Exams










Question  Topic            Max. score  Score
1         Short questions  35
2         MLE/MAP          15
3         Bayes Nets       15
4         EM               15
5         Regression       20
Total                      100
Answer True/False in the following 8 questions. Explain your reasoning in 1 sentence.
SOLUTION: FALSE. Decision trees only provide a label estimate, whereas logistic regression provides the probability of a label (patient has cancer) for a given input (cellular image).
SOLUTION: FALSE. This is not a good accuracy on this dataset, since a classifier that outputs "cancer-free" for all input images will have better accuracy (90%).
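The baseline comparison in this solution can be checked with a minimal sketch. It assumes a hypothetical test set in which 90% of the images are cancer-free, which is the class balance implied by the 90% figure above; the label encoding and the helper `accuracy` are illustrative and not part of the exam.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical test set: 90 cancer-free images (0) and 10 cancer images (1).
labels = [0] * 90 + [1] * 10

# Trivial classifier that outputs "cancer-free" for every input image.
always_cancer_free = [0] * len(labels)

baseline_accuracy = accuracy(always_cancer_free, labels)
print(baseline_accuracy)
```

Any reported accuracy below this baseline says little about whether the classifier has learned anything from the images.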
SOLUTION: FALSE. The second classifier has better test accuracy, which reflects the true accuracy, whereas the first classifier is overfitting.
SOLUTION: TRUE. Knowing the value of $n_A$ tells us something about $n_B$, so $P(n_A \mid n_B) \neq P(n_A)$ and the two are marginally dependent; given $n$, however, $n_A$ and $n_B$ are determined independently. This also follows from the Bayes net over $n$, $n_A$, and $n_B$:
[Figure: Bayes net over n, nA, and nB]
SOLUTION: TRUE. Since $C$ is a Boolean random variable, we have
$$P(A \mid B) = P(A, C = 0 \mid B) + P(A, C = 1 \mid B) = P(A \mid B, C = 0)\,P(C = 0 \mid B) + P(A \mid B, C = 1)\,P(C = 1 \mid B)$$
where the first step marginalizes over $C$ and the last step follows from the definition of conditional probability.
The following three short questions are not True/False questions. Please provide explanations for your answers.
SOLUTION: Using the factorization of the joint distribution, $P(A, B, C) = P(C)\,P(A \mid C)\,P(B \mid C)$,
and using the definition of conditional probability, $P(A, B, C) = P(C)\,P(A, B \mid C)$.
Therefore, we have $P(A, B \mid C) = P(A \mid C)\,P(B \mid C)$, i.e. $A$ is conditionally independent of $B$ given $C$ ($A \perp\!\!\!\perp B \mid C$).
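The argument above can be sanity-checked numerically. The sketch below assumes Boolean A, B, and C and uses made-up conditional probability tables (the numbers are hypothetical, chosen only for illustration). It builds the joint from the factorization P(C) P(A|C) P(B|C) and checks that P(A, B | C) = P(A | C) P(B | C) for every assignment.

```python
from itertools import product

# Hypothetical tables for Boolean A, B, C (values chosen only for illustration).
p_c = {0: 0.4, 1: 0.6}
p_a_given_c = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # indexed [c][a]
p_b_given_c = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}  # indexed [c][b]

# Joint built from the factorization P(A, B, C) = P(C) P(A|C) P(B|C).
joint = {
    (a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
    for a, b, c in product((0, 1), repeat=3)
}


def conditionally_independent(joint, tol=1e-12):
    """Return True if P(A, B | C) == P(A | C) P(B | C) for all a, b, c."""
    for c in (0, 1):
        # Marginal P(C = c), recovered from the joint.
        pc = sum(joint[a, b, c] for a in (0, 1) for b in (0, 1))
        for a, b in product((0, 1), repeat=2):
            p_ab = joint[a, b, c] / pc
            p_a = sum(joint[a, other_b, c] for other_b in (0, 1)) / pc
            p_b = sum(joint[other_a, b, c] for other_a in (0, 1)) / pc
            if abs(p_ab - p_a * p_b) > tol:
                return False
    return True


print(conditionally_independent(joint))
```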
(a) Decision Tree (b) Logistic Regression (c) Gaussian Naive Bayes
SOLUTION: Decision Tree only. A decision tree of depth 2 that first splits on $X_1$ and then on $X_2$ will classify the data perfectly. Logistic regression produces a linear decision boundary, so it cannot classify this data perfectly. Because of the conditional independence requirement, it is not possible to fit a Gaussian that peaks at the points of only one class and has no covariance between features, so Gaussian Naive Bayes cannot classify this data perfectly either.
In this question you will estimate the probability of a coin landing heads using MLE and MAP estimates.
Suppose you have a coin whose probability of landing heads is $p = 0.5$, that is, it is a fair coin. However, you do not know $p$ and would like to form an estimator $\hat{\theta}$ for the probability of landing heads $p$. In class, we derived an estimator that assumed $p$ can take on any value in the interval $[0, 1]$. In this question, you will derive an estimator that assumes $p$ can take on only two possible values: 0.3 or 0.6.
Note: $P_{\hat{\theta}}[\text{heads}] = \hat{\theta}$.
Hint: All the calculations involved here are simple. You do not require a calculator.
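$$\hat{\theta} = \arg\max_{\theta \in \{0.3,\, 0.6\}} P_\theta[D] = \arg\max_{\theta \in \{0.3,\, 0.6\}} P_\theta[\text{heads}]\, P_\theta[\text{tails}]^2 = \arg\max_{\theta \in \{0.3,\, 0.6\}} \theta (1 - \theta)^2$$
We observe that
$$P_{\theta = 0.3}[D] = (0.3)(0.7)^2 = 0.147 \;>\; P_{\theta = 0.6}[D] = (0.6)(0.4)^2 = 0.096,$$
which implies that $\hat{\theta} = 0.3$.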
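The comparison above is easy to reproduce. This is a minimal sketch, assuming the data are one head and two tails as in the likelihood $\theta(1 - \theta)^2$; the function and variable names are illustrative.

```python
def likelihood(theta, heads, tails):
    """Probability of one specific flip sequence with the given counts."""
    return theta ** heads * (1 - theta) ** tails


candidates = (0.3, 0.6)   # the only values p is allowed to take
heads, tails = 1, 2       # observed data D

likelihoods = {theta: likelihood(theta, heads, tails) for theta in candidates}

# The restricted MLE is the candidate with the larger likelihood.
theta_mle = max(candidates, key=lambda theta: likelihoods[theta])

for theta, value in likelihoods.items():
    print(f"theta = {theta}: likelihood = {value:.3f}")
print("MLE:", theta_mle)
```

Note that the restricted estimator picks 0.3 even though the unrestricted MLE for this data would be 1/3, which happens to lie closer to 0.3 than to 0.6.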
$P[p = 0.3] = 0.3$ and $P[p = 0.6] = 0.7$.
Again, you flip the coin 3 times and note that it landed 2 times on tails and 1 time on heads. Find the MAP estimate $\hat{\theta}$ of $p$ over the set $\{0.3, 0.6\}$, using this prior.
Solution:
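$$\hat{\theta} = \arg\max_{\theta \in \{0.3,\, 0.6\}} P_\theta[D]\, P[\theta]$$
We observe that
$$P_{\theta = 0.3}[D]\, P[\theta = 0.3] = (0.3)(0.7)^2 (0.3) = 0.0441 \;<\; P_{\theta = 0.6}[D]\, P[\theta = 0.6] = (0.6)(0.4)^2 (0.7) = 0.0672,$$
which implies that $\hat{\theta}_{MAP} = 0.6$.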
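A short sketch of the same MAP computation follows, assuming the prior P[p = 0.3] = 0.3, P[p = 0.6] = 0.7 and data of one head and two tails. The names are illustrative; the normalized posterior is included only to show how strongly the prior shifts the answer.

```python
prior = {0.3: 0.3, 0.6: 0.7}   # P[p = theta] for each allowed theta
heads, tails = 1, 2            # observed data D


def unnormalized_posterior(theta):
    """Likelihood of D under theta times the prior probability of theta."""
    return theta ** heads * (1 - theta) ** tails * prior[theta]


scores = {theta: unnormalized_posterior(theta) for theta in prior}

# The MAP estimate maximizes likelihood times prior.
theta_map = max(scores, key=scores.get)

# Dividing by the evidence gives the actual posterior over {0.3, 0.6}.
evidence = sum(scores.values())
posterior = {theta: score / evidence for theta, score in scores.items()}

print("MAP:", theta_map)
print("posterior:", posterior)
```

With a uniform prior over the two values, the same code would return the MLE answer instead, since the prior factor would no longer change the ordering.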
(d) B is independent of C given only A
(e) B is not independent of C given A and D
Solution: Any of the following satisfy the above:
Solution: See below for the number of parameters needed for each node. The total is 17.
(b) Please give the minimum number of Bayes net parameters required to fully specify the distribution $P(G \mid A, B, C, D, E, F)$. Briefly justify your answer.
Solution: Note that the Markov blanket for G consists only of F. Thus, $P(G \mid A, B, C, D, E, F) = P(G \mid F)$ and only two parameters are needed to specify this distribution.
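The count of two can be read off from the size of a conditional probability table. The sketch below assumes all variables are Boolean, which the answer of two parameters suggests but this excerpt does not state; `conditional_table_parameters` is a hypothetical helper.

```python
def conditional_table_parameters(num_conditioning_vars):
    """Free parameters in P(X | V1, ..., Vk) for Boolean X and Boolean Vi.

    Each of the 2**k joint settings of the conditioning variables needs one
    number, P(X = 1 | setting); P(X = 0 | setting) follows by subtraction.
    """
    return 2 ** num_conditioning_vars


# P(G | F) conditions on F alone: P(G = 1 | F = 0) and P(G = 1 | F = 1).
params_given_f = conditional_table_parameters(1)

# Conditioning naively on all six variables A, ..., F would need far more.
params_given_all = conditional_table_parameters(6)

print(params_given_f, params_given_all)
```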
(a) E is conditionally independent of G given F. Solution: True.
(b) A is conditionally independent of C given B and G. Solution: False.
In this question, we will explore bias and variance in linear regression. Assume that a total of $N$ data points of the form $(x_i, y_i)$ are generated from the following (true) model:
$$x_i \sim \mathrm{Unif}(0, 1), \quad y_i = f(x_i) + \epsilon_i, \quad \epsilon_i \sim N(0, 1), \quad f(x) = x$$
We assume $x_i \perp \epsilon_j \;\forall i, j$ and $\epsilon_i \perp \epsilon_j \;\forall i \neq j$ (note $a \perp b$ means $a$ and $b$ are independent).
You may find the following pieces of information useful when solving this problem:
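$\text{bias}^2 = \int_x \left(E_D[h_D(x)] - f(x)\right)^2 p(x)\, dx$
$\text{variance} = \int_x E_D\!\left[\left(h_D(x) - E_D[h_D(x)]\right)^2\right] p(x)\, dx$
$\int_0^1 p(x)\, dx = 1$, and therefore $p(x) = 1$.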
We begin by examining the case where we are not aware that $y$ depends on $x$. Instead, our (incorrect) model is that $f(x)$ has some constant value $f(x) = \mu$, and therefore
$$x_i \sim \mathrm{Unif}(0, 1), \quad y_i \sim N(\mu, 1) \quad \text{with } x_i \perp y_i.$$
We use the MLE estimator for $\mu$. That is, we let $\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} y_i$. The prediction of our trivial regression model for the value of $y_i$ is $\hat{\mu}$, regardless of the value of $x_i$.
SOLUTION: $E_D[h_D(x)] = E_D[\hat{\mu}] = \frac{1}{2}$
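SOLUTION: $\text{Bias}^2 = \int_0^1 \left(\tfrac{1}{2} - x\right)^2 (1)\, dx = -\tfrac{1}{3} \left(\tfrac{1}{2} - x\right)^3 \Big|_0^1 = \tfrac{1}{12}$
The bias is thus $\frac{1}{\sqrt{12}} = \frac{1}{2\sqrt{3}}$.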
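The integral for the squared bias of the constant model can be checked two ways. The sketch below evaluates the antiderivative exactly with rational arithmetic and cross-checks it with a midpoint-rule approximation. It takes $E_D[h_D(x)] = 1/2$, $f(x) = x$, and $p(x) = 1$ on $[0, 1]$ from the text; the function names are illustrative.

```python
import math
from fractions import Fraction


def bias_squared_exact():
    """Evaluate -(1/3)(1/2 - x)^3 between 0 and 1 using exact fractions."""
    def antiderivative(x):
        return -Fraction(1, 3) * (Fraction(1, 2) - x) ** 3

    return antiderivative(Fraction(1)) - antiderivative(Fraction(0))


def bias_squared_midpoint(num_cells=1000):
    """Midpoint-rule approximation of the integral of (1/2 - x)^2 on [0, 1]."""
    width = 1.0 / num_cells
    total = 0.0
    for i in range(num_cells):
        x = (i + 0.5) * width          # midpoint of the i-th cell
        total += (0.5 - x) ** 2 * width
    return total


exact = bias_squared_exact()
approx = bias_squared_midpoint()
bias = math.sqrt(exact)                # the bias itself, not squared

print(exact, approx, bias)
```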
[2 pts] What is the variance of this trivial regression model?
SOLUTION: The variance is the variance of the MLE estimator. By the third bullet, this is $\frac{1}{N}$.
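A small Monte Carlo sketch can illustrate the $1/N$ behaviour. It draws data under the constant model's own assumption, $y_i \sim N(\mu, 1)$, with a hypothetical $\mu = 0.5$; the trial count, seed, and function name are arbitrary choices made for this illustration.

```python
import random
import statistics


def variance_of_mean_estimate(num_points, num_trials=5000, mu=0.5, seed=0):
    """Empirical variance of the sample mean over many simulated datasets.

    Each dataset has num_points draws from N(mu, 1), matching the assumed
    constant model; mu = 0.5 is a hypothetical value.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(num_trials):
        sample_sum = sum(rng.gauss(mu, 1.0) for _ in range(num_points))
        estimates.append(sample_sum / num_points)
    return statistics.pvariance(estimates)


for num_points in (10, 100):
    simulated = variance_of_mean_estimate(num_points)
    print(f"N = {num_points}: simulated {simulated:.4f}, 1/N = {1 / num_points:.4f}")
```

Increasing `num_points` shrinks the simulated variance roughly in proportion to 1/N.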
SOLUTION: The unavoidable error is introduced by $\epsilon_i$, and is 1 by assumption.
SOLUTION: The unavoidable error and bias do not change. The variance goes to 0 as $N \to \infty$.
SOLUTION: The unavoidable error is still introduced by $\epsilon_i$, and is 1 by assumption.
SOLUTION: Model 1 (the trivial model) is the horizontal line. Model 2 (the linear regression model) is the diagonal line.
[Figure: plot of y against x, with x ranging from 0.0 to 1.0]