









Madam Amrita Ahuja took this quiz in class of Artificial Intelligence at Central University of Jammu and Kashmir. This quiz involves: Decision, Trees, Data, Points, Boundaries, Dimension, Average, Entropy, Fraction, Positive, Nearest, Neighbors
Typology: Exercises










Problem           Score
1 (13 pts)
2 (8 pts)
3 (7 pts)
4 (9 pts)
5 (8 pts)
6 (16 pts)
7 (15 pts)
8 (12 pts)
9 (12 pts)
Total (100 pts)
Data points are: Negative: (-1, 0) (2, 1) (2, -2); Positive: (0, 0) (1, 0). Construct a decision tree using the algorithm described in the notes for the data above.
enough, leave any parts that you don’t need blank.
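The first split of such a tree can be sketched in code. This is a minimal sketch, assuming the algorithm in the notes is the usual greedy choice of the axis-parallel split with the lowest average (size-weighted) entropy, and assuming the minus signs lost in extraction are restored so that the negative points are (-1, 0), (2, 1) and (2, -2). The function names are illustrative, not taken from the notes.

```python
from math import log2

# Data with minus signs restored (an assumption, see the paragraph above).
points = [(-1, 0), (2, 1), (2, -2), (0, 0), (1, 0)]
labels = [0, 0, 0, 1, 1]  # 0 = negative, 1 = positive


def entropy(ys):
    """Binary entropy, in bits, of a list of 0/1 labels."""
    if not ys:
        return 0.0
    p = sum(ys) / len(ys)
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def average_entropy(feature, threshold):
    """Size-weighted entropy of the two sides of the test x[feature] < threshold."""
    left = [y for x, y in zip(points, labels) if x[feature] < threshold]
    right = [y for x, y in zip(points, labels) if x[feature] >= threshold]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)


def candidate_splits():
    """Midpoints between consecutive distinct values of each feature."""
    for feature in (0, 1):
        values = sorted({x[feature] for x in points})
        for lo, hi in zip(values, values[1:]):
            yield feature, (lo + hi) / 2


best = min(candidate_splits(), key=lambda s: average_entropy(*s))
print(best, round(average_entropy(*best), 3))
```

Under these assumptions the lowest average entropy is obtained by testing x1 against 1.5, which isolates the two negative points with x1 = 2. The remaining three points can then be separated by a second test on x1.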
Data points are: Negative: (-1, 0) (2, 1) (2, -2); Positive: (0, 0) (1, 0)
1-nearest neighbor: Positive (+), since this is the class of the closest data point, (1, 0).
3-nearest neighbors: Positive (+), since it is the majority class of the three closest data points, (0, 0), (1, 0) and (2, -2).
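The two nearest-neighbor answers above can be checked with a short sketch. The query point did not survive in this copy of the quiz, so the point (1, -1.01) used here is an assumption chosen to be consistent with the stated answers. The negative points are taken to be (-1, 0), (2, 1) and (2, -2), assuming the minus signs lost in extraction are restored.

```python
from collections import Counter
from math import dist

# Training data with minus signs restored (an assumption).
training = [
    ((-1, 0), "-"), ((2, 1), "-"), ((2, -2), "-"),
    ((0, 0), "+"), ((1, 0), "+"),
]


def knn_predict(query, k):
    """Majority class among the k training points closest to query."""
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


query = (1, -1.01)  # hypothetical query point, not given in this copy
print(knn_predict(query, 1), knn_predict(query, 3))
```

For this query the closest point is (1, 0), and the three closest are (1, 0), (2, -2) and (0, 0), so both k = 1 and k = 3 predict the positive class.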
[Figure: the data points plotted on axes x1 and x2.]
Data points are: Negative: (-1, 0) (2, -2); Positive: (1, 0). Assume that the points are examined in the order given here. Recall that the perceptron algorithm uses the extended form of the data points, in which a 1 is added as the 0th component.

Test point        Misclassified?   Updated weights
Initial weights                    0 0 0
-: (1 -1 0)       yes              -1 1 0
-: (1 2 -2)       yes              -2 -1 2
+: (1 1 0)        yes              -1 0 2
-: (1 -1 0)       no
-: (1 2 -2)       no
+: (1 1 0)        yes              0 1 2
-: (1 -1 0)       no
-: (1 2 -2)       no
+: (1 1 0)        no
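The perceptron trace can be replayed with a few lines of code. This is a minimal sketch, assuming a learning rate of 1, an update of the form w <- w + y x on each misclassified point, and that a point with zero activation counts as misclassified (which is what the first row of the trace implies). The data use the restored signs, with negative points (-1, 0) and (2, -2).

```python
def perceptron(samples, max_passes=100):
    """Run the perceptron rule over samples in order until a clean pass."""
    w = [0, 0, 0]
    for _ in range(max_passes):
        updated = False
        for x, y in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x))
            if y * activation <= 0:  # misclassified, or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:
            break
    return w


# Extended data points (a 1 prepended), labelled -1 or +1.
samples = [((1, -1, 0), -1), ((1, 2, -2), -1), ((1, 1, 0), +1)]
final_weights = perceptron(samples)
print(final_weights)
```

Under these assumptions the run passes through the weights (-1, 1, 0), (-2, -1, 2), (-1, 0, 2) and stops at (0, 1, 2) after a full pass with no mistakes.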
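(a) $\Delta w_2 =$
Solution:
$$\Delta w_2 = -\eta \frac{\partial E}{\partial w_2} = -\eta \frac{\partial E}{\partial y} \frac{\partial y}{\partial z} \frac{\partial z}{\partial w_2}$$
$$= -\eta (y - y_i)\, y (1 - y)\, x_2$$
$$= (-1)(0.5 + 0)(0.5)(0.5)(-2)$$
$$= 0.25$$
Derivations:
$$E = \tfrac{1}{2}(y - y_i)^2, \qquad y = s(z), \qquad z = \sum_i w_i x_i$$
$$\frac{\partial E}{\partial y} = y - y_i, \qquad \frac{\partial y}{\partial z} = y(1 - y), \qquad \frac{\partial z}{\partial w_2} = x_2$$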
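The weight update can be checked numerically. This is a sketch under stated assumptions: the learning rate is 1, the unit's output is y = 0.5 (so z = 0), the input is x2 = -2, and the target is taken to be 0 so that y - y_i = 0.5 as in the substitution above. The finite-difference check further assumes, purely for illustration, that z depends only on w2 and that w2 = 0.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def delta_w(eta, y, target, x):
    """Gradient-descent step for one weight of a sigmoid unit with squared error."""
    return -eta * (y - target) * y * (1 - y) * x


eta, target, x2 = 1.0, 0.0, -2.0  # assumed values, read off the substitution
y = sigmoid(0.0)                  # 0.5
analytic = delta_w(eta, y, target, x2)


def error(w2):
    """Squared error as a function of w2 alone (hypothetical setup, z = w2 * x2)."""
    return 0.5 * (sigmoid(w2 * x2) - target) ** 2


eps = 1e-6
numeric = -eta * (error(eps) - error(-eps)) / (2 * eps)
print(analytic, numeric)
```

The analytic value is 0.25, and the central-difference estimate agrees with it to within the step-size error.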
Consider a Naive Bayes problem with three features, x 1... x 3. Imagine that we have seen a total of 12 training examples, 6 positive (with y = 1) and 6 negative (with y = 0). Here is a table with some of the counts:
           y = 0   y = 1
x1 = 1       6       6
x2 = 1       0       0
x3 = 1       2       4
6+2 8
5 | 8
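The surviving fragments "6+2" and "5 | 8" are consistent with Laplace-corrected estimates of the form (count + 1) / (class total + 2). Assuming that is the estimator intended here (the question text itself is missing from this copy), the conditional probabilities for the table above can be sketched as follows. The function and variable names are illustrative.

```python
from fractions import Fraction

# Counts of x_j = 1 for each class, as (count for y = 0, count for y = 1).
counts = {"x1": (6, 6), "x2": (0, 0), "x3": (2, 4)}
n_per_class = (6, 6)  # 6 negative and 6 positive training examples


def laplace_estimate(feature, y):
    """Laplace-corrected estimate of Pr(feature = 1 | y); an assumed estimator."""
    return Fraction(counts[feature][y] + 1, n_per_class[y] + 2)


for feature in counts:
    print(feature, laplace_estimate(feature, 0), laplace_estimate(feature, 1))
```

With this estimator, Pr(x3 = 1 | y = 1) = (4 + 1) / (6 + 2) = 5/8, which matches the fraction that survives above. Note that the correction keeps the estimates for x2 away from zero even though x2 = 1 was never observed.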
Most learning algorithms we have seen try to find a hypothesis that minimizes error. But how do they attempt to control complexity? Here are some possible approaches:
A: Use a fixed-complexity hypothesis class
B: Include a complexity penalty in the measure of error
C: Nothing
For each of the following algorithms, specify which approach it uses and say what hypothesis class it uses (including any restrictions) and what complexity criterion (if any) is included in the measure of error. If the algorithm attempts to optimize the error measure, say whether it is guaranteed to find an optimal solution or just an approximation.
Consider a one-dimensional regression problem (predict y as a function of x). For each of the algorithms below, draw the approximate shape of the output of the algorithm, given the data points shown in the graph.
[Figure: two plots of the regression data, each with axes x and y.]
[Figure: the data points plotted on axes x1 and x2.]
Data points are: Negative: (-1, 0) (2, -2); Positive: (1, 0)
Recall that for SVMs, the negative class is represented by a desired output of -1 and the positive class by a desired output of +1.
(a) x1 + x2 = 0: No. The line goes through the (2, -2) point, so it is clearly not maximal margin.
(b) x1 + 1.5 x2 = 0: Yes. All three points are support vectors, with margin = 1.
(c) x1 + 2 x2 = 0: No. Three points are needed to define a line; with two support vectors there is no unique maximal margin line.
(d) 2 x1 + 3 x2 = 0: No. The margin for the points is 2, not 1.
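The four candidate separators can be compared by computing y (w . x) for each data point. This is a small sketch, assuming zero offset for every candidate (as the equations are written) and the restored data, with negative points (-1, 0) and (2, -2) and positive point (1, 0). In the canonical SVM form, the support vectors have a value of exactly 1 and no point falls below 1.

```python
# Data points with labels -1 (negative) and +1 (positive); signs restored.
points = [((-1, 0), -1), ((2, -2), -1), ((1, 0), +1)]

# Weight vectors (w1, w2) of the candidate lines w1*x1 + w2*x2 = 0.
candidates = {"a": (1, 1), "b": (1, 1.5), "c": (1, 2), "d": (2, 3)}


def functional_margins(w):
    """y * (w . x) for every data point; the canonical form wants a minimum of 1."""
    return [y * (w[0] * x[0] + w[1] * x[1]) for x, y in points]


for name, w in candidates.items():
    print(name, functional_margins(w))
```

Candidate (a) gives a value of 0 at (2, -2), (b) gives 1 at all three points, (c) gives 1, 2, 1, and (d) gives 2 at all three points, in line with the answers above.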
(a) Polynomial kernel, degree 2: D
(b) Polynomial kernel, degree 3: B
(c) Radial basis kernel, sigma = 0.5: A
(d) Radial basis kernel, sigma = 1.0: C