



Material Type: Assignment; Class: ADV LG DATA PROCESSG; Subject: COMPUTER SCIENCE AND INFORMATION SYSTEMS; University: University of Florida; Term: Fall 2007;
Typology: Assignments




Chapter 4, Problem 3:
a) For the classification function, the number of + classifications is 4 and the number of – classifications is
Now if we split on a1 and a2 then, a1 N1 N2 a2 N1 N
result in this tree. Let's start by finding the second split for the left child (C = true). Note that the classification error for this node is ½, as computed in part (a).

Attribute A (left child, C = true):

             A = T                    A = F
Class +      25                       0
Class –      0                        25
Error        1 – max(25/25, 0/25)     1 – max(0/25, 25/25)
             = 1 – 1 = 0              = 1 – 1 = 0

Δ = ½ – [25/50 • 0 + 25/50 • 0] = ½ – 0 = 0.5

Attribute B (left child, C = true):

             B = T                    B = F
Class +      5                        20
Class –      20                       5
Error        1 – max(5/25, 20/25)     1 – max(20/25, 5/25)
             = 1 – 4/5 = 1/5          = 1 – 4/5 = 1/5

Δ = ½ – [25/50 • 1/5 + 25/50 • 1/5] = ½ – 1/5 = 3/10 = 0.3

Clearly, attribute A is the better split for the left subtree.

Now let's find the second splitting attribute for the right child of the root (C = false). The classification error for this node is ½, as computed in part (a).

Attribute A (right child, C = false):

             A = T      A = F
Class +      0          25
Class –      0          25

No instances fall under A = T, so all 50 instances stay together under A = F:
Error = 1 – max(25/50, 25/50) = 1 – ½ = ½
Δ = ½ – 50/50 • ½ = ½ – ½ = 0

Attribute B (right child, C = false):

             B = T                    B = F
Class +      25                       0
Class –      0                        25
Error        1 – max(25/25, 0/25)     1 – max(0/25, 25/25)
             = 1 – 1 = 0              = 1 – 1 = 0

Δ = ½ – [25/50 • 0 + 25/50 • 0] = ½ – 0 = 0.5

The resulting tree is:

C = T: split on A
    A = T: leaf +  (25 +, 0 –)
    A = F: leaf –  (0 +, 25 –)
C = F: split on B
    B = T: leaf +  (25 +, 0 –)
    B = F: leaf –  (0 +, 25 –)

Hence, attribute B is the better split for the right subtree. Since all the leaf nodes are pure classes, 0 out of
the 100 instances will be misclassified. This tree correctly classifies all instances.

e) The results of parts (c) and (d) show that the greedy approach does not always lead to the decision tree with the lowest misclassification error.

Chapter 4, Problem 9:

Tree (a)
Number of non-leaf nodes = 2
Number of leaf nodes = 3
Number of errors = 7
Number of classes = 3
Number of attributes = 16
Number of records = n
Hence,
Cost(tree) = 2 • log2 16 + 3 • log2 3 = 2 • 4 + 3 • 1.585 = 8 + 4.755 = 12.755 bits
Cost(data | tree) = 7 • log2 n bits
Cost(tree, data) = 12.755 + 7 • log2 n bits

Tree (b)
Number of non-leaf nodes = 4
Number of leaf nodes = 5
Number of errors = 4
Number of classes = 3
Number of attributes = 16
Number of records = n
Hence,
Cost(tree) = 4 • log2 16 + 5 • log2 3 = 4 • 4 + 5 • 1.585 = 16 + 7.925 = 23.925 bits
Cost(data | tree) = 4 • log2 n bits
Cost(tree, data) = 23.925 + 4 • log2 n bits

Solving 23.925 + 4 • log2 n = 12.755 + 7 • log2 n gives 3 • log2 n = 11.17, so n = 13.208. Hence, according to the MDL principle, decision tree (b) is better if n ≥ 14 and tree (a) is better otherwise.

Chapter 5, Problem 4:

a) Accuracy(R1) = 4 / (4 + 1) = 4/5 = 0.8
Accuracy(R2) = 30 / (30 + 10) = ¾ = 0.75
Accuracy(R3) = 100 / (100 + 90) = 10/19 = 0.5263
R1 is best, R3 is worst.

b) For FOIL's information gain, we extend a rule that has equal positive and negative coverage (i.e. p0 = n0) with each of the given rules, and then compare the results.
FOIL(R1) = 4 • (log2(4/5) – log2(1/2)) = 4 • (–0.3219 + 1) ≈ 2.712
FOIL(R2) = 30 • (log2(30/40) – log2(1/2)) = 30 • (–0.4150 + 1) ≈ 17.549
FOIL(R3) = 100 • (log2(100/190) – log2(1/2)) = 100 • (–0.9260 + 1) = 7.4001
R2 is best, R1 is worst.

c) For R1, k = 2, f+ = 4, e+ = 5 • 100/500 = 1, f– = 1, e– = 5 • 400/500 = 4
LSR(R1) = 2 • (4 • log2(4/1) + 1 • log2(1/4)) = 2 • (4 • 2 + 1 • (–2)) = 2 • 6 = 12
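The rule-evaluation measures used in Chapter 5, Problem 4 can be checked numerically. The following is a minimal Python sketch, not part of the original solution; the function names (accuracy, foil_gain, likelihood_ratio) are hypothetical, and the FOIL baseline uses p0 = n0 = 1 only to encode the solution's assumption of equal positive and negative coverage.

```python
import math

def accuracy(pos, neg):
    """Fraction of covered instances that are positive."""
    return pos / (pos + neg)

def foil_gain(p1, n1, p0, n0):
    """FOIL's information gain when a rule covering (p0, n0)
    is extended to a rule covering (p1, n1)."""
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def likelihood_ratio(observed, expected):
    """Likelihood ratio statistic: 2 * sum of f_i * log2(f_i / e_i)."""
    return 2 * sum(f * math.log2(f / e)
                   for f, e in zip(observed, expected) if f > 0)

# Positive / negative coverage of each rule, taken from the solution text.
rules = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}

for name, (p, n) in rules.items():
    # p0 = n0 = 1 is an assumption standing in for "equal coverage".
    print(name, round(accuracy(p, n), 4), round(foil_gain(p, n, 1, 1), 4))

# Part (c) for R1: 100 positive and 400 negative instances overall,
# as implied by the expected counts in the solution.
covered = 4 + 1
expected_r1 = [covered * 100 / 500, covered * 400 / 500]
lsr_r1 = likelihood_ratio([4, 1], expected_r1)
print("LSR(R1) =", lsr_r1)
```

Any baseline with p0 = n0 gives the same gain, since only the ratio p0 / (p0 + n0) = 1/2 enters the formula.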
c) P(A=1) = 5/10 = 0.5
P(B=1) = 4/10 = 0.4
P(A=1) • P(B=1) = 0.5 • 0.4 = 0.2
P(A=1, B=1) = P(B=1 | A=1) • P(A=1) = 2/5 • 5/10 = 2/10 = 0.2
Since P(A=1, B=1) = P(A=1) • P(B=1), we can say that the random variables A and B are independent.

d) P(A=1) = 5/10 = 0.5
P(B=0) = 6/10 = 0.6
P(A=1) • P(B=0) = 0.5 • 0.6 = 0.3
P(A=1, B=0) = P(B=0 | A=1) • P(A=1) = 3/5 • 5/10 = 3/10 = 0.3
Since P(A=1, B=0) = P(A=1) • P(B=0), we can say that the random variables A and B are independent.

e) P(A=1, B=1 | +) = P(A=1, B=1, class=+) / P(+) = (1/10) / (5/10) = 1/5 = 0.2
Using values from part (a), P(A=1 | +) • P(B=1 | +) = (3/5) • (2/5) = 6/25 = 0.24
Since P(A=1, B=1 | +) ≠ P(A=1 | +) • P(B=1 | +), the random variables A and B are not conditionally independent given the class +.

Chapter 5, Problem 12:

a) P(B=g, F=e, G=e, S=y)
= P(S=y | B=g, F=e, G=e) • P(B=g, F=e, G=e)
= P(S=y | B=g, F=e) • P(G=e | B=g, F=e) • P(B=g, F=e)    (S does not depend on G)
= (1 – P(S=n | B=g, F=e)) • P(G=e | B=g, F=e) • P(B=g) • P(F=e)    (B and F are independent)
= (1 – P(S=n | B=g, F=e)) • P(G=e | B=g, F=e) • (1 – P(B=b)) • P(F=e)
= (1 – 0.8) • 0.8 • (1 – 0.1) • 0.2
= 0.2 • 0.8 • 0.9 • 0.2 = 0.0288

b) P(B=b, F=e, G=ne, S=n)
= P(S=n | B=b, F=e, G=ne) • P(B=b, F=e, G=ne)
= P(S=n | B=b, F=e) • P(G=ne | B=b, F=e) • P(B=b, F=e)    (S does not depend on G)
= P(S=n | B=b, F=e) • (1 – P(G=e | B=b, F=e)) • P(B=b) • P(F=e)    (B and F are independent)
= 1 • (1 – 0.9) • 0.1 • 0.2
= 1 • 0.1 • 0.1 • 0.2 = 0.002

c) P(S=y | B=b) = 1 – P(S=n | B=b) = 1 – P(S=n, B=b) / P(B=b), where
P(S=n, B=b) / P(B=b)
= (P(S=n, B=b, F=e) + P(S=n, B=b, F=ne)) / P(B=b)
= (P(S=n | B=b, F=e) • P(B=b) • P(F=e) + P(S=n | B=b, F=ne) • P(B=b) • P(F=ne)) / P(B=b)
= P(S=n | B=b, F=e) • P(F=e) + P(S=n | B=b, F=ne) • P(F=ne)
= 1 • 0.2 + 0.9 • (1 – 0.2)
= 0.2 + 0.72 = 0.92
Hence P(S=y | B=b) = 1 – 0.92 = 0.08
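The factorizations in Chapter 5, Problem 12 can be reproduced with a short script. This is only a sketch under stated assumptions: the dictionaries hold just the conditional probabilities that appear in the solution above (the rest of the network's tables are not given here), and all names (P_B_BAD, joint, and so on) are hypothetical.

```python
# Prior probabilities quoted in the solution.
P_B_BAD = 0.1      # P(B=b)
P_F_EMPTY = 0.2    # P(F=e)

# Conditional entries quoted in the solution, keyed by (B, F).
# Only the combinations used in parts (a)-(c) are listed.
P_G_EMPTY = {("g", "e"): 0.8, ("b", "e"): 0.9}                 # P(G=e | B, F)
P_S_NO = {("g", "e"): 0.8, ("b", "e"): 1.0, ("b", "ne"): 0.9}  # P(S=n | B, F)

def p_b(b):
    return P_B_BAD if b == "b" else 1 - P_B_BAD

def p_f(f):
    return P_F_EMPTY if f == "e" else 1 - P_F_EMPTY

def p_g(g, b, f):
    p_empty = P_G_EMPTY[(b, f)]
    return p_empty if g == "e" else 1 - p_empty

def p_s(s, b, f):
    p_no = P_S_NO[(b, f)]
    return p_no if s == "n" else 1 - p_no

def joint(b, f, g, s):
    """P(B, F, G, S) = P(B) * P(F) * P(G | B, F) * P(S | B, F)."""
    return p_b(b) * p_f(f) * p_g(g, b, f) * p_s(s, b, f)

part_a = joint("g", "e", "e", "y")
part_b = joint("b", "e", "ne", "n")
# Part (c): marginalize F out of P(S=y | B=b, F), using independence of B and F.
part_c = sum(p_f(f) * p_s("y", "b", f) for f in ("e", "ne"))

print(round(part_a, 4), round(part_b, 4), round(part_c, 4))
```

Up to floating-point rounding, the three values should match the hand-computed results for parts (a), (b) and (c).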