

Worked iterations and calculations for clustering and association rule learning in data mining. They include breaking ties in favor of the cluster with the smallest number and calculating entropy, purity, precision, and recall for different clusters.
Typology: Exams


Problem 1:

a)
Iteration 1: M1=5 C1={1, 2, 3, 4, 5, 6, 10} M2=20 C2={20, 30, 40, 50, 60}
Iteration 2: M1=4.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=40 C2={30, 40, 50, 60}
Iteration 3: M1=6.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=45 C2={30, 40, 50, 60}
Iteration 4: M1=6.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=45 C2={30, 40, 50, 60}

b)
Iteration 1: M1=2 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=50 C2={30, 40, 50, 60}
Iteration 2: M1=6.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=45 C2={30, 40, 50, 60}
Iteration 3: M1=6.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=45 C2={30, 40, 50, 60}

c)
Iteration 1: M1=6 C1={1, 2, 3, 4, 5, 6} M2=10 C2={10, 20, 30, 40, 50, 60}
Iteration 2: M1=3.5 C1={1, 2, 3, 4, 5, 6, 10} M2=35 C2={20, 30, 40, 50, 60}
Iteration 3: M1=4.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=40 C2={30, 40, 50, 60}
Iteration 4: M1=6.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=45 C2={30, 40, 50, 60}
Iteration 5: M1=6.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=45 C2={30, 40, 50, 60}

d) We always break ties in favor of the cluster with the smallest number.
(1) (2) (3) (4) (5) (6) (10) (20) (30) (40) (50) (60)
(1, 2) (3) (4) (5) (6) (10) (20) (30) (40) (50) (60)
((1, 2), 3) (4) (5) (6) (10) (20) (30) (40) (50) (60)
(((1, 2), 3), 4) (5) (6) (10) (20) (30) (40) (50) (60)
((((1, 2), 3), 4), 5) (6) (10) (20) (30) (40) (50) (60)
(((((1, 2), 3), 4), 5), 6) (10) (20) (30) (40) (50) (60)
((((((1, 2), 3), 4), 5), 6), 10) (20) (30) (40) (50) (60)
(((((((1, 2), 3), 4), 5), 6), 10), 20) (30) (40) (50) (60)
((((((((1, 2), 3), 4), 5), 6), 10), 20), 30) (40) (50) (60)
(((((((((1, 2), 3), 4), 5), 6), 10), 20), 30), 40) (50) (60)
((((((((((1, 2), 3), 4), 5), 6), 10), 20), 30), 40), 50) (60)
(((((((((((1, 2), 3), 4), 5), 6), 10), 20), 30), 40), 50), 60)

e) We always break ties in favor of the cluster with the smallest number.
(1) (2) (3) (4) (5) (6) (10) (20) (30) (40) (50) (60)
(1, 2) (3) (4) (5) (6) (10) (20) (30) (40) (50) (60)
(1, 2) (3, 4) (5) (6) (10) (20) (30) (40) (50) (60)
(1, 2) (3, 4) (5, 6) (10) (20) (30) (40) (50) (60)
((1, 2), (3, 4)) (5, 6) (10) (20) (30) (40) (50) (60)
(((1, 2), (3, 4)), (5, 6)) (10) (20) (30) (40) (50) (60)
((((1, 2), (3, 4)), (5, 6)), 10) (20) (30) (40) (50) (60)
((((1, 2), (3, 4)), (5, 6)), 10) (20, 30) (40) (50) (60)
((((1, 2), (3, 4)), (5, 6)), 10) (20, 30) (40, 50) (60)
((((1, 2), (3, 4)), (5, 6)), 10) (20, 30) ((40, 50), 60)
(((((1, 2), (3, 4)), (5, 6)), 10), (20, 30)) ((40, 50), 60)
((((((1, 2), (3, 4)), (5, 6)), 10), (20, 30)), ((40, 50), 60))
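
The iteration tables in Problem 1 a) to c) can be rechecked with a short script. This is a minimal sketch, not the original solution method: it assumes the data set is the union of the listed clusters, that the procedure is standard two-cluster k-means on one-dimensional points, and that a point equidistant from both means goes to cluster 1. The names `POINTS` and `kmeans_1d` are hypothetical.

```python
# Data set inferred from the union of the clusters listed in Problem 1 (assumption).
POINTS = [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60]


def kmeans_1d(points, m1, m2, max_iter=100):
    """Two-cluster k-means on 1-D data.

    Returns one (m1, c1, m2, c2) tuple per iteration. The run stops once an
    iteration starts from the same means as the previous one, so the last two
    entries repeat, as in the tables of Problem 1.
    """
    history = []
    for _ in range(max_iter):
        # Assignment step: ties go to cluster 1 (assumed tie rule).
        c1 = [p for p in points if abs(p - m1) <= abs(p - m2)]
        c2 = [p for p in points if abs(p - m1) > abs(p - m2)]
        history.append((m1, c1, m2, c2))

        # Converged when the means did not change since the previous iteration.
        if len(history) >= 2 and history[-2][0] == m1 and history[-2][2] == m2:
            break
        if not c1 or not c2:
            raise ValueError("a cluster became empty; choose other initial means")

        # Update step: each mean becomes the average of its cluster.
        m1, m2 = sum(c1) / len(c1), sum(c2) / len(c2)
    return history


if __name__ == "__main__":
    starts = {"a": (5, 20), "b": (2, 50), "c": (6, 10)}
    for label, (m1, m2) in starts.items():
        print(f"{label})")
        for i, (a, c1, b, c2) in enumerate(kmeans_1d(POINTS, m1, m2), start=1):
            print(f"Iteration {i}: M1={a:.2g} C1={c1} M2={b:.2g} C2={c2}")
```

All three starting points end in the same partition, which is why the last rows of a), b), and c) are identical; only the number of iterations needed differs.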

Problem 2:

a)
             Entropy   Purity
Cluster #1   2.0558    22/49 = 0.4490
Cluster #2   2.0396    29/71 = 0.4085
Cluster #3   1.4549    45/66 = 0.6818
Cluster #4   1.4885    8/13 = 0.6154
Overall      1.8137    0.

b)
             Precision("Compilers")
Cluster #1   2/49 = 0.0408
Cluster #2   3/71 = 0.0423
Cluster #3   1/66 = 0.0152
Cluster #4   8/13 = 0.6154
Overall      0.

c)
             Recall("Systems")
Cluster #1   7/45 = 0.1556
Cluster #2   23/45 = 0.5111
Cluster #3   12/45 = 0.2667
Cluster #4   3/45 = 0.0667
Overall      0.3134 (weighted sum) OR 45/45 = 1

Problem 3:

a) Same as shown in Figure 6.32 on page 408 of the textbook. Note that the authors have not used the label L10 for any leaf node.
b) Leaf node L
c) Leaf nodes visited will be L4, L2, L3, L5, L1, L8, L6 and L
d) Candidate itemsets will be {1, 2, 7}, {1, 7, 8} and {2, 7, 8}.

Problem 4:

a), b) Both are false. Consider the following counterexample.
Let minsup = 0.2 and minconf = 0.
Support(A) = 10/20
Support(A, B) = 5/20
Support(A, C) = 9/20
Support(A, B, C) = 4/20
Confidence(A → B) = 5/10 >= minconf
Confidence(A → BC) = 4/10 < minconf
Confidence(AC → B) = 4/9 < minconf

c) False. Consider the following counterexample.
Let minsup = 0.2 and minconf = 0.
Support(A) = 10/20
Support(A, B) = 9/20
Support(A, C) = 5/20
Support(A, B, C) = 4/20
Confidence(AC → B) = 4/5 >= minconf
Confidence(AB → C) = 4/9 < minconf

d) False. Consider the following counterexample.
Let minsup = 0.2 and minconf = 0.
Support(A) = 10/20
Support(B) = 12/20
Support(A, B) = 5/20
Confidence(A → B) = 5/10 >= minconf
Confidence(B → A) = 5/12 < minconf
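
The per-cluster and overall figures in Problem 2 can be recomputed from the fractions given there. The sketch below is an illustration under one assumption: every "Overall" value is the average of the per-cluster values weighted by cluster size. That reading reproduces the stated overall entropy (1.8137) and comes within rounding of the stated weighted recall (0.3134), but the original does not spell it out for purity or precision. The names `CLUSTERS`, `SYSTEMS_TOTAL`, and `size_weighted` are hypothetical.

```python
# Figures copied from Problem 2: cluster size, entropy, majority-class count,
# number of "Compilers" documents, number of "Systems" documents.
CLUSTERS = {
    1: {"size": 49, "entropy": 2.0558, "majority": 22, "compilers": 2, "systems": 7},
    2: {"size": 71, "entropy": 2.0396, "majority": 29, "compilers": 3, "systems": 23},
    3: {"size": 66, "entropy": 1.4549, "majority": 45, "compilers": 1, "systems": 12},
    4: {"size": 13, "entropy": 1.4885, "majority": 8, "compilers": 8, "systems": 3},
}
SYSTEMS_TOTAL = 45  # denominator of the recall fractions in part c)


def size_weighted(per_cluster):
    """Average per-cluster values, weighting each by its cluster size (assumed)."""
    total = sum(c["size"] for c in CLUSTERS.values())
    return sum(CLUSTERS[k]["size"] * v for k, v in per_cluster.items()) / total


# Per-cluster measures, each a simple ratio of the counts above.
entropy = {k: c["entropy"] for k, c in CLUSTERS.items()}
purity = {k: c["majority"] / c["size"] for k, c in CLUSTERS.items()}
precision = {k: c["compilers"] / c["size"] for k, c in CLUSTERS.items()}
recall = {k: c["systems"] / SYSTEMS_TOTAL for k, c in CLUSTERS.items()}

overall_entropy = size_weighted(entropy)
overall_purity = size_weighted(purity)
overall_precision = size_weighted(precision)
overall_recall = size_weighted(recall)

if __name__ == "__main__":
    for name, value in [
        ("entropy", overall_entropy),
        ("purity", overall_purity),
        ("precision", overall_precision),
        ("recall", overall_recall),
    ]:
        print(f"Overall {name}: {value:.4f}")
```

Under this weighting, overall purity reduces to the sum of the majority counts divided by the total number of documents, since each cluster size cancels against its own denominator.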