Data Mining: Clustering and Association Rule Learning, Exams of Computer Science

Exam solutions covering iterations and calculations for clustering and association rule learning in data mining, including breaking ties in favor of the cluster with the smallest number and calculating entropy, purity, precision, and recall for different clusters.

Typology: Exams

Pre 2010

Uploaded on 03/18/2009

koofers-user-thc

Problem 1:
a)
Iteration 1: M1=5 C1={1, 2, 3, 4, 5, 6, 10} M2=20 C2={20, 30, 40, 50, 60}
Iteration 2: M1=4.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=40 C2={30, 40, 50, 60}
Iteration 3: M1=6.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=45 C2={30, 40, 50, 60}
Iteration 4: M1=6.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=45 C2={30, 40, 50, 60}
b)
Iteration 1: M1=2 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=50 C2={30, 40, 50, 60}
Iteration 2: M1=6.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=45 C2={30, 40, 50, 60}
Iteration 3: M1=6.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=45 C2={30, 40, 50, 60}
c)
Iteration 1: M1=6 C1={1, 2, 3, 4, 5, 6} M2=10 C2={10, 20, 30, 40, 50, 60}
Iteration 2: M1=3.5 C1={1, 2, 3, 4, 5, 6, 10} M2=35 C2={20, 30, 40, 50, 60}
Iteration 3: M1=4.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=40 C2={30, 40, 50, 60}
Iteration 4: M1=6.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=45 C2={30, 40, 50, 60}
Iteration 5: M1=6.4 C1={1, 2, 3, 4, 5, 6, 10, 20} M2=45 C2={30, 40, 50, 60}
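The iteration tables in parts a) to c) can be reproduced with a short sketch of 2-means on one-dimensional data. The point set and the starting means are read off the tables above. The function name `kmeans_1d`, the `max_iter` cap, and the rule that an equidistant point goes to cluster 1 are assumptions made for illustration (no such tie actually occurs with this data). Iteration stops once a row repeats the previous one, which is how the tables above appear to terminate.

```python
def kmeans_1d(points, m1, m2, max_iter=20):
    """2-means on 1-D data; returns one (m1, c1, m2, c2) row per iteration."""
    rows = []
    for _ in range(max_iter):
        # Assignment step: a point equidistant from both means goes to cluster 1
        # (an assumption; the data here never produces such a tie).
        c1 = [p for p in points if abs(p - m1) <= abs(p - m2)]
        c2 = [p for p in points if abs(p - m1) > abs(p - m2)]
        rows.append((m1, c1, m2, c2))
        # Stop once a row repeats the previous one, i.e. nothing changed.
        if len(rows) > 1 and rows[-1] == rows[-2]:
            break
        # Update step: each mean becomes the average of its cluster.
        # (Assumes neither cluster is empty, which holds for this data.)
        m1, m2 = sum(c1) / len(c1), sum(c2) / len(c2)
    return rows


points = [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60]
starts = {"a": (5, 20), "b": (2, 50), "c": (6, 10)}  # initial (M1, M2) per part

for part, (start1, start2) in starts.items():
    print(f"{part})")
    for i, (m1, c1, m2, c2) in enumerate(kmeans_1d(points, start1, start2), start=1):
        print(f"Iteration {i}: M1={m1:.1f} C1={c1} M2={m2:.1f} C2={c2}")
```

All three starting points converge to the same pair of clusters; only the number of iterations differs.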
d)
We always break ties in favor of the cluster with the smallest number.
(1) (2) (3) (4) (5) (6) (10) (20) (30) (40) (50) (60)
(1, 2) (3) (4) (5) (6) (10) (20) (30) (40) (50) (60)
((1, 2), 3) (4) (5) (6) (10) (20) (30) (40) (50) (60)
(((1, 2), 3), 4) (5) (6) (10) (20) (30) (40) (50) (60)
((((1, 2), 3), 4), 5) (6) (10) (20) (30) (40) (50) (60)
(((((1, 2), 3), 4), 5), 6) (10) (20) (30) (40) (50) (60)
((((((1, 2), 3), 4), 5), 6), 10) (20) (30) (40) (50) (60)
(((((((1, 2), 3), 4), 5), 6), 10), 20) (30) (40) (50) (60)
((((((((1, 2), 3), 4), 5), 6), 10), 20), 30) (40) (50) (60)
(((((((((1, 2), 3), 4), 5), 6), 10), 20), 30), 40) (50) (60)
((((((((((1, 2), 3), 4), 5), 6), 10), 20), 30), 40), 50) (60)
(((((((((((1, 2), 3), 4), 5), 6), 10), 20), 30), 40), 50), 60)
e)
We always break ties in favor of the cluster with the smallest number.
(1) (2) (3) (4) (5) (6) (10) (20) (30) (40) (50) (60)
(1, 2) (3) (4) (5) (6) (10) (20) (30) (40) (50) (60)
(1, 2) (3, 4) (5) (6) (10) (20) (30) (40) (50) (60)
(1, 2) (3, 4) (5, 6) (10) (20) (30) (40) (50) (60)
((1, 2), (3, 4)) (5, 6) (10) (20) (30) (40) (50) (60)
(((1, 2), (3, 4)), (5, 6)) (10) (20) (30) (40) (50) (60)
((((1, 2), (3, 4)), (5, 6)), 10) (20) (30) (40) (50) (60)
((((1, 2), (3, 4)), (5, 6)), 10) (20, 30) (40) (50) (60)
((((1, 2), (3, 4)), (5, 6)), 10) (20, 30) (40, 50) (60)
((((1, 2), (3, 4)), (5, 6)), 10) (20, 30) ((40, 50), 60)
(((((1, 2), (3, 4)), (5, 6)), 10), (20, 30)) ((40, 50), 60)
((((((1, 2), (3, 4)), (5, 6)), 10), (20, 30)), ((40, 50), 60))
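The two merge sequences in parts d) and e) can be reproduced with a small agglomerative clustering sketch. The answer key does not name the linkage, so the mapping used here is an assumption: the sequence in d) is consistent with single link (minimum pairwise distance) and the sequence in e) with complete link (maximum pairwise distance). The names `agglomerate` and `render` are hypothetical helpers. Ties are resolved by scanning cluster pairs from left to right and keeping the first minimum, which is one way to favor the cluster with the smallest number.

```python
from itertools import combinations


def agglomerate(points, linkage):
    """Return the list of clusterings, one per step, as nested-tuple labels.

    `linkage` is the built-in min (single link) or max (complete link).
    """
    # Each cluster is (members, label); a label is an int or a nested tuple.
    clusters = [([p], p) for p in sorted(points)]
    steps = [[label for _, label in clusters]]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = linkage(abs(a - b) for a in clusters[i][0] for b in clusters[j][0])
            # Strict '<' keeps the earliest (lowest-numbered) pair on ties.
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = (clusters[i][0] + clusters[j][0], (clusters[i][1], clusters[j][1]))
        # The merged cluster takes the position of the lower-numbered cluster.
        clusters = clusters[:i] + [merged] + clusters[i + 1:j] + clusters[j + 1:]
        steps.append([label for _, label in clusters])
    return steps


def render(step):
    """Format one step in the notation used above, e.g. '((1, 2), 3) (4)'."""
    return " ".join(f"({c})" if isinstance(c, int) else str(c) for c in step)


points = [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60]
for name, linkage in (("d) single link", min), ("e) complete link", max)):
    print(name)
    for step in agglomerate(points, linkage):
        print(render(step))
```

Single link chains the points together one at a time because each new point is judged only by its nearest neighbor in the growing cluster, while complete link penalizes wide clusters and so pairs up neighbors first.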

Problem 2:
a)
            Entropy   Purity
Cluster #1  2.0558    22/49 = 0.4490
Cluster #2  2.0396    29/71 = 0.4085
Cluster #3  1.4549    45/66 = 0.6818
Cluster #4  1.4885    8/13 = 0.6154
Overall     1.8137    0.
b)
            Precision(“Compilers”)
Cluster #1  2/49 = 0.0408
Cluster #2  3/71 = 0.0423
Cluster #3  1/66 = 0.0152
Cluster #4  8/13 = 0.6154
Overall     0.
c)
            Recall(“Systems”)
Cluster #1  7/45 = 0.1556
Cluster #2  23/45 = 0.5111
Cluster #3  12/45 = 0.2667
Cluster #4  3/45 = 0.0667
Overall     0.3134 (weighted sum) OR 45/45 = 1

Problem 3:
a)
Same as shown in Figure 6.32 on page 408 of the textbook. Note that the authors have not used the label L10 for any leaf node.
b)
Leaf node L
c)
Leaf nodes visited will be L4, L2, L3, L5, L1, L8, L6 and L
d)
Candidate itemsets will be {1, 2, 7}, {1, 7, 8} and {2, 7, 8}

Problem 4:
a) b)
Both are false. Consider the following counterexample.
Let minsup = 0.2 and minconf = 0.
Support(A, B, C) = 4/20
Support(A) = 10/20
Confidence(A → B) = 5/10 >= minconf
Support(A, B) = 5/20
Confidence(A → BC) = 4/10 < minconf
Support(A, C) = 9/20
Confidence(AC → B) = 4/9 < minconf
c)
False. Consider the following counterexample.
Let minsup = 0.2 and minconf = 0.
Support(A, B, C) = 4/20
Support(A) = 10/20
Confidence(AC → B) = 4/5 >= minconf
Support(A, B) = 9/20
Support(A, C) = 5/20
Confidence(AB → C) = 4/9 < minconf
d)
False. Consider the following counterexample.
Let minsup = 0.2 and minconf = 0.
Support(A) = 10/20
Support(A, B) = 5/20
Confidence(A → B) = 5/10 >= minconf
Support(B) = 12/20
Confidence(B → A) = 5/12 < minconf
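The counterexample for parts a) and b) of Problem 4 can be checked mechanically using the definition Confidence(X → Y) = Support(X ∪ Y) / Support(X). The sketch below is illustrative only: the support counts are the ones listed for a) and b), the helper name `confidence` is hypothetical, and `minconf = 0.5` is an assumption, since the threshold value is cut off in the text and 0.5 is merely one value consistent with all three stated inequalities.

```python
from fractions import Fraction

# Support counts (out of 20 transactions) from the a)/b) counterexample.
count = {
    frozenset("A"): 10,
    frozenset("AB"): 5,
    frozenset("AC"): 9,
    frozenset("ABC"): 4,
}


def confidence(lhs, rhs):
    """Confidence(lhs -> rhs) = support(lhs U rhs) / support(lhs)."""
    left = frozenset(lhs)
    return Fraction(count[left | frozenset(rhs)], count[left])


# ASSUMPTION: the threshold is truncated in the text; 0.5 fits the inequalities.
minconf = Fraction(1, 2)

for lhs, rhs in (("A", "B"), ("A", "BC"), ("AC", "B")):
    conf = confidence(lhs, rhs)
    verdict = ">=" if conf >= minconf else "<"
    print(f"Confidence({lhs} -> {rhs}) = {conf} {verdict} minconf")
```

Exact fractions are used instead of floats so that the boundary case 5/10 compares equal to the threshold without rounding concerns.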