



This midterm exam consists of 6 pages, including 1 page for your name and UID, and 6 pages of questions. All pages need to be returned ...




TID   Items
T1    a, c, d, e, f
T2    c, d, e, f, g
T3    e, f, g
T4    b, c, f, g
T5    a, d, e, f, g
Item     a    b    c    d    e     f    g
Profit   10   5    40   30   -20   0    -
(a) (10%) Find all the closed frequent itemsets and maximal frequent itemsets.
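This excerpt does not show the minimum support threshold. The brute-force sketch below therefore assumes an absolute `min_sup = 3`; change it to match the actual exam. It enumerates every frequent itemset and then applies the two definitions. An itemset is closed if no proper superset has the same support. It is maximal if no proper superset is frequent. The names `TRANSACTIONS`, `frequent_itemsets`, and `closed_and_maximal` are illustrative only.

```python
from itertools import combinations

# Transactions T1..T5 from the table above.
TRANSACTIONS = [
    frozenset("acdef"),
    frozenset("cdefg"),
    frozenset("efg"),
    frozenset("bcfg"),
    frozenset("adefg"),
]


def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_sup):
    """Brute-force enumeration of all itemsets with support >= min_sup."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, k):
            s = support(frozenset(combo), transactions)
            if s >= min_sup:
                result[frozenset(combo)] = s
                found_any = True
        if not found_any:  # Apriori property: no frequent k-itemset => none larger
            break
    return result


def closed_and_maximal(freq):
    """Closed: no proper superset with equal support. Maximal: no frequent proper superset."""
    closed, maximal = [], []
    for x, s in freq.items():
        supersets = [y for y in freq if x < y]  # infrequent supersets cannot tie with s
        if all(freq[y] < s for y in supersets):
            closed.append(x)
        if not supersets:
            maximal.append(x)
    return closed, maximal


freq = frequent_itemsets(TRANSACTIONS, min_sup=3)  # assumed threshold
closed, maximal = closed_and_maximal(freq)
print(sorted("".join(sorted(x)) for x in closed))   # → ['cf', 'def', 'ef', 'efg', 'f', 'fg']
print(sorted("".join(sorted(x)) for x in maximal))  # → ['cf', 'def', 'efg']
```

Brute force is fine for 7 items. For larger data, Apriori-style candidate generation or FP-growth would replace the `combinations` loop.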
(b) (10%) Construct the FP-tree. What is the g-projected database?
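Below is a compact FP-tree sketch under the same assumed `min_sup = 3`. It also assumes that items with equal support are ordered alphabetically, so the resulting tree shape depends on both choices. Frequent items are inserted as shared prefix paths, and a header table keeps node-links per item. The g-projected database (conditional pattern base) is read off by walking each `g` node up to the root. `FPNode`, `build_fp_tree`, and `projected_database` are hypothetical names.

```python
from collections import Counter, defaultdict

# Transactions T1..T5 from the table above.
TRANSACTIONS = [
    ["a", "c", "d", "e", "f"],
    ["c", "d", "e", "f", "g"],
    ["e", "f", "g"],
    ["b", "c", "f", "g"],
    ["a", "d", "e", "f", "g"],
]
MIN_SUP = 3  # assumption: the threshold is not given in this excerpt


class FPNode:
    def __init__(self, item, parent):
        self.item = item        # None for the root
        self.count = 0
        self.parent = parent
        self.children = {}      # item -> FPNode


def build_fp_tree(transactions, min_sup):
    """Build an FP-tree; returns (root, header table of node-links, item order)."""
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_sup}
    # Descending support; ties broken alphabetically (an assumption).
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = FPNode(None, None)
    header = defaultdict(list)
    for t in transactions:
        # Keep only frequent items, sort them into the global order, insert as a path.
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header, order


def projected_database(header, item):
    """Conditional pattern base: prefix path above each node of `item`, with that node's count."""
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent.item is not None:  # stop at the root
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((tuple(reversed(path)), node.count))
    return base


root, header, order = build_fp_tree(TRANSACTIONS, MIN_SUP)
print(order)                            # → ['f', 'e', 'g', 'c', 'd']
print(projected_database(header, "g"))  # → [(('f', 'e'), 3), (('f',), 1)]
```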
Point   x   y
p0      4   3
p1      5   1
p2      6   1
p3      7   3
p4      7   2
p5      6   4
p6      5   4
p7      4   2
(a) (10%) Simulate the K-means algorithm with 2 clusters for 2 iterations. Show the result after each iteration as a table of cluster assignments for each point. Pick p0 and p5 as your initial centroids. If there is a tie in cluster assignment, you can break the tie arbitrarily.
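The simulation can be checked with a plain Lloyd's K-means sketch. It assumes Euclidean distance (the question does not name a metric) and breaks ties toward the lower cluster index. Each iteration runs one assignment step and one centroid-update step and records both. `POINTS` and `kmeans` are illustrative names.

```python
import math

# Dataset from the table above.
POINTS = {
    "p0": (4, 3), "p1": (5, 1), "p2": (6, 1), "p3": (7, 3),
    "p4": (7, 2), "p5": (6, 4), "p6": (5, 4), "p7": (4, 2),
}


def kmeans(points, centroids, iterations):
    """Lloyd's K-means with Euclidean distance; returns [(assignment, centroids), ...]."""
    history = []
    for _ in range(iterations):
        # Assignment step: nearest centroid, ties go to the lowest index.
        assignment = {
            name: min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for name, p in points.items()
        }
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [points[n] for n, c in assignment.items() if c == k]
            if not members:  # leave an empty cluster's centroid in place
                new_centroids.append(centroids[k])
                continue
            new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        centroids = new_centroids
        history.append((assignment, centroids))
    return history


history = kmeans(POINTS, [POINTS["p0"], POINTS["p5"]], iterations=2)
for i, (assignment, centroids) in enumerate(history, start=1):
    print(f"Iteration {i}:", assignment, "centroids:", centroids)
```

With these assumptions, both iterations place {p0, p1, p2, p7} in cluster 0 and {p3, p4, p5, p6} in cluster 1. The centroids become (4.75, 1.75) and (6.25, 3.25). No ties arise.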
(b) (10%) Suppose the PAM algorithm is applied to the dataset with p0 and p5 as the initial medoids. At the end of the first iteration, suppose we choose p1 and try swapping it with p0. Would the swap bring any benefit?
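A PAM swap is beneficial when it lowers the total cost, meaning the sum of each point's distance to its nearest medoid. The sketch below compares the cost of medoids {p0, p5} with the cost of {p1, p5}. The excerpt does not state a distance measure, so it assumes Manhattan distance and also checks Euclidean distance (`math.dist`). The names `manhattan` and `total_cost` are illustrative.

```python
import math

# Dataset from the table above.
POINTS = {
    "p0": (4, 3), "p1": (5, 1), "p2": (6, 1), "p3": (7, 3),
    "p4": (7, 2), "p5": (6, 4), "p6": (5, 4), "p7": (4, 2),
}


def manhattan(p, q):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def total_cost(medoids, points, dist):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist(points[n], points[m]) for m in medoids) for n in points)


before = total_cost(["p0", "p5"], POINTS, manhattan)
after = total_cost(["p1", "p5"], POINTS, manhattan)
print(before, after, after - before)  # → 13 12 -1  (negative swap cost: the swap helps)

# Same comparison under Euclidean distance (assumption check).
print(total_cost(["p0", "p5"], POINTS, math.dist), total_cost(["p1", "p5"], POINTS, math.dist))
```

Under both assumed metrics, the swapped configuration has the lower total cost.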
(c) (10%) Which of the following statements about K-Means is (are) correct? (A) K-Means sometimes cannot find the global optimal clustering. (B) K-Means automatically determines the number of clusters. (C) K-Means sometimes cannot converge. (D) K-Means is sensitive to outliers but robust to data points with different densities. (E) K-Means can deal with categorical features.
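Statement (A) can be illustrated with a hypothetical toy dataset that is not from the exam. It has four points forming two well-separated vertical pairs. Lloyd's K-means (Euclidean distance) started from a poor initialization converges to a split with much higher SSE than a good initialization reaches. The names `lloyd` and `sse` are illustrative.

```python
import math


def lloyd(points, centroids, max_iter=100):
    """Lloyd's K-means until assignments stop changing; returns (labels, centroids)."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:  # keep an empty cluster's centroid in place
                new_centroids.append(centroids[k])
        centroids = new_centroids
    return labels, centroids


def sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical data: two tight pairs, far apart horizontally.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
bad_labels, bad_c = lloyd(pts, [(0, 0), (0, 1)])     # poor initialization
good_labels, good_c = lloyd(pts, [(0, 0), (10, 0)])  # good initialization
print(bad_labels, sse(pts, bad_labels, bad_c))       # → [0, 1, 0, 1] 100.0
print(good_labels, sse(pts, good_labels, good_c))    # → [0, 0, 1, 1] 1.0
```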
(d) (10%) Now suppose the DBSCAN algorithm is applied to the dataset. Which of the following settings will make p1 a core point but not density-reachable from p7? (A) Eps = 1, MinPts = 1 (B) Eps = 2, MinPts = 2 (C) Eps = 3, MinPts = 3 (D) Eps = 4, MinPts = 4
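The four settings can be checked mechanically. This sketch assumes Euclidean distance and the convention that a point's Eps-neighborhood includes the point itself when counting toward MinPts. A point is core if its neighborhood holds at least MinPts points. A breadth-first search from p7 that expands only through core points then yields everything density-reachable from p7. `neighbors`, `is_core`, and `density_reachable_from` are illustrative names.

```python
import math
from collections import deque

# Dataset from the table above.
POINTS = {
    "p0": (4, 3), "p1": (5, 1), "p2": (6, 1), "p3": (7, 3),
    "p4": (7, 2), "p5": (6, 4), "p6": (5, 4), "p7": (4, 2),
}


def neighbors(name, eps):
    """Eps-neighborhood of a point, including the point itself."""
    return [q for q in POINTS if math.dist(POINTS[name], POINTS[q]) <= eps]


def is_core(name, eps, min_pts):
    return len(neighbors(name, eps)) >= min_pts


def density_reachable_from(start, eps, min_pts):
    """Points density-reachable from `start` via chains of core points (BFS)."""
    if not is_core(start, eps, min_pts):
        return set()  # only a core point can start a chain
    reached, queue = {start}, deque([start])
    while queue:
        p = queue.popleft()
        if not is_core(p, eps, min_pts):
            continue  # border points are reached but do not extend the chain
        for q in neighbors(p, eps):
            if q not in reached:
                reached.add(q)
                queue.append(q)
    return reached


results = {}
for label, eps, min_pts in [("A", 1, 1), ("B", 2, 2), ("C", 3, 3), ("D", 4, 4)]:
    results[label] = (is_core("p1", eps, min_pts), "p1" in density_reachable_from("p7", eps, min_pts))
    print(label, "p1 core:", results[label][0], "| reachable from p7:", results[label][1])
```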
(e) (10%) Which of the following statements about clustering algorithms is (are) correct? (A) The Manhattan distance between points (-1, 2) and (2, -1) is 5. (B) Both K-Means and Agglomerative Hierarchical Clustering algorithms may suffer from convergence at local optima. (C) Agglomerative Hierarchical Clustering and Divisive Hierarchical Clustering can have different time complexity. (D) The K-Medoid algorithm is not suitable for clustering non-spherical (arbitrary shaped) groups of objects. (E) The order of the data records inputted by the user affects the output of BIRCH. (F) The order of the data records inputted by the user affects the output of OPTICS. (G) If two points are density-connected, there exists a point p which is density-reachable from the two points. (H) Grid-based methods for clustering include STING, CLIQUE, etc., whose results depend on the number of data objects.
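The arithmetic in statement (A) can be checked directly. Manhattan (L1) distance is the sum of the absolute coordinate differences: |-1 - 2| + |2 - (-1)| = 3 + 3. The helper name `manhattan` is illustrative.

```python
def manhattan(p, q):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


print(manhattan((-1, 2), (2, -1)))  # → 6
```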