Clustering
Lecture 4: Density-based Methods
Jing Gao
SUNY Buffalo

Outline
• Basics
  – Motivation, definition, evaluation
• Methods
  – Partitional
  – Hierarchical
  – Density-based
  – Mixture model
  – Spectral methods
• Advanced topics
  – Clustering ensemble
  – Clustering in MapReduce
  – Semi-supervised clustering, subspace clustering, co-clustering, etc.

Core, Border & Outlier
Given Eps (ε) and MinPts, categorize the objects into three mutually exclusive groups.
(Figure: core, border and outlier points for Eps = 1 unit, MinPts = 5)
• A point is a core point if it has more than a specified number of points (MinPts) within Eps. These are points in the interior of a cluster.
• A border point has fewer than MinPts points within Eps, but lies in the neighborhood of a core point.
• A noise point (outlier) is any point that is neither a core point nor a border point.

Example
(Figure: the original points, and the same points labeled as core, border or outlier for Eps = 10, MinPts = 4)

Density-reachability
• Directly density-reachable
  – An object q is directly density-reachable from object p if p is a core object and q is in p's ε-neighborhood.
  – In the figure (MinPts = 4), q is directly density-reachable from p.
  – p is not directly density-reachable from q: p lies in q's ε-neighborhood, but q is not a core object.
  – Density-reachability is therefore asymmetric.

DBSCAN Algorithm: Example
• Parameters
  – Eps = 2 cm
  – MinPts = 3

for each o ∈ D do
    if o is not yet classified then
        if o is a core object then
            collect all objects density-reachable from o
            and assign them to a new cluster
        else
            assign o to NOISE

A point assigned to NOISE here may later be reached from a core object of a new cluster; it is then relabeled as a border point of that cluster.

DBSCAN: Sensitive to Parameters
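As a first look at how Eps matters, the classification rules and the pseudocode from the example slides can be turned into a small runnable sketch. This is a minimal pure-Python version, not the lecture's own code. It assumes Euclidean distance and the convention of the original DBSCAN paper, where a core point needs at least MinPts points in its Eps-neighborhood, counting itself (the slide's "more than" wording would shift this threshold by one). The function names `region_query` and `dbscan` and the toy points are hypothetical.

```python
import math
from collections import deque

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN following the slide pseudocode.

    Returns (labels, types): labels[i] is a cluster id (0, 1, ...) or NOISE,
    and types[i] is "core", "border" or "noise".
    """
    n = len(points)
    neighbors = [region_query(points, i, eps) for i in range(n)]
    # Assumed convention: core if at least min_pts points, itself included.
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [None] * n  # None means "not yet classified"
    cluster_id = 0
    for o in range(n):
        if labels[o] is not None:
            continue
        if not is_core[o]:
            labels[o] = NOISE  # provisional: a later cluster may claim it as border
            continue
        # Collect all points density-reachable from o (breadth-first search).
        labels[o] = cluster_id
        queue = deque([o])
        while queue:
            p = queue.popleft()
            if not is_core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] is None or labels[q] == NOISE:
                    labels[q] = cluster_id
                    queue.append(q)
        cluster_id += 1
    types = ["core" if is_core[i] else "noise" if labels[i] == NOISE else "border"
             for i in range(n)]
    return labels, types


pts = [(0, 0), (0, 1), (1, 0), (1, 1),  # a small square of points
       (2, 1),                          # one unit to the right of (1, 1)
       (10, 10)]                        # far away from everything
labels, types = dbscan(pts, eps=1.0, min_pts=3)
print(labels)  # [0, 0, 0, 0, 0, -1]
print(types)   # ['core', 'core', 'core', 'core', 'border', 'noise']
print(dbscan(pts, eps=0.9, min_pts=3)[0])  # [-1, -1, -1, -1, -1, -1]
```

On the toy points, Eps = 1.0 yields one cluster containing a border point, plus one outlier, while Eps = 0.9 leaves every point as noise because no point then has three points within Eps. Neighborhoods are computed up front with O(n²) distance evaluations; a spatial index would speed this up on larger data but is omitted for brevity.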
Figure 8. DBSCAN results for DS1 with MinPts = 4 and Eps = (a) 0.5 and (b) 0.4.
Figure 9. DBSCAN results for DS2 with MinPts = 4 and Eps = (a) 5.0, (b) 3.5 and (c) 3.0.
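The figures keep MinPts fixed and vary only Eps. The same kind of experiment can be run on a hypothetical two-density dataset without the full algorithm, using the fact that, under the definitions above, DBSCAN clusters correspond one-to-one to the connected components of the graph that links core points lying within Eps of each other; border points only attach to those components. The sketch below assumes Euclidean distance and counts a point toward its own MinPts; the function `count_clusters` and the data are hypothetical. It sweeps three Eps values with MinPts = 4.

```python
import math
from itertools import combinations


def count_clusters(points, eps, min_pts):
    """Return (number of DBSCAN clusters, number of noise points).

    Clusters are counted as connected components of the graph whose nodes are
    core points and whose edges join core points at distance <= eps.
    """
    n = len(points)
    close = [[math.dist(p, q) <= eps for q in points] for p in points]
    core = [sum(row) >= min_pts for row in close]  # each row includes the point itself

    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if core[i] and core[j] and close[i][j]:
            parent[find(i)] = find(j)  # merge the two core components

    clusters = len({find(i) for i in range(n) if core[i]})
    # Noise: not core and not within eps of any core point (so not border either).
    noise = sum(1 for i in range(n)
                if not core[i] and not any(core[j] and close[i][j] for j in range(n)))
    return clusters, noise


dense = [(x, y) for x in range(3) for y in range(3)]                 # spacing 1
sparse = [(20 + 4 * x, 4 * y) for x in range(3) for y in range(3)]  # spacing 4
points = dense + sparse

for eps in (1.5, 4.5, 20.0):
    print(eps, count_clusters(points, eps, min_pts=4))
# 1.5 (1, 9)
# 4.5 (2, 0)
# 20.0 (1, 0)
```

With Eps = 1.5 only the dense group is found and all nine sparse points are noise; with Eps = 4.5 both groups are recovered; with Eps = 20 the two groups merge into a single cluster. A modest change in Eps can therefore change both the number of clusters and the amount of noise.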
When DBSCAN Does NOT Work Well
(Figure: the original points, and DBSCAN results with MinPts = 4, Eps = 9.92 and with MinPts = 4, Eps = 9.75)
• Cannot handle clusters of varying densities
• Sensitive to parameters: it is hard to determine the correct set of parameters

Take-away Message
• The basic idea of density-based clustering
• The two important parameters and the definitions of neighborhood and density in DBSCAN
• Core, border and outlier points
• DBSCAN algorithm
• DBSCAN's pros and cons