Density-based Clustering: DBSCAN Algorithm and Parameter Selection, Exams of Data Mining

A lecture on density-based clustering, focusing on the DBSCAN algorithm and its parameters. It covers the basic idea of density-based clustering, the definitions of neighborhood and density, core, border, and outlier points, and the DBSCAN algorithm. It also discusses the pros and cons of DBSCAN and the method for determining the parameters Eps and MinPts.

Typology: Exams

2019/2020

Uploaded on 07/02/2020

sampat-aheer

Partial preview of the text

Clustering
Lecture 4: Density-based Methods
Jing Gao, SUNY Buffalo

Outline
• Basics
  – Motivation, definition, evaluation
• Methods
  – Partitional
  – Hierarchical
  – Density-based
  – Mixture model
  – Spectral methods
• Advanced topics
  – Clustering ensemble
  – Clustering in MapReduce
  – Semi-supervised clustering, subspace clustering, co-clustering, etc.

Core, Border & Outlier
Given ε (Eps) and MinPts, categorize the objects into three exclusive groups.
[Figure: core, border, and outlier points; ε = 1 unit, MinPts = 5]
• A point is a core point if it has more than a specified number of points (MinPts) within Eps. These are points in the interior of a cluster.
• A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point.
• A noise point (outlier) is any point that is neither a core point nor a border point.

Example
[Figure: original points, and point types (core, border, and outliers); ε = 10, MinPts = 4]

Density-reachability
• Directly density-reachable
  – An object q is directly density-reachable from object p if p is a core object and q is in p's ε-neighborhood.
[Figure: points p and q with their ε-neighborhoods; MinPts = 4]
• q is directly density-reachable from p
• p is not directly density-reachable from q
• Density-reachability is asymmetric

DBSCAN Algorithm: Example
• Parameter
  – ε = 2 cm
  – MinPts = 3

for each o ∈ D do
    if o is not yet classified then
        if o is a core-object then
            collect all objects density-reachable from o and assign them to a new cluster
        else
            assign o to NOISE

DBSCAN: Sensitive to Parameters
Figure 8. DBScan results for DS1 with MinPts at 4 and Eps at (a) 0.5 and (b) 0.4.
Figure 9. DBScan results for DS2 with MinPts at 4 and Eps at (a) 5.0, (b) 3.5, and (c) 3.0.

When DBSCAN Does NOT Work Well
[Figure: original points, and clusterings with (MinPts=4, Eps=9.92) and (MinPts=4, Eps=9.75)]
• Cannot handle varying densities
• Sensitive to parameters: hard to determine the correct set of parameters

Take-away Message
• The basic idea of density-based clustering
• The two important parameters and the definitions of neighborhood and density in DBSCAN
• Core, border and outlier points
• DBSCAN algorithm
• DBSCAN's pros and cons
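
The DBSCAN pseudocode and the core/border/outlier definitions above can be tried out with a small Python sketch. This is an illustration under stated assumptions, not the lecture's reference implementation: the names `region_query`, `dbscan`, and `NOISE` are made up for this example, the distance is Euclidean, the neighbor search is a brute-force scan, and a point counts as core when its Eps-neighborhood holds at least MinPts points including itself (a common convention that differs slightly from the "more than MinPts" wording on the slide). The sample points are invented for the demonstration.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for outliers (assumed convention for this sketch)


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return (labels, core): a cluster id or NOISE per point, and a core flag per point."""
    labels = [None] * len(points)   # None means "not yet classified"
    core = [False] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE       # may be relabeled as a border point later
            continue
        # i is a core object: collect everything density-reachable from it.
        labels[i] = cluster_id
        core[i] = True
        frontier = list(neighbors)
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id   # non-core point reached from a core: border
                continue
            if labels[j] is not None:
                continue                 # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                core[j] = True
                frontier.extend(j_neighbors)  # only core points extend the cluster
        cluster_id += 1
    return labels, core


# Invented sample: a dense square with one border point, a dense triangle,
# and one isolated point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2.2, 0),
          (10, 10), (11, 10), (10, 11), (5, 5)]
labels, core = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, -1]
print(core)    # [True, True, True, True, False, True, True, True, False]
```

In this sample the point (2.2, 0) is a border point: it joins cluster 0 because it lies in the Eps-neighborhood of the core point (1, 0), but it has too few neighbors to be core itself. The point (5, 5) is an outlier. Rerunning with a smaller `eps` or larger `min_pts` changes the result, which is the parameter sensitivity the slides describe.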