A presentation on
STING: Statistical Information Grid (Grid-Based Methods)
By:
Binko Toure
Favour Iwoni
Tara Chandra Shrestha
Grid-Based Clustering Methods
- The clustering methods discussed so far are data-driven: they partition the set of objects and adapt to the distribution of the objects in the embedding space.
- Such algorithms are query-dependent: they are built for one query and are generally of no use for another. Each query needs a separate scan of the data, so the computational complexity is at least O(n).
- The grid-based method takes a space-driven approach, partitioning the embedding space into cells independently of the distribution of the input objects.
- It uses a multi-resolution grid data structure.
- It quantizes the object space into a finite number of cells that form a grid structure, on which all clustering operations are performed.
- It builds a hierarchical structure from the given data and uses it to answer various queries efficiently. Every level of the hierarchy consists of cells.
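The quantization and the multi-resolution hierarchy described above can be sketched in a few lines of Python. This is a minimal illustration, not the STING data structure itself: it assumes 2-D points, square cells, a per-cell object count as the only stored statistic, and parent cells that each aggregate a 2x2 block of children. The names `quantize`, `coarsen`, and `build_hierarchy` are hypothetical.

```python
from collections import Counter

def quantize(points, cell_size):
    """Map each 2-D point to the integer index of the grid cell containing it."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts

def coarsen(counts):
    """Build the next-higher level: each parent aggregates a 2x2 block of children."""
    parent = Counter()
    for (i, j), n in counts.items():
        parent[(i // 2, j // 2)] += n
    return parent

def build_hierarchy(points, cell_size, levels):
    """Return per-level cell counts, finest level first (assumed layout)."""
    hierarchy = [quantize(points, cell_size)]
    for _ in range(levels - 1):
        hierarchy.append(coarsen(hierarchy[-1]))
    return hierarchy

points = [(0.2, 0.3), (0.8, 0.9), (1.5, 0.4), (3.7, 3.9)]
hierarchy = build_hierarchy(points, cell_size=1.0, levels=3)
for level, cells in enumerate(hierarchy):
    print(level, dict(cells))
```

Only the finest level requires a pass over the data; every coarser level is computed from the summaries of the level below it.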
Features & Challenges of a typical grid-based algorithm
- Efficiency & Scalability: # of cells << # of data points
- Uniformity: the grid is uniform, so highly irregular data distributions are hard to handle
- Locality: limited by predefined cell sizes, borders, and the density threshold
- Curse of dimensionality: hard to cluster high-dimensional data
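A short calculation shows why dimensionality undermines the efficiency argument. If every dimension is split into k intervals, a full grid has k^d cells, so the assumption "# of cells << # of data points" fails quickly as d grows. The sketch below assumes k = 10 and one million points purely for illustration; `total_cells` is a hypothetical helper.

```python
def total_cells(intervals_per_dim, dims):
    """Number of cells in a full grid with the same split in every dimension."""
    return intervals_per_dim ** dims

n_points = 1_000_000  # illustrative data set size (assumption)
for d in (2, 5, 10, 20):
    cells = total_cells(10, d)
    # Average number of points per cell if the data were spread evenly.
    print(d, cells, n_points / cells)
```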
Advantages of Grid-based Clustering Algorithms
- Fast:
  - No distance computations
  - Clustering is performed on summaries, not on individual objects; complexity is usually O(# populated grid cells) rather than O(# data objects)
  - Easy to determine which clusters are neighboring
- Limitation: cluster shapes are limited to unions of grid cells
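The points above can be illustrated with a generic grid-based clustering sketch (not STING specifically). It assumes 2-D points, a fixed cell size, a hypothetical `min_points` density threshold, and 4-neighborhood adjacency between cells. After the single counting pass, all work is done on populated cells, no point-to-point distance is computed, and each cluster is a union of grid cells.

```python
from collections import Counter, deque

def grid_clusters(points, cell_size, min_points):
    """Group dense grid cells into clusters of edge-adjacent cells."""
    # One pass over the data: summarize points as per-cell counts.
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    dense = {cell for cell, n in counts.items() if n >= min_points}

    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:  # breadth-first search over neighboring dense cells
            i, j = queue.popleft()
            component.append((i, j))
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(component))
    return clusters

points = [(0.1, 0.1), (0.2, 0.4), (1.1, 0.3), (1.6, 0.8),
          (5.2, 5.1), (5.7, 5.5), (9.0, 9.0)]
clusters = grid_clusters(points, cell_size=1.0, min_points=2)
print(clusters)  # [[(0, 0), (1, 0)], [(5, 5)]]
```

The isolated point at (9.0, 9.0) falls in a cell below the threshold and is left out of every cluster.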
STING: Algorithm (2)
The summarized pseudocode for the STING algorithm is as follows:
STING: Query Processing (3)
STING uses a top-down approach to answer spatial data queries:
- Start from a pre-selected layer, typically one with a small number of cells.
- From the pre-selected layer down to the bottom layer, do the following:
  - For each cell in the current layer, compute the confidence interval indicating the cell's relevance to the given query.
  - If the cell is relevant, include it in a cluster.
  - If it is irrelevant, remove it from further consideration.
  - Otherwise, look for relevant cells at the next lower layer.
- Combine relevant cells into relevant regions (based on grid neighborhood) and return the resulting clusters as the answer.
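The top-down traversal above might be sketched as follows. This is a simplified illustration under several assumptions, not the STING implementation: the query is "find cells whose density is at least `min_density`", the confidence-interval test is replaced by a hypothetical `classify` function with a tolerance `margin`, undecided bottom-layer cells fall back to a plain threshold, and the final step of merging relevant cells into regions is omitted. The `Cell` class and all names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    index: tuple      # (row, col) within the cell's own layer
    count: int        # number of objects inside the cell
    area: float       # spatial area covered by the cell
    children: list = field(default_factory=list)  # cells one layer down

def classify(cell, min_density, margin):
    """Hypothetical relevance test standing in for the confidence interval."""
    density = cell.count / cell.area
    if density >= min_density * (1 + margin):
        return "relevant"
    if density <= min_density * (1 - margin):
        return "irrelevant"
    return "undecided"

def sting_query(start_layer, min_density, margin=0.25):
    """Walk the hierarchy top-down and collect the relevant cells."""
    relevant = []
    frontier = list(start_layer)
    while frontier:
        next_frontier = []
        for cell in frontier:
            label = classify(cell, min_density, margin)
            if label == "relevant":
                relevant.append(cell)
            elif label == "undecided":
                if cell.children:
                    next_frontier.extend(cell.children)  # refine one layer down
                elif cell.count / cell.area >= min_density:
                    relevant.append(cell)  # bottom layer: plain threshold (assumption)
            # Irrelevant cells are dropped from further consideration.
        frontier = next_frontier
    return relevant

leaves = [Cell((0, 0), 8, 1.0), Cell((0, 1), 1, 1.0),
          Cell((1, 0), 1, 1.0), Cell((1, 1), 0, 1.0)]
root = Cell((0, 0), 10, 4.0, children=leaves)
hits = sting_query([root], min_density=2.5)
print([(c.index, c.count) for c in hits])  # [((0, 0), 8)]
```

In this example the root cell is undecided, so only its four children are examined. Cells that are clearly relevant or irrelevant at a coarse layer are never refined, which is where the savings over a full scan come from.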