Clustering Using Grid-based Method, Lecture notes of Data Mining

STING: Statistical Information Grid

Typology: Lecture notes

2018/2019

Uploaded on 04/17/2019

A presentation on
STING: Statistical Information Grid (Grid-Based Methods)
By: Binko Toure, Favour Iwoni, Tara Chandra Shrestha

Grid-Based Clustering Methods

  • The clustering methods discussed so far are data-driven: they partition the set of objects and adapt to the distribution of the objects in the embedding space.
  • Such algorithms are query-dependent: a structure built for one query is generally of no use for another, so each query needs a separate scan of the data and costs at least O(n).
  • Grid-based methods instead take a space-driven approach, partitioning the embedding space into cells independently of the distribution of the input objects.
  • They use a multi-resolution grid data structure.
  • They quantize the object space into a finite number of cells that form a grid structure, on which all clustering operations are performed.
  • They build a hierarchical structure from the given data and answer various queries efficiently. Every level of the hierarchy consists of cells.
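
The quantization and the cell hierarchy described above can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the names `CellStats`, `build_bottom_level` and `build_parent_level` are hypothetical, the data is two-dimensional, each parent is assumed to cover a 2x2 block of children, and only a count and a sum (hence a mean) of one attribute are kept per cell. STING cells usually carry further statistics such as standard deviation, minimum and maximum.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CellStats:
    """Per-cell summary (hypothetical layout): count and sum of one attribute."""
    count: int = 0
    total: float = 0.0

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def build_bottom_level(points, cell_size):
    """Quantize (x, y, value) points into square cells of side cell_size."""
    cells = defaultdict(CellStats)
    for x, y, value in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key].count += 1
        cells[key].total += value
    return dict(cells)


def build_parent_level(children):
    """Merge each 2x2 block of child cells into one parent cell.

    Parent statistics are computed from the child summaries alone,
    without touching the original points again.
    """
    parents = defaultdict(CellStats)
    for (i, j), stats in children.items():
        parent = parents[(i // 2, j // 2)]
        parent.count += stats.count
        parent.total += stats.total
    return dict(parents)


# Made-up sample data: (x, y, attribute value)
points = [(0.5, 0.5, 10.0), (0.7, 0.2, 20.0), (1.5, 0.5, 30.0), (3.2, 3.9, 40.0)]
level0 = build_bottom_level(points, cell_size=1.0)   # finest layer
level1 = build_parent_level(level0)                  # one layer up
```

Only populated cells are stored here, so the cost of building a parent layer depends on the number of populated cells rather than on the number of points.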

Features & Challenges of a typical grid-based algorithm

  • Efficiency and scalability: # of cells << # of data points
  • Uniformity: the grid is uniform, which makes highly irregular data distributions hard to handle
  • Locality: limited by predefined cell sizes, borders, and the density threshold
  • Curse of dimensionality: high-dimensional data is hard to cluster
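
To make the curse-of-dimensionality point concrete, assume for illustration that each of the d dimensions is split into the same number k of intervals (k and d are symbols introduced here, not taken from the slides). The number of cells is then:

```latex
\[
  \#\text{cells} = k^{d},
  \qquad \text{e.g. } k = 10:\quad
  d = 2 \Rightarrow 10^{2} = 100,
  \qquad
  d = 10 \Rightarrow 10^{10}.
\]
```

Under this assumption the condition # of cells << # of data points stops holding once d grows beyond a handful of dimensions, since few data sets contain anywhere near 10^10 objects.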

Advantages of Grid-based Clustering Algorithms

  • Fast:
      • No distance computations
      • Clustering is performed on summaries and not on individual objects; complexity is usually O(# populated grid cells) and not O(# data objects)
      • Easy to determine which clusters are neighboring
  • Shapes are limited to unions of grid cells
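
A small sketch of clustering on cell summaries rather than on individual objects. It assumes a 2D grid whose cells are keyed by integer indices, treats a cell as dense when its count reaches a hypothetical `min_count` threshold, and joins dense cells that share an edge. The function name and the sample counts are made up for illustration.

```python
from collections import deque


def cluster_dense_cells(cell_counts, min_count):
    """Group dense cells that share an edge into clusters (4-neighborhood).

    Works only on per-cell counts: no point-to-point distances are computed,
    and neighbors are found by index arithmetic on the cell keys.
    """
    dense = {cell for cell, n in cell_counts.items() if n >= min_count}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            i, j = queue.popleft()
            component.append((i, j))
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(component))
    return clusters


# Made-up cell counts: three adjacent dense cells, one isolated dense cell,
# and one sparse cell that falls below the threshold.
cell_counts = {(0, 0): 5, (0, 1): 4, (1, 1): 6, (3, 3): 7, (5, 5): 1}
clusters = cluster_dense_cells(cell_counts, min_count=3)
```

Each resulting cluster is a union of grid cells, which is exactly the shape limitation noted in the last bullet.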

STING: Algorithm (2)

The summarized pseudocode for the STING algorithm is as follows:

(pseudocode figure not included in this text)

STING: Query Processing (3)

Uses a top-down approach to answer spatial data queries:

  1. Start from a pre-selected layer, typically one with a small number of cells.
  2. From the pre-selected layer down to the bottom layer, do the following:
      • For each cell in the current level, compute the confidence interval indicating the cell's relevance to the given query.
      • If the cell is relevant, include it in a cluster.
      • If it is irrelevant, remove it from further consideration.
      • Otherwise, look for relevant cells at the next lower layer.
  3. Combine relevant cells into relevant regions (based on grid neighborhood) and return the resulting clusters as the answer.
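
Steps 1 and 2 can be sketched as below. This is a simplified stand-in, not the STING procedure itself: instead of a statistical confidence interval, each cell stores the minimum and maximum leaf count beneath it, and the hypothetical query "find leaf cells with at least `threshold` points" is decided from those two numbers. The function names, the 2x2 parent/child layout and the sample counts are all assumptions made for illustration. Step 3 (merging neighboring relevant cells) is not shown.

```python
def build_pyramid(leaf_counts, size):
    """Return levels[0] (finest) .. levels[-1] (single root cell).

    Each cell maps to (min_leaf_count, max_leaf_count) over the leaves below it.
    `size` is the side length of the leaf grid, assumed to be a power of two.
    """
    level = {(i, j): (leaf_counts.get((i, j), 0),) * 2
             for i in range(size) for j in range(size)}
    levels = [level]
    while size > 1:
        size //= 2
        parent = {}
        for i in range(size):
            for j in range(size):
                kids = [level[(2 * i + a, 2 * j + b)] for a in (0, 1) for b in (0, 1)]
                parent[(i, j)] = (min(k[0] for k in kids), max(k[1] for k in kids))
        levels.append(parent)
        level = parent
    return levels


def query_dense_leaves(levels, threshold, start_level=None):
    """Top-down search for leaf cells whose count is >= threshold."""
    top = len(levels) - 1 if start_level is None else start_level
    frontier = [(top, cell) for cell in levels[top]]
    relevant = []
    while frontier:
        depth, (i, j) = frontier.pop()
        lo, hi = levels[depth][(i, j)]
        if hi < threshold:
            # Irrelevant: no leaf below this cell can qualify, drop it.
            continue
        if lo >= threshold:
            # Relevant: every leaf below qualifies, include them all.
            span = 2 ** depth
            relevant.extend((i * span + a, j * span + b)
                            for a in range(span) for b in range(span))
        else:
            # Undecided: examine the four children at the next lower layer.
            frontier.extend((depth - 1, (2 * i + a, 2 * j + b))
                            for a in (0, 1) for b in (0, 1))
    return sorted(relevant)


# Made-up 4x4 leaf grid; cells not listed have count 0.
leaf_counts = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8, (2, 2): 1, (3, 3): 9}
levels = build_pyramid(leaf_counts, size=4)
dense_leaves = query_dense_leaves(levels, threshold=5)
```

In this example the upper-left quadrant is accepted as a whole at the middle layer and two quadrants are discarded without visiting their leaves, which is where the top-down approach saves work. At the bottom layer the minimum and maximum coincide, so every leaf that is reached is decided one way or the other.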