Data Mining: Understanding Classification, Association, and Clustering Techniques, Slides of Database Management Systems (DBMS)

An introduction to data mining, a process used to analyze data and discover useful patterns. Topics covered include classification, association, and clustering. Classification involves assigning new items to existing classes using rules, while association rules identify relationships between different items. Clustering refers to grouping data points into distinct subsets. Examples and techniques for each method are provided.

Typology: Slides

2012/2013

Uploaded on 04/27/2013

arunima

Data Mining

  • Define Data Mining
  • Classification
  • Association
  • Clustering

Examples of Data Mining

  • A simple example is a clothing retail store. A data mining system could be used to list the customers who often buy t-shirts during the summer.
  • Another example is the urban legend that Walmart used data mining to find a correlation between customers buying beer and customers buying baby diapers, and then placed the two aisles close together to increase profits.

Classification

  • Suppose the items in a database are already assigned to classes. A problem arises when a new item must be added to the database.
  • The class of the new item is unknown, so it has to be predicted by some other means. Classification rules solve this problem by assigning the item to a class based on its attributes.

Decision Tree Classifiers

  • A widely used technique for classification.
  • Each internal node is associated with a function or predicate that is evaluated on the item.
  • Each leaf node is associated with a class.

Example of Decision Tree Classifiers

[Figure: an example decision tree, with the root, the function nodes, and the class leaves labeled.]
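
The structure described above (predicates at internal nodes, classes at leaves) can be illustrated with a minimal Python sketch. The attribute names, thresholds, and class labels below are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str  # class assigned to any item that reaches this leaf


@dataclass
class Node:
    predicate: Callable[[dict], bool]  # test evaluated on the item
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


def classify(tree, item):
    # Start at the root and follow the branch chosen by each predicate
    # until a leaf is reached; the leaf's label is the predicted class.
    while isinstance(tree, Node):
        tree = tree.if_true if tree.predicate(item) else tree.if_false
    return tree.label


# Hypothetical tree: the attributes and thresholds are assumptions.
root = Node(
    predicate=lambda x: x["degree"] == "masters",
    if_true=Node(lambda x: x["income"] > 75000, Leaf("excellent"), Leaf("good")),
    if_false=Node(lambda x: x["income"] > 50000, Leaf("good"), Leaf("bad")),
)

new_item = {"degree": "masters", "income": 80000}
print(classify(root, new_item))
```

A new item is classified by a single walk from the root to a leaf, so the cost depends on the depth of the tree and not on the number of items already in the database.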

Association

  • An example of an association for beer and diapers would be: Beer => Diapers
  • As mentioned above, this association simply means that customers who buy beer often buy diapers too.

Association Rules

  • Support: a measure of the fraction of the population that satisfies both the antecedent and the consequent of a rule. For example, for the association milk => screwdrivers, support is the fraction of all purchases that include both milk and screwdrivers. A rule with higher support deserves more attention than a rule with lower support.
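
As a rough illustration of the support measure, the following Python sketch counts the fraction of transactions that contain every item on both sides of a rule. The function name `support` and the sample transactions are assumptions made up for this example.

```python
def support(transactions, antecedent, consequent):
    """Fraction of transactions that contain all items of both rule sides."""
    if not transactions:
        return 0.0
    items = set(antecedent) | set(consequent)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


# Hypothetical purchase data, one set of items per customer visit.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"milk", "screwdrivers"},
    {"beer", "bread"},
]

beer_diapers = support(transactions, {"beer"}, {"diapers"})  # 2 of 5
milk_screwdrivers = support(transactions, {"milk"}, {"screwdrivers"})  # 1 of 5
print(beer_diapers, milk_screwdrivers)
```

In this made-up data, Beer => Diapers has twice the support of milk => screwdrivers, so it would be the more interesting rule by this measure.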

Clustering

  • Clustering refers to finding clusters of points in the given data and grouping the points into distinct subsets.
  • Widely used clustering techniques: hierarchical clustering, agglomerative clustering, and divisive clustering.

Types of Clustering

  • Hierarchical: builds a hierarchy of nested clusters, in which smaller clusters are grouped into larger ones.
  • Agglomerative: starts by building small clusters, then progressively merges them into larger clusters.
  • Divisive: begins with the whole set and successively divides it into smaller clusters.
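
The agglomerative approach can be sketched in a few lines of Python. This is a minimal illustration, assuming one-dimensional points and single linkage (cluster distance is the distance between the closest pair of members); the slides do not specify a distance measure, so both choices are assumptions.

```python
def agglomerative(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    while len(clusters) > max(k, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # j > i, so removing cluster j leaves index i valid.
        clusters[i].extend(clusters.pop(j))
    return [sorted(c) for c in clusters]


result = agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], k=3)
print(result)
```

A divisive method would run in the opposite direction, starting from one cluster that holds every point and splitting it repeatedly.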

Other types of mining

  • Text Mining: applies data mining techniques to textual documents. For example, a tool can form clusters of the pages that users have visited. If a user asks for sites containing the keyword “Japan”, the tool returns a list of the sites that use that keyword most often.
  • Data Visualization: helps users examine large volumes of data and detect patterns visually. Instead of reading about problems in text, users can view maps and charts that pinpoint where a problem lies, for example with a color-coding scheme.

Example of Text Mining

This example shows what happens when a user searches for “Japan”. The points closer to the center of the circle have more information on Japan. We can think of the points as websites or research articles.
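
The keyword search described above can be approximated with a simple frequency ranking. The Python sketch below is only an assumption about how such a tool might score documents: it counts occurrences of the keyword in each document and lists the highest counts first. The page names and texts are invented for the example.

```python
import re
from collections import Counter


def rank_by_keyword(documents, keyword):
    """Return (name, count) pairs, highest keyword count first."""
    keyword = keyword.lower()
    scores = {}
    for name, text in documents.items():
        # Lowercase the text and split it into alphabetic words.
        words = re.findall(r"[a-z]+", text.lower())
        scores[name] = Counter(words)[keyword]
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Drop documents that never mention the keyword.
    return [(name, n) for name, n in ranked if n > 0]


# Hypothetical pages standing in for websites or research articles.
pages = {
    "travel-blog": "Japan travel tips: trains in Japan, food in Japan.",
    "news-site": "Markets in Japan rose today.",
    "recipe-site": "A simple bread recipe.",
}

ranking = rank_by_keyword(pages, "Japan")
print(ranking)
```

In terms of the figure, documents with higher counts would correspond to points nearer the center of the circle.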