














Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
This lecture slide is very easy to understand and very helpful to built a concept about the foundation of computers and Database Design.The key points in these slides are:What Is Data Mining, Principle of Sorting, Knowledge Discovery from Data, Pattern Analysis, Data Archeology, Data Dredging, Information Harvesting, Business Intelligence, Nonoperational Data, Data Warehouses, Data Cleaning
Typology: Slides
1 / 22
This page cannot be seen from the preview
Don't miss anything!















Data mining is the principle of sorting through large amounts of data and picking out relevant information.
In other words…
potentially useful) patterns or knowledge from huge amount of data
Data Mining process
Knowledge Discovery (KDD) Process
Data mining—core of knowledge discovery process
Data Cleaning
Data Integration
Databases
Data Warehouse
Task-relevant Data
Selection
Data Mining
Pattern Evaluation
Data Mining: Confluence of Multiple
Disciplines
Data Mining
Concept/Class Description: Characterization
and Discrimination
should be able to produce a description
summarizing the characteristics of customers.
who spend more than $1000 a year at (some
store called ) AllElectronics. The result can be
a general profile such as age, employment
status or credit ratings.
Characterization and Discrimination
continued…
general features of targeting class data objects
with the general features of objects from one or a
set of contrasting classes. User can specify target and contrasting classes.
general features of software products whose
sales increased by 10% in the last year with those
whose sales decreased by about 30% in the same
duration.
Mining Frequent Patterns, Associations and
correlations
buys(X, “CD Player”) [Support = 2%,
confidence = 60% ]
with an income $20000-$29000. There is 60%
chance they will purchase CD Player and 2%
of all the transactions under analysis showed
that this age group customers with that range
of income bought CD Player.
that describes and distinguishes data classes
or concepts for the purpose of being able to
use the model to predict the class of objects
whose class label is unknown.
various forms such as
» IF-THEN Rules » A decision tree » Neural network
consulting a known class label.
AllElectronics customer data in order to identify
homogeneous subpopulations of customers.
These clusters may represent individual target
groups for marketing. The figure on next slide
shows a 2-D plot of customers with respect to
customer locations in a city.