Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Data Mining Primitives - Data Warehousing - Lecture Slide, Slides of Data Warehousing

University of Lucknow Data Warehousing

Some concept of Data Warehousing are Aggregate Functions, Applications and Trends in Data Mining, Classification and Prediction, Cluster Analysis, Data Mining Primitives, Data Warehousing Design. Main points of this lecture are: Data Mining Primitives, Mining Query Language, Design Graphical, Interfaces Based, Query Language, Mining Systems, Architecture of Data, Summary, Patterns Autonomously, Database

Typology: Slides

2012/2013

Uploaded on 04/25/2013

khushia 🇮🇳

4.1

(10)

110 documents

1 / 41

This page cannot be seen from the preview

Don't miss anything!

Data Mining Primitives,

Languages, and System

Architectures

Docsity.com

Discover Slides of Data Warehousing University of Lucknow

Partial preview of the text

Download Data Mining Primitives - Data Warehousing - Lecture Slide and more Slides Data Warehousing in PDF only on Docsity!

Data Mining Primitives,

Languages, and System

Architectures

Chapter 4: Data Mining Primitives,

Languages, and System Architectures

 Data mining primitives: What defines a data

mining task?

 A data mining query language

 Design graphical user interfaces based on a

data mining query language

 Architecture of data mining systems

 Summary

What Defines a Data Mining Task?

 Task-relevant data

Typically interested in only a subset of the entire database
Specify  the name of database/data warehouse (AllElectronics_db)  names of tables/data cubes containing relevant data (item, customer, purchases, items_sold)  conditions for selecting the relevant data (purchases made in Canada for relevant year)  relevant attributes or dimensions (name and price from item, income and age from customer)

What Defines a Data Mining Task?

(continued)

 Type of knowledge to be mined

Concept description, association, classification, prediction, clustering, and evolution analysis  Studying buying habits of customers, mine associations between customer profile and the items they like to buy - Use this info to recommend items to put on sale to increase revenue  Studying real estate transactions, mine clusters to determine house characteristics that make for fast sales - Use this info to make recommendations to house sellers who want/need to sell their house quickly  Study relationship between individual’s sport statistics and salary - Use this info to help sports agents and sports team owners negotiate an individual’s salary

What Defines a Data Mining Task?

 Task-relevant data

 Type of knowledge to be mined

 Background knowledge

 Pattern interestingness measurements

 Visualization of discovered patterns

Task-Relevant Data (Minable View)

 Database or data warehouse name

 Database tables or data warehouse cubes

 Condition for data selection

 Relevant attributes or dimensions

 Data grouping criteria

Background Knowledge:

Concept Hierarchies

 Allow discovery of knowledge at multiple levels of abstraction

 Represented as a set of nodes organized in a tree

Each node represents a concept
Special node, all, reserved for root of tree

 Concept hierarchies allow raw data to be handled at a higher, more generalized level of abstraction

 Four major types of concept hierarchies, schema, set- grouping, operation derived, rule based

A Concept Hierarchy: Dimension

(location)

Mexico

all

Europe North_America

Germany Spain Canada

Vancouver

L. Chan M. Wind

all

region

office

country

city Frankfurt Toronto

Define a sequence of mappings from a set of low

level concepts to higher-level, more general concepts

Background Knowledge:

Concept Hierarchies

 Operation-derived hierarchy – based on

operations specified by users, experts, or the

data mining system

email address or a URL contains hierarchy info relating departments, universities (or companies) and countries
E-mail address  [email protected]
Partial concept hierarchy  login-name < department < university < country

Background Knowledge:

Concept Hierarchies

 Rule-based hierarchy – either a whole concept hierarchy or a portion of it is defined by a set of rules and is evaluated dynamically based on the current data and rule definition

Following rules used to categorize items as low profit margin, medium profit margin and high profit margin  Low profit margin - < $  Medium profit margin – between $50 & $  High profit margin - > $
Rule based concept hierarchy  low_profit_margin (X) <= price(X, P1) and cost (X, P2) and (P1 - P2) < $  medium_profit_margin (X) <= price(X, P1) and cost (X, P2) and (P1 - P2) >= $50 and (P1 – P2) <= $  high_profit_margin (X) <= price(X, P1) and cost (X, P2) and (P1 - P2) > $

Measurements of Pattern

Interestingness (continued)

 Simplicity – A factor contributing to interestingness of

pattern is overall simplicity for comprehension

Objective measures viewed as functions of the pattern structure or number of attributes or operators
More complex a rule, more difficult it is to interpret, thus less interesting
Example measures: rule length or number of leaves in a decision tree

 Certainty – Measure of certainty associated with

pattern that assesses validity or trustworthiness

Confidence (A=>B) = # tuples containing both A & B/ #tuples containing A
Confidence of 85% for association rule buys (X, computer) => buys (X, software) means 85% of all customers who bought a computer bought software also

Measurements of Pattern

Interestingness (continued)

 Utility – potential usefulness of a pattern is a

factor determining its interestingness

Estimated by a utility function such as support – percentage of task relevant data tuples for which pattern is true  Support (A=>B) = # tuples containing both A & B/ total # of tuples

 Novelty – those patterns that contribute new

information or increased performance to the

pattern set

not previously known, surprising

A Data Mining Query

Language (DMQL)

 Motivation

A DMQL can provide the ability to support ad-hoc and interactive data mining
By providing a standardized language like SQL  Hope to achieve a similar effect like that SQL has on relational database  Foundation for system development and evolution  Facilitate information exchange, technology transfer, commercialization and wide acceptance

 Design

DMQL is designed with the primitives described earlier

Syntax for DMQL

 Syntax for specification of

task-relevant data
the kind of knowledge to be mined
concept hierarchy specification
interestingness measure
pattern presentation and visualization

Data Mining Primitives - Data Warehousing - Lecture Slide, Slides of Data Warehousing

Related documents

Partial preview of the text

Download Data Mining Primitives - Data Warehousing - Lecture Slide and more Slides Data Warehousing in PDF only on Docsity!

Data Mining Primitives,

Languages, and System

Architectures

Chapter 4: Data Mining Primitives,

Languages, and System Architectures

 Data mining primitives: What defines a data

mining task?

 A data mining query language

 Design graphical user interfaces based on a

data mining query language

 Architecture of data mining systems

 Summary

What Defines a Data Mining Task?

 Task-relevant data

What Defines a Data Mining Task?

(continued)

 Type of knowledge to be mined

What Defines a Data Mining Task?

 Task-relevant data

 Type of knowledge to be mined

 Background knowledge

 Pattern interestingness measurements

 Visualization of discovered patterns

Task-Relevant Data (Minable View)

Background Knowledge:

Concept Hierarchies

Define a sequence of mappings from a set of low

level concepts to higher-level, more general concepts

Background Knowledge:

Concept Hierarchies

 Operation-derived hierarchy – based on

operations specified by users, experts, or the

data mining system

Background Knowledge:

Concept Hierarchies

Measurements of Pattern

Interestingness (continued)

Measurements of Pattern

Interestingness (continued)

 Utility – potential usefulness of a pattern is a

factor determining its interestingness

 Novelty – those patterns that contribute new

information or increased performance to the

pattern set

A Data Mining Query

Language (DMQL)

Syntax for DMQL

 Syntax for specification of

 Putting it all together — a DMQL query