
Data Mining Techniques: Confidence, Association, Characterization, Clustering, and OLAP, Exercises of Information Technology

Central University of Jammu and Kashmir Information Technology

An overview of various data mining techniques, including confidence and support measurement, association mining, characterization, clustering, and online analytical processing (OLAP). It covers the mathematical expressions for confidence and support, the use of association mining to discover relationships between items, the use of characterization to discover generalized concepts, the role of clustering as a preprocessing step, and how OLAP presents data at different levels of abstraction. Examples are included, along with the algorithms used for characterization and clustering.

Typology: Exercises

2011/2012

Uploaded on 08/11/2012

duraid 🇮🇳


E-COMMERCE – IT430 VU

Lesson 35

CONFIDENCE AND SUPPORT

There are two measures used in association mining: support and confidence. Confidence measures how often the relationship holds true, e.g. what percentage of the people who bought milk also bought eggs. Support is the percentage of all transactions in which the two items occur together.

Mathematically, taking the example of milk and eggs, they can be expressed as follows:

$$\text{Confidence}(\text{milk} \Rightarrow \text{eggs}) = \frac{\text{Transactions}(\text{milk and eggs})}{\text{Transactions}(\text{milk})} \times 100\%$$

If 25 transactions involve both milk and eggs and 75 transactions involve milk, then confidence is 25/75 × 100 = 33.3%.

$$\text{Support}(\text{milk and eggs}) = \frac{\text{Transactions}(\text{milk and eggs})}{\text{Total transactions}} \times 100\%$$

If 10 transactions in a day involve both milk and eggs and the total number of transactions that day is 50, then support is 10/50 × 100 = 20%.
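Both measures can be computed directly from a list of transactions. The Python sketch below is a minimal illustration, assuming each transaction is represented as a set of item names and that confidence for the rule milk ⇒ eggs divides by the transactions containing milk. The function names `support` and `confidence` and the five sample transactions are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also
    contain `consequent`: supp(antecedent and consequent) / supp(antecedent)."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    with_antecedent = sum(1 for t in transactions if antecedent <= t)
    if with_antecedent == 0:
        return 0.0  # the rule never applies, so report zero confidence
    with_both = sum(1 for t in transactions if both <= t)
    return with_both / with_antecedent


# Hypothetical transactions: each one is the set of items in a single purchase.
transactions = [
    {"milk", "eggs", "bread"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "butter"},
    {"milk"},
]

print(f"support(milk, eggs)      = {support(transactions, {'milk', 'eggs'}):.0%}")
print(f"confidence(milk -> eggs) = {confidence(transactions, {'milk'}, {'eggs'}):.0%}")
```

For the sample data, milk and eggs appear together in 2 of 5 transactions (support 40%), and 2 of the 4 milk transactions also contain eggs (confidence 50%).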

Suppose confidence is 90% but support is only 5%. We can infer that the two items have a very strong affinity: customers who buy the first item also buy the second 90% of the time. However, the pair appears in only 5% of all transactions, so the chance of this pair being purchased is slim.

These thresholds can be adjusted to discover items with the desired level of association, and marketing strategy can be set accordingly. If we feed the data to an association mining tool and specify minimum percentages for confidence and support, it will list the items whose association meets those thresholds.
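The threshold-based listing described above can be sketched as a brute-force search over item pairs. This is a simplified illustration under stated assumptions: only one-item-to-one-item rules are considered, thresholds are given as fractions, and the transactions are hypothetical. Practical tools use more efficient algorithms (Apriori is a well-known one) instead of checking every pair.

```python
from itertools import combinations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Brute-force search for one-item rules X -> Y that meet both thresholds.

    Returns a list of (X, Y, support, confidence) tuples, with support and
    confidence expressed as fractions between 0 and 1.
    """
    n = len(transactions)

    def count(itemset):
        # Number of transactions containing every item in `itemset`.
        return sum(1 for t in transactions if itemset <= t)

    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_count = count({a, b})
        pair_support = pair_count / n
        if pair_support < min_support:
            continue  # the pair is too rare overall
        for x, y in ((a, b), (b, a)):
            # x comes from the data, so count({x}) > 0 and the division is safe.
            conf = pair_count / count({x})
            if conf >= min_confidence:
                rules.append((x, y, pair_support, conf))
    return rules


# Hypothetical transactions, one set of items per customer basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

for x, y, sup, conf in mine_pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    print(f"{x} -> {y}: support {sup:.0%}, confidence {conf:.0%}")
```

With a minimum support of 40% and a minimum confidence of 70%, only the bread/butter pair qualifies, in both directions. Bread ⇒ milk has enough support (40%) but only 50% confidence, so it is filtered out.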

Results of association mining are shown with the help of double arrows as indicated below:

Bread ←----→ Butter
Computer ←----→ Furniture
Clothes ←----→ Shoes

Using the results of association mining, a marketer can take a number of useful steps to set or modify marketing strategy. For example, items with a close affinity can be shelved together to improve customer service, and promotional schemes can be designed around the discovered associations.

Characterization

Characterization is the discovery of interesting concepts, described concisely and succinctly at generalized levels, for examining the general behavior of the data. For example, in a university database of graduate students, students of different nationalities may be enrolled in different departments such as music history, physics, etc. We can apply a characterization technique to answer a generalized question such as how many students from a particular country are studying science or arts. See the following example:

Student name   Department    City of residence
Imran          History       Karachi
Alice          Physics       London
Ali            Literature    Lahore
Bob            Mathematics   Toronto
…

In the above example, a characterization tool could tell us that two Pakistani students are studying arts. Note that the location (city of residence) and the field of education (department) have been generalized to Pakistan and arts, respectively.

Two algorithms used for characterization are Version Space Search and Attribute-Oriented Induction.
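The generalization step in the student example can be sketched in the spirit of Attribute-Oriented Induction: drop an attribute that cannot be usefully generalized (the name), replace each remaining value with a higher-level concept, and merge identical generalized rows while counting them. The concept hierarchies below (city to country, department to field) are hypothetical mappings written for this example, and a full implementation would also use thresholds to decide how far to generalize each attribute.

```python
from collections import Counter

# Hypothetical concept hierarchies, one level deep (assumptions for illustration).
CITY_TO_COUNTRY = {
    "Karachi": "Pakistan",
    "Lahore": "Pakistan",
    "London": "UK",
    "Toronto": "Canada",
}
DEPARTMENT_TO_FIELD = {
    "History": "Arts",
    "Literature": "Arts",
    "Physics": "Science",
    "Mathematics": "Science",
}

# (student name, department, city of residence), as in the table above.
students = [
    ("Imran", "History", "Karachi"),
    ("Alice", "Physics", "London"),
    ("Ali", "Literature", "Lahore"),
    ("Bob", "Mathematics", "Toronto"),
]


def characterize(rows):
    """Drop the name, climb each remaining attribute one level up its concept
    hierarchy, then merge identical generalized tuples with a count."""
    generalized = (
        (CITY_TO_COUNTRY[city], DEPARTMENT_TO_FIELD[dept])
        for _name, dept, city in rows
    )
    return Counter(generalized)


for (country, field), count in sorted(characterize(students).items()):
    print(f"{count} student(s) from {country} study {field}")
```

The generalized row (Pakistan, Arts) receives a count of 2, which is the "two Pakistani students are studying arts" result described in the example.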


Clustering

A cluster is a group of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Examples include clusters of distinct groups of customers, categories of emails in a mailing-list database, and different categories of web usage found in log files. Clustering can also serve as a preprocessing step for other algorithms such as classification and characterization. The K-means algorithm is commonly used for clustering. In the example below, customers fall into four clusters based on their income level; the K-means result is displayed in the format shown in Fig. 1 below:

Fig. 1: Customer clusters by income level (Income < 1,00,000; 1,00,000 ≤ Income ≤ 2,00,000; 2,00,000 < Income ≤ 3,50,000; Income > 3,50,000)
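A minimal one-dimensional K-means sketch in Python is shown below. The income values are hypothetical, and initializing the centres from evenly spaced positions in the sorted data is an assumption made to keep the result deterministic (random initialization is also common). Each iteration assigns every income to its nearest centre and then moves each centre to the mean of its cluster, stopping when the centres no longer change.

```python
def kmeans_1d(values, k, max_iterations=100):
    """Lloyd's K-means on one-dimensional data.

    Returns (centres, clusters), where clusters[i] lists the values assigned
    to centres[i].
    """
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start (an assumption): k evenly spaced points of the sorted data.
    centres = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: each value joins the cluster of its nearest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Update step: move each centre to the mean of its members
        # (an empty cluster keeps its previous centre).
        new_centres = [
            sum(members) / len(members) if members else centres[i]
            for i, members in enumerate(clusters)
        ]
        if new_centres == centres:  # converged: assignments will not change
            break
        centres = new_centres
    return centres, clusters


# Hypothetical annual incomes of twelve customers.
incomes = [50_000, 60_000, 80_000, 120_000, 150_000, 180_000,
           250_000, 300_000, 320_000, 400_000, 450_000, 500_000]

centres, clusters = kmeans_1d(incomes, k=4)
for centre, members in zip(centres, clusters):
    print(f"centre {centre:,.0f}: {members}")
```

Unlike the fixed income ranges in Fig. 1, the cluster boundaries here come from the data: each value simply belongs to its nearest centre.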

Online Analytical Processing (OLAP)

OLAP makes use of background knowledge about the domain of the data being studied to present the data at different levels of abstraction. It differs from data mining in that it does not provide patterns for making predictions; rather, the information stored in databases is presented in a convenient format at different levels, which helps decision makers and managers. The result of OLAP is displayed in the form of a data cube, as shown in Fig. 2 below:

Fig. 2: Data cube in OLAP, with dimensions Item Types (furniture, computer, phone, grocery), Time (quarters) and Location (cities: Lahore, Karachi)
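The idea behind the cube can be sketched with an in-memory fact table whose dimensions mirror the axes of Fig. 2 (item type, quarter, city). The sales figures below are hypothetical, not the values in the figure, and `roll_up` and `slice_cube` are illustrative helper names. Production OLAP servers precompute and index such aggregates rather than scanning every fact on each query.

```python
from collections import defaultdict

# Dimension order used in every fact tuple below.
DIMENSIONS = ("item", "quarter", "city")

# Hypothetical fact table: (item type, quarter, city, sales amount).
facts = [
    ("Furniture", "Q1", "Karachi", 120),
    ("Furniture", "Q2", "Lahore", 80),
    ("Computer", "Q1", "Lahore", 200),
    ("Computer", "Q2", "Karachi", 150),
    ("Phone", "Q1", "Karachi", 90),
    ("Grocery", "Q2", "Lahore", 60),
]


def roll_up(rows, keep):
    """Sum the sales measure, keeping only the dimensions named in `keep`
    and aggregating away all the others (a coarser level of abstraction)."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for *coords, amount in rows:
        totals[tuple(coords[p] for p in positions)] += amount
    return dict(totals)


def slice_cube(rows, dimension, value):
    """Keep only the facts whose `dimension` equals `value` (an OLAP slice)."""
    position = DIMENSIONS.index(dimension)
    return [row for row in rows if row[position] == value]


print(roll_up(facts, ["city"]))  # {('Karachi',): 360, ('Lahore',): 340}
print(roll_up(facts, []))        # {(): 700}, the grand total
print(roll_up(slice_cube(facts, "quarter", "Q1"), ["item"]))  # Q1 sales per item
```

Keeping more dimensions in `keep` moves down to a more detailed view; keeping fewer rolls the same facts up to a more summarized one, which is the "different levels of abstraction" described above.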
