Data Mining-Information Technology-Lecture Handout, Exercises of Information Technology

Central University of Jammu and Kashmir Information Technology

The main topics of the course are listed here: what e-commerce is and its types, networking devices, markup languages, security issues, data mining, e-business, cryptography and public key infrastructure, Electronic Data Exchange, internet marketing, and ERP. This lecture includes: data mining, interesting patterns, evaluation, data warehouse, decision making, transform, integrate, query.

Typology: Exercises

2011/2012

Uploaded on 08/11/2012

duraid 🇮🇳

E-COMMERCE – IT430 VU

Lesson 34

DATA MINING

Data mining can be defined as the task of discovering interesting patterns from large amounts of data, where the data can be stored in databases, data warehouses, or other information repositories. Data mining has many business applications in today’s world. Using data mining techniques, we can identify the behavior of our customers and target them effectively with personalized messages.

Assume that there is a shopping store where information about customers has been recorded over a period of time. Applying a data mining technique to the customers’ data can generate patterns that provide useful information. For example, a pattern may tell us that people with a certain demographic profile (age over 20 years and sex male) coming from a particular location have shown an inclination to buy computer-related items. This is an interesting clue for marketers. If a computer-related item is to be marketed in the future, the marketing effort should be focused on such persons instead of sending marketing messages at random. In other words, the persons indicated by the pattern are the ones who are likely to respond to this kind of marketing initiative. Thus, a company that follows the pattern can save time, energy and mailing cost.

Data warehouse

A data warehouse is a repository for long-term storage of data from multiple sources, organized to support management decision making. Fig. 1 below shows how data collected at different sources is cleaned, transformed, integrated and loaded into a data warehouse, from where clients can access it for data mining and pattern evaluation.

Fig. 1: Data sources in Karachi, Lahore, Islamabad and Faisalabad -> Clean, Transform, Integrate, Load -> Data warehouse -> Query and analysis tools -> Client
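The flow in Fig. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (customer, amount), the cleaning rules and the use of an in-memory SQLite table as the "warehouse" are all hypothetical and do not come from the handout.

```python
import sqlite3

# Hypothetical per-city source records; names and values are illustrative only.
SOURCES = {
    "Karachi": [{"customer": " Ali ", "amount": "1200"}],
    "Lahore": [{"customer": "Sara", "amount": "800"},
               {"customer": "", "amount": "50"}],       # blank customer name
    "Islamabad": [{"customer": "Omar", "amount": "n/a"}],  # non-numeric amount
    "Faisalabad": [{"customer": "Hina", "amount": "450"}],
}


def clean(rows):
    """Drop rows with a blank customer name or a non-numeric amount."""
    return [
        r for r in rows
        if r["customer"].strip() and r["amount"].strip().isdigit()
    ]


def transform(rows, city):
    """Bring every row into one common shape: (city, customer, int amount)."""
    return [(city, r["customer"].strip(), int(r["amount"])) for r in rows]


def build_warehouse(sources):
    """Clean, transform, integrate and load all sources into one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (city TEXT, customer TEXT, amount INTEGER)")
    integrated = []
    for city, rows in sources.items():
        integrated.extend(transform(clean(rows), city))
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", integrated)
    conn.commit()
    return conn


warehouse = build_warehouse(SOURCES)
# A client query against the integrated data.
totals = warehouse.execute(
    "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
).fetchall()
print(totals)
```

The client never touches the individual city sources; it only queries the single integrated table, which is the point of the warehouse in the figure.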

Knowledge discovery

A knowledge discovery process includes data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation and knowledge presentation. Fig. 2 shows the knowledge discovery process:

Fig. 2: Databases -> Cleaning and Integration -> Data Warehouse -> Selection and Transformation -> Data Mining -> Patterns -> Evaluation and Presentation -> Knowledge

Note that data mining is one step in the overall knowledge discovery process. Data must be cleaned, transformed, selected and integrated before data mining is performed.

Data cleaning means that missing values are filled in wherever needed and any impossible or erroneous values are replaced by correct or reasonable ones. For example, if the age of a person is typed as 1000 years in the column ‘age’, an average age value can be put in its place. Where a row contains quite a few erroneous or missing values, the row can be discarded altogether. This process is called data selection. In data transformation, the data from all the different sources is converted into the same format. For example, dates entered in a column should follow the same format across all the data collected from different sources. In data integration, data from all the sources is assembled into one and housed in the data warehouse.

This cleaned, transformed, selected and integrated data is then fed from the data warehouse to the data mining tool. The resulting patterns are evaluated by managers, and useful knowledge is thus gained. Note that almost 80% of the total time used in a knowledge discovery process is spent just on making the data fit for mining, that is, on data cleaning, data transformation, data selection and so on.
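The preparation steps described above (discarding rows with too many gaps, replacing an impossible age with an average, and bringing dates into one format) might look like the following minimal Python sketch. The records, the plausibility limit of 120 years, the "more than one missing field" threshold and the two date formats are all assumptions made for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"name": "A", "age": 25, "joined": "2011-03-05"},
    {"name": "B", "age": 1000, "joined": "05/04/2011"},  # impossible age
    {"name": "C", "age": None, "joined": "2011-06-01"},  # missing age
    {"name": None, "age": None, "joined": None},         # mostly empty row
    {"name": "E", "age": 35, "joined": "17/08/2011"},
]

MAX_AGE = 120                            # assumed plausibility limit
MAX_MISSING = 1                          # assumed: more gaps -> discard the row
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats


def select_rows(rows):
    """Data selection: discard rows with too many missing values."""
    return [r for r in rows
            if sum(v is None for v in r.values()) <= MAX_MISSING]


def clean_ages(rows):
    """Data cleaning: replace missing or impossible ages with the average."""
    valid = [r["age"] for r in rows
             if r["age"] is not None and 0 <= r["age"] <= MAX_AGE]
    average = round(mean(valid))
    cleaned = []
    for r in rows:
        age = r["age"]
        if age is None or not 0 <= age <= MAX_AGE:
            age = average
        cleaned.append({**r, "age": age})
    return cleaned


def normalize_date(text):
    """Data transformation: convert any known date format to YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def prepare(rows):
    rows = clean_ages(select_rows(rows))
    return [{**r, "joined": normalize_date(r["joined"])} for r in rows]


prepared = prepare(RAW)
for row in prepared:
    print(row)
```

In this sketch the mostly empty row is dropped before the average is computed, so it does not influence the replacement value; a real project would choose these rules per column.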

Types of Data Mining

There are four main types of data mining: classification, association, characterization and clustering.

Classification and association are predictive types of data mining while characterization and clustering represent the descriptive type.

Classification

Classification produces a predictive model that assigns samples to different classes. The results of this type of mining are represented as if-then rules, decision trees, neural networks, etc. Two important algorithms used for this type are the ID3 algorithm and Bayesian classification. A decision tree is a graphical representation of the if-then rules. Fig. 3 below shows the result of classification in the form of a decision tree. Initially, the whole data is divided into two sets: training data and test data.

Whether the same model should be used in the future depends on its efficiency. Normally, an efficiency close to 80% is considered a good value.
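As a rough illustration of classification with a decision tree, the sketch below builds a small ID3-style tree (choosing the attribute with the highest information gain at each node) from training data and then measures the fraction of test rows it labels correctly. Reading "efficiency" as this test-set accuracy is an assumption, and the customer records and attribute names (over_20, sex, buys) are hypothetical, loosely modeled on the demographic example earlier in the lesson.

```python
from collections import Counter
from math import log2


def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(rows, attr, target):
    """Entropy before the split minus the weighted entropy after it."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset, target)
    return entropy(rows, target) - remainder


def majority(rows, target):
    return Counter(r[target] for r in rows).most_common(1)[0][0]


def id3(rows, attrs, target):
    """Return a class label (leaf) or a nested dict (decision node)."""
    labels = {r[target] for r in rows}
    if len(labels) == 1:
        return labels.pop()
    if not attrs:
        return majority(rows, target)
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    node = {"attr": best, "default": majority(rows, target), "branches": {}}
    rest = [a for a in attrs if a != best]
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        node["branches"][value] = id3(subset, rest, target)
    return node


def predict(tree, row):
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


def accuracy(tree, rows, target):
    return sum(predict(tree, r) == r[target] for r in rows) / len(rows)


# Hypothetical data: does the customer buy computer-related items?
TRAIN = [
    {"over_20": "yes", "sex": "male", "buys": "yes"},
    {"over_20": "yes", "sex": "female", "buys": "no"},
    {"over_20": "no", "sex": "male", "buys": "no"},
    {"over_20": "no", "sex": "female", "buys": "no"},
    {"over_20": "yes", "sex": "male", "buys": "yes"},
    {"over_20": "no", "sex": "male", "buys": "no"},
]
TEST = [
    {"over_20": "yes", "sex": "male", "buys": "yes"},
    {"over_20": "no", "sex": "female", "buys": "no"},
    {"over_20": "yes", "sex": "female", "buys": "no"},
    {"over_20": "yes", "sex": "male", "buys": "no"},
    {"over_20": "no", "sex": "male", "buys": "no"},
]

tree = id3(TRAIN, ["over_20", "sex"], "buys")
print(tree)
print(accuracy(tree, TEST, "buys"))
```

The learned tree corresponds to the rule "if over 20 and male then buys, otherwise does not buy"; the test rows are kept separate from the training rows so that the score reflects how the model behaves on data it has not seen.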

Association

Association analysis is the discovery of association rules showing attribute-value conditions that frequently occur together in a given set of data. It is widely used for market basket analysis. For example, if the sales of a big shopping store are recorded in databases, applying association mining may reveal that certain items have a strong affinity with each other, such that when one item is purchased the other is purchased, too. The Apriori algorithm is used for association mining.
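A compact, unoptimized sketch of the Apriori idea follows: frequent itemsets are found level by level, and a larger candidate is kept only if all of its smaller subsets were frequent. Rules are then filtered by confidence. The baskets and the thresholds (minimum support 0.4, minimum confidence 0.7) are made up for illustration and are not taken from the handout.

```python
from itertools import combinations

# Hypothetical shopping baskets.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)


def apriori(baskets, min_support):
    """Return {frozenset: support} for all itemsets meeting min_support."""
    level = [frozenset([item]) for item in {i for b in baskets for i in b}]
    frequent = {}
    size = 1
    while level:
        kept = {s: support(s, baskets) for s in level}
        kept = {s: sup for s, sup in kept.items() if sup >= min_support}
        frequent.update(kept)
        size += 1
        # Join frequent sets into candidates one item larger, then prune any
        # candidate that has an infrequent subset (the Apriori property).
        candidates = {a | b for a in kept for b in kept if len(a | b) == size}
        level = [
            c for c in candidates
            if all(frozenset(s) in kept for s in combinations(c, size - 1))
        ]
    return frequent


def rules(frequent, min_confidence):
    """Yield (lhs, rhs, support, confidence) for rules lhs -> rhs."""
    found = []
    for itemset, sup in frequent.items():
        for k in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), k)):
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((set(lhs), set(itemset - lhs), sup, confidence))
    return found


frequent = apriori(BASKETS, min_support=0.4)
for lhs, rhs, sup, conf in rules(frequent, min_confidence=0.7):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

Here support measures how often the items appear together, and confidence measures how often the right-hand item is bought given that the left-hand item was bought, which is one way to quantify the affinity between items described above.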
