Clustering (data mining), Lecture notes of Applications of Computer Sciences

Keele University Applications of Computer Sciences


Typology: Lecture notes

2016/2017

Uploaded on 04/15/2017

silm-ahmed 🇬🇧


14.11.2001 Data mining: Clustering

Course outline: Intro/Ass. Rules, Episodes, Text Mining, Home Exam, Clustering, KDD Process, Appl./Summary

Lecture dates: 24./26.10., 30.10., 7.11., 14.11., 21.11., 28.11.

Course on Data Mining (581550-4)


Today 14.11.

Today's subject:
o Classification, clustering

Next week's program:
o Lecture: Data mining process
o Exercise: Classification, clustering
o Seminar: Classification, clustering


Cluster analysis: Overview

o What is cluster analysis?
o Similarity and dissimilarity
o Types of data in cluster analysis
o Major clustering methods
o Partitioning methods
o Hierarchical methods
o Outlier analysis
o Summary

What is cluster analysis?

Cluster: a collection of data objects
o similar to one another within the same cluster
o dissimilar to the objects in the other clusters

Aim of clustering: to group a set of data objects into clusters

Applications of clustering

Marketing: discovering distinct customer groups in a purchase database

Land use: identifying areas of similar land use in an earth observation database

Insurance: identifying groups of motor insurance policy holders with a high average claim cost

City-planning: identifying groups of houses according to their house type, value, and geographical location

What is good clustering?

A good clustering method will produce high-quality clusters with
o high intra-class similarity
o low inter-class similarity

The quality of a clustering result depends on
o the similarity measure used
o the implementation of the similarity measure

The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns
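The two criteria above can be checked numerically. The following is a minimal sketch, not a method from the lecture: it assumes Euclidean distance as the inverse of similarity (small distance means high similarity), and the function name `mean_intra_inter` and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8


def mean_intra_inter(points, labels):
    """Return (mean distance within clusters, mean distance between clusters).

    Assumes at least one same-cluster pair and one cross-cluster pair exist.
    """
    intra, inter = [], []
    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        # A pair counts as "intra" when both objects carry the same label.
        (intra if label_p == label_q else inter).append(dist(p, q))
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels = [0, 0, 0, 1, 1, 1]

intra, inter = mean_intra_inter(points, labels)
print(intra < inter)  # a good clustering keeps intra distances below inter distances
```

A labeling that mixes the two groups would raise the intra-cluster mean and lower the inter-cluster mean, which is one simple way to compare candidate clusterings under a fixed distance measure.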

Requirements of clustering in data mining (2)

o Ability to deal with noise and outliers
o Insensitivity to order of input records
o High dimensionality
o Incorporation of user-specified constraints
o Interpretability and usability

Similarity and dissimilarity between objects (1)

There is no single definition of similarity or dissimilarity between data objects

The definition of similarity or dissimilarity between objects depends on
o the type of the data considered
o what kind of similarity we are looking for

Type of data in cluster analysis

o Interval-scaled variables
o Binary variables
o Nominal, ordinal, and ratio variables
o Variables of mixed types
o Complex data types

Interval-scaled variables (1)

Continuous measurements of a roughly linear scale

For example, weight, height and age

The measurement unit can affect the cluster analysis

To avoid dependence on the measurement unit, we should standardize the data
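The slide that defines the standardization itself is not part of this preview, so the following sketch assumes a plain z-score (subtract the mean, divide by the population standard deviation). The function name `standardize` and the sample heights are hypothetical. It illustrates why standardizing removes the dependence on the measurement unit: the same heights in centimetres and in metres give the same standardized values.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score standardization (an assumed choice, not taken from the slides).

    Requires at least two distinct values, otherwise the deviation is zero.
    """
    m = mean(values)
    s = pstdev(values)  # population standard deviation
    return [(v - m) / s for v in values]


# Hypothetical sample: the same four heights in two different units.
heights_cm = [150.0, 160.0, 170.0, 180.0]
heights_m = [h / 100 for h in heights_cm]

z_cm = standardize(heights_cm)
z_m = standardize(heights_m)

# After standardization the choice of unit no longer matters.
print(all(abs(a - b) < 1e-9 for a, b in zip(z_cm, z_m)))
```

Other scale estimates, such as the mean absolute deviation, can be swapped in for `pstdev` if less sensitivity to outliers is wanted.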

Interval-scaled variables (3)

One group of popular distance measures for interval-scaled variables are Minkowski distances

$$d(i,j) = \left( |x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + \dots + |x_{ip} - x_{jp}|^q \right)^{1/q}$$

where $i = (x_{i1}, x_{i2}, \dots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \dots, x_{jp})$ are two $p$-dimensional data objects, and $q$ is a positive integer

Interval-scaled variables (4)

If q = 1, the distance measure is Manhattan (or city block) distance

$$d(i,j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \dots + |x_{ip} - x_{jp}|$$

If q = 2, the distance measure is Euclidean distance

$$d(i,j) = \sqrt{ |x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \dots + |x_{ip} - x_{jp}|^2 }$$
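The Minkowski family and its two special cases can be sketched in a few lines. This is an illustrative implementation under the definitions above; the function name `minkowski` and the sample objects are hypothetical.

```python
def minkowski(i, j, q):
    """Minkowski distance between two p-dimensional objects, for q >= 1."""
    if len(i) != len(j):
        raise ValueError("objects must have the same number of dimensions")
    if q < 1:
        raise ValueError("q must be a positive integer")
    # Sum the q-th powers of the coordinate differences, then take the q-th root.
    return sum(abs(a - b) ** q for a, b in zip(i, j)) ** (1.0 / q)


# Hypothetical two-dimensional objects.
i = (1.0, 2.0)
j = (4.0, 6.0)

manhattan = minkowski(i, j, 1)   # |1-4| + |2-6| = 7.0
euclidean = minkowski(i, j, 2)   # sqrt(3^2 + 4^2) = 5.0
print(manhattan, euclidean)
```

The Manhattan distance is never smaller than the Euclidean distance for the same pair of objects, which the example values (7.0 versus 5.0) reflect.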

Binary variables (2)

Simple matching coefficient (invariant similarity, if the binary variable is symmetric):

$$d(i,j) = \frac{b + c}{a + b + c + d}$$

Jaccard coefficient (noninvariant similarity, if the binary variable is asymmetric):

$$d(i,j) = \frac{b + c}{a + b + c}$$
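The slide that defines a, b, c and d is not included in this preview. The sketch below assumes the usual 2x2 contingency counts: a is the number of variables equal to 1 for both objects, b those equal to 1 for i and 0 for j, c those equal to 0 for i and 1 for j, and d those equal to 0 for both. Function names and sample vectors are hypothetical.

```python
def contingency(i, j):
    """Count the four 0/1 combinations over two equally long binary vectors."""
    a = sum(1 for x, y in zip(i, j) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(i, j) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(i, j) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(i, j) if x == 0 and y == 0)
    return a, b, c, d


def simple_matching(i, j):
    """Dissimilarity for symmetric binary variables: (b + c) / (a + b + c + d)."""
    a, b, c, d = contingency(i, j)
    return (b + c) / (a + b + c + d)


def jaccard(i, j):
    """Dissimilarity for asymmetric binary variables: (b + c) / (a + b + c).

    Undefined (division by zero) when both objects are 0 on every variable.
    """
    a, b, c, _ = contingency(i, j)
    return (b + c) / (a + b + c)


# Hypothetical objects: a = 1, b = 2, c = 1, d = 1.
i = [1, 1, 0, 0, 1]
j = [1, 0, 1, 0, 0]
print(simple_matching(i, j))  # 3 / 5
print(jaccard(i, j))          # 3 / 4
```

The only difference between the two measures is whether the joint absences (d) count as agreement, which is why the Jaccard form suits asymmetric variables where a shared 0 carries little information.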

Binary variables (3)

Example: dissimilarity between binary variables:

a patient record table

eight attributes, of which
o gender is a symmetric attribute, and
o the remaining attributes are asymmetric binary

Name  Gender  Fever  Cough  Test-1  Test-2  Test-3  Test-
Jack  M       Y      N      P       N       N       N
Mary  F       Y      N      P       N       P       N
Jim   M       Y      P      N       N       N       N
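As a worked check of the table, the following assumes that Y and P are coded as 1 and N as 0, that only the six asymmetric attributes are used (gender is left out), and that the Jaccard form d(i,j) = (b + c)/(a + b + c) applies, with a counting attributes that are 1 for both patients, b those that are 1 only for the first, and c those that are 1 only for the second. These coding choices are assumptions, since the slide with the computation is not in the preview.

```latex
\begin{align*}
d(\text{Jack}, \text{Mary}) &= \frac{0 + 1}{2 + 0 + 1} = \frac{1}{3} \approx 0.33 \\
d(\text{Jack}, \text{Jim})  &= \frac{1 + 1}{1 + 1 + 1} = \frac{2}{3} \approx 0.67 \\
d(\text{Jim}, \text{Mary})  &= \frac{1 + 2}{1 + 1 + 2} = \frac{3}{4} = 0.75
\end{align*}
```

Under these assumptions Jack and Mary are the most similar pair, and Jim and Mary the least similar.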
