K Means Clustering-Data Warehouse-Lecture Handout, Exercises of Data Warehousing

Topics included in this course are: Data Warehousing Concepts, Design and Development, Extraction, Transformation and Loading, OLAP Technology, Data Mining Techniques (Classification, Clustering and Decision Tree), and Advanced Topics. This lecture handout covers: Means, Clustering, Unsupervised, Learning, Machine, Software, Evaluate, Algorithm, Objects, Malignant, Benign

Typology: Exercises

2011/2012

Uploaded on 08/08/2012

sharib_sweet
K-Means Clustering – Example
Recall from the previous lecture that clustering is a form of unsupervised learning.
That is, the machine / software learns on its own from the data (the learning set) and
groups the objects into clusters. For example, if our class (decision) attribute is
tumorType and its values are malignant, benign, etc., these are the classes, and we
expect them to be represented by cluster1, cluster2, etc. However, the class
information is never provided to the algorithm. It can be used later on to evaluate
how accurately the algorithm grouped the objects.
(learning set)

Object   Curvature   Texture   Blood Consump   Tumor Type
x1       0.8         1.2       A               Benign
x2       0.75        1.4       B               Benign
x3       0.23        0.4       D               Malignant
x4       0.23        0.5       D               Malignant
...
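As a rough illustration of how the hidden Tumor Type column can be used afterwards, the sketch below computes cluster purity, i.e. the fraction of objects that fall in the majority class of their cluster. Purity is only one possible evaluation measure and is not named in the handout; the cluster assignment shown is a hypothetical output, not the result of an actual run.

```python
from collections import Counter

# Tumor Type column of the learning set (never shown to the clustering algorithm).
true_classes = {"x1": "Benign", "x2": "Benign", "x3": "Malignant", "x4": "Malignant"}

# Hypothetical output of a clustering run: object -> cluster name.
assigned = {"x1": "cluster1", "x2": "cluster1", "x3": "cluster2", "x4": "cluster2"}


def purity(assigned, true_classes):
    """Fraction of objects that belong to the majority class of their cluster."""
    classes_by_cluster = {}
    for obj, cluster in assigned.items():
        classes_by_cluster.setdefault(cluster, []).append(true_classes[obj])
    # For each cluster, count how many members share its most common class.
    majority_total = sum(
        Counter(classes).most_common(1)[0][1]
        for classes in classes_by_cluster.values()
    )
    return majority_total / len(assigned)


print(purity(assigned, true_classes))  # 1.0: every cluster holds a single class
```

A purity of 1.0 means each cluster contains objects of one class only; moving x3 into cluster1 would lower it to 0.75.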
The way we do that is by plotting the objects from the database in space, where
each attribute is one dimension:

[Figure: plot with axes Curvature, Texture and Blood Consump; tick marks at 0.8, 0.23, 1.2, 0.4 and A, B, D; object x1 marked as a point.]

After all the objects are plotted, we calculate the distances between them and
group together the ones that are close to each other, i.e. place them in the
same cluster.

[Figure: the same plot with the objects grouped into Cluster 1 (benign) and Cluster 2 (malignant).]
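The distance calculation can be sketched as follows for the four objects of the learning set. This is a minimal example under two assumptions: Euclidean distance is used, and only the numeric attributes Curvature and Texture are included, because Blood Consump (A, B, D) is categorical and the handout does not say how it is mapped to a number.

```python
import math

# (Curvature, Texture) for each object of the learning set.
# Blood Consump is left out: it is categorical and its encoding is not specified.
objects = {
    "x1": (0.8, 1.2),
    "x2": (0.75, 1.4),
    "x3": (0.23, 0.4),
    "x4": (0.23, 0.5),
}


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Print the distance for every pair of objects.
names = sorted(objects)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(euclidean(objects[a], objects[b]), 3))
```

With these values, x1 and x2 are about 0.21 apart and x3 and x4 are 0.1 apart, while every pair that mixes the two groups is about 0.9 or more apart, which is why the two groups end up in separate clusters.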

Recall that the K-Means algorithm works as follows:

K-means Clustering
(© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004)

• Partitional clustering approach
• Each cluster is associated with a centroid (center point)
• Each point is assigned to the cluster with the closest centroid
• Number of clusters, K, must be specified (is predetermined)
• The basic algorithm is very simple
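The points above can be put together in a short sketch of the usual K-means loop: pick K initial centroids, assign each point to the closest centroid, recompute each centroid as the mean of its points, and repeat until the assignment stops changing. The steps are the standard formulation rather than text taken from this handout, and the function name `kmeans`, the `seed` and `max_iter` parameters, and the rule of leaving an empty cluster's centroid unchanged are assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Minimal K-means sketch using Euclidean distance (illustrative only)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # K distinct points as initial centroids
    assignment = None
    for _ in range(max_iter):
        # Assign every point to the cluster with the closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centroids will not move
        assignment = new_assignment
        # Recompute each centroid as the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids


# (Curvature, Texture) of x1..x4 from the learning set.
points = [(0.8, 1.2), (0.75, 1.4), (0.23, 0.4), (0.23, 0.5)]
assignment, centroids = kmeans(points, k=2)
print(assignment)
print(centroids)
```

On these four points with K = 2, x1 and x2 receive one cluster label and x3 and x4 the other; which label is 0 and which is 1 depends on the random initial centroids.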

K-means Clustering – Details

• Initial centroids are often chosen randomly.
  - Clusters produced vary from one run to another.
• The centroid is (typically) the mean of the points in the cluster.
• 'Closeness' is measured by Euclidean distance, cosine similarity, correlation, etc. (the distance measure / function will be specified).
• K-Means will converge (centroids move at each iteration). Most of the convergence happens in the first few iterations.
  - Often the stopping condition is changed to 'Until relatively few points change clusters'.

. . .
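Two of these details can be illustrated with a small sketch: the distance function is a parameter of the assignment step, and the relaxed stopping condition compares two consecutive assignments. The helper names `closest` and `few_points_changed` and the 1% threshold are hypothetical; the handout only says "relatively few" and gives no number.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_distance(p, q):
    """1 - cosine similarity; small when the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(p, q))
    norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norms


def closest(point, centroids, distance):
    """Index of the centroid nearest to `point` under the given distance."""
    return min(range(len(centroids)), key=lambda c: distance(point, centroids[c]))


def few_points_changed(old, new, max_fraction=0.01):
    """Relaxed stopping rule: stop when at most `max_fraction` of points moved."""
    changed = sum(1 for a, b in zip(old, new) if a != b)
    return changed / len(new) <= max_fraction


# x2 = (0.75, 1.4) against two centroids placed at x1 and x3.
centroids = [(0.8, 1.2), (0.23, 0.4)]
print(closest((0.75, 1.4), centroids, euclidean))        # 0
print(closest((0.75, 1.4), centroids, cosine_distance))  # 1
```

The two measures disagree here: x2 is nearer to the first centroid in straight-line distance, but its direction from the origin is closer to that of the second, so the choice of distance function changes the clusters that are produced.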