Introduction to Data Mining - Chapter 1, Lecture notes of Data Mining

Typology: Lecture notes

2024/2025

Uploaded on 09/30/2025

johnson-angelo

2.1.2 Types of Data Sets
There are many types of data sets, and as the field of data mining develops
and matures, a greater variety of data sets becomes available for analysis. In
this section, we describe some of the most common types. For convenience,
we have grouped the types of data sets into three groups: record data,
graph-based data, and ordered data. These categories do not cover all
possibilities, and other groupings are certainly possible.
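As a rough illustration (a sketch, not taken from the text), the three groups can be pictured as three different Python structures. All attribute names, node labels, and readings below are hypothetical toy values chosen only to show the shape of each kind of data.

```python
from dataclasses import dataclass

# Record data: every object (row) has the same fixed set of attributes.
# Attribute names and values are hypothetical.
records = [
    {"tid": 1, "refund": "yes", "income": 125_000},
    {"tid": 2, "refund": "no", "income": 100_000},
    {"tid": 3, "refund": "no", "income": 70_000},
]

# Graph-based data: objects plus the relationships (edges) between them,
# stored here as an adjacency list keyed by a hypothetical node label.
links = {
    "page_a": ["page_b", "page_c"],
    "page_b": ["page_c"],
    "page_c": [],
}

# Ordered data: each value is tied to a position in time (or space),
# so the order of the readings carries information.
@dataclass(frozen=True)
class Reading:
    time: int     # e.g. minutes since start (hypothetical unit)
    value: float

series = [Reading(0, 20.5), Reading(5, 21.0), Reading(10, 21.8)]

# Simple checks that reflect each structure's defining property.
same_attributes = all(r.keys() == records[0].keys() for r in records)
edge_count = sum(len(targets) for targets in links.values())
in_time_order = all(a.time < b.time for a, b in zip(series, series[1:]))

print(same_attributes, edge_count, in_time_order)  # True 3 True
```

The point of the checks is that each group is defined by a different property: a shared attribute set for records, explicit links for graphs, and a meaningful ordering for ordered data.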
General Characteristics of Data Sets
Before providing details of specific kinds of data sets, we discuss three
characteristics that apply to many data sets and have a significant impact on the
data mining techniques that are used: dimensionality, sparsity, and resolution.
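Two of these characteristics can be read directly off a data matrix. The sketch below assumes the data set is given as a list of rows (objects) with one numeric column per attribute; the function name `profile` and the toy matrix are hypothetical. Resolution is not covered, since it depends on how the data were collected rather than on a single matrix.

```python
def profile(matrix: list[list[float]]) -> tuple[int, float]:
    """Return (dimensionality, density) for a data matrix given as rows.

    dimensionality = number of attributes (columns) per object;
    density = fraction of entries that are non-zero (sparse data has low density).
    """
    if not matrix:
        raise ValueError("empty data set")
    dim = len(matrix[0])
    if any(len(row) != dim for row in matrix):
        raise ValueError("every object must have the same number of attributes")
    nonzero = sum(1 for row in matrix for x in row if x != 0)
    return dim, nonzero / (len(matrix) * dim)

# Hypothetical 3-object, 5-attribute toy matrix.
data = [
    [0, 0, 3, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 2],
]
dim, density = profile(data)
print(dim, density)  # 5 0.2
```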
Dimensionality The dimensionality of a data set is the number of attributes
that the objects in the data set possess. Data with a small number of dimensions
tends to be qualitatively different from moderate- or high-dimensional
data. Indeed, the difficulties associated with analyzing high-dimensional data
are sometimes referred to as the curse of dimensionality. Because of this,
an important motivation in preprocessing the data is dimensionality reduction.
These issues are discussed in more depth later in this chapter and in
Appendix B.
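One commonly cited symptom of the curse of dimensionality is that distances between points become harder to tell apart as dimensions are added. The sketch below is an illustration under stated assumptions, not a method from the text: points are drawn uniformly from the unit hypercube, and it reports the relative contrast (max - min) / min of the Euclidean distances from one random query point. The point count and seed are arbitrary choices, and `relative_contrast` is a hypothetical helper name.

```python
import math
import random

def relative_contrast(dim: int, num_points: int = 200, seed: int = 0) -> float:
    """(max - min) / min of Euclidean distances from a random query point
    to num_points points drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(num_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

for d in (2, 10, 100, 1000):
    print(f"dim={d:5d}  relative contrast={relative_contrast(d):.3f}")
```

The printed contrast should shrink as `dim` grows, meaning the nearest and farthest points look increasingly alike; the exact values depend on the seed and point count.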
Sparsity For some data sets, such as those with asymmetric features, most
attributes of an object have values of 0; in many cases, fewer than 1% of
the entries are non-zero. In practical terms, sparsity is an advantage because
usually only the non-zero values need to be stored and manipulated. This
results in significant savings with respect to computation time and storage.
Furthermore, some data mining algorithms work well only for sparse data.
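To make the storage argument concrete, here is a minimal sketch that keeps only the non-zero entries of a vector in a dictionary keyed by attribute index, and computes a dot product by touching only those entries. The helper names and the 10,000-word toy vocabulary are hypothetical; production code would more typically use a library sparse format such as SciPy's sparse matrices.

```python
def to_sparse(dense: list[float]) -> dict[int, float]:
    """Keep only the non-zero entries, keyed by their attribute index."""
    return {i: x for i, x in enumerate(dense) if x != 0}

def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product touching only indices that are non-zero in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(x * b[i] for i, x in a.items() if i in b)

# Hypothetical term-count vectors over a 10,000-word vocabulary:
# each document uses only a handful of words, so >99.9% of entries are 0.
vocab_size = 10_000
doc1 = [0.0] * vocab_size
doc2 = [0.0] * vocab_size
doc1[12], doc1[530], doc1[9_001] = 2.0, 1.0, 4.0
doc2[530], doc2[9_001], doc2[42] = 3.0, 1.0, 5.0

s1, s2 = to_sparse(doc1), to_sparse(doc2)
print(len(doc1), len(s1))   # 10000 3
print(sparse_dot(s1, s2))   # 7.0
```

The dense vectors hold 10,000 values each, while the sparse versions hold 3, and the dot product does a handful of lookups instead of 10,000 multiplications.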
Resolution It is frequently possible to obtain data at different levels of resolution,
and often the properties of the data are different at different resolutions.
For instance, the surface of the Earth seems very uneven at a resolution of a
Chapter 2 Data