Database Design: Special Databases, Data Warehouses, Data Mining and Information Retrieval, Study notes of Principles of Database Management

These notes cover special databases, data warehouses, data mining, and information retrieval in the context of database design. Topics include biological data, geographic data, movies, new types of queries, and storage/retrieval issues. Examples use R-trees, OLAP, and data cubes. The notes also cover data federation, data warehouses, and data mining techniques such as association rules.

Uploaded on 07/30/2009

CMSC 424 – Database design
Lecture 25
Special databases
Data warehouses
Data mining/Information retrieval
Mihai Pop

Admin

Course evaluation:

http://www.CourseEvalUM.umd.edu

Review sessions: Thursday & Monday

e-mail me topics to cover, questions, problems, etc.

Examples

Biological data

refinement of “like” queries: find sequences that are “related”

Spatial/geographic data (GIS)

find all Home Depot stores within 15 miles of Baltimore

find a point in Maryland that's farther than 15 miles from the nearest Lowe's and is densely populated

find all cities within lat/lon square: 39.00 N, 40.00 N, 76.00 W, 77.00 W.

special/spatial index: R-tree
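
The radius and lat/lon-square queries above can be answered without any index by scanning every row; a spatial index exists to avoid that full scan. Below is a minimal brute-force sketch, assuming a hypothetical list of (name, latitude, longitude) tuples with made-up coordinates, west longitudes written as negative numbers, and the haversine formula for great-circle distance. The function names are illustrative only.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius (approximation)

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def within_radius(places, center, miles):
    """Names of all places no farther than `miles` from `center` (full scan)."""
    clat, clon = center
    return [name for name, lat, lon in places
            if haversine_miles(clat, clon, lat, lon) <= miles]

def within_box(places, lat_min, lat_max, lon_min, lon_max):
    """Names of all places inside the lat/lon square (full scan)."""
    return [name for name, lat, lon in places
            if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max]

# Hypothetical data: coordinates are invented for illustration.
BALTIMORE = (39.29, -76.61)
stores = [
    ("store_a", 39.40, -76.60),  # a few miles north of the center
    ("store_b", 38.98, -76.94),  # well outside the 15-mile radius
    ("store_c", 41.00, -75.00),  # far away, outside the square
    ("store_d", 39.95, -76.05),  # inside the square, outside the radius
]

nearby = within_radius(stores, BALTIMORE, 15)
in_square = within_box(stores, 39.00, 40.00, -77.00, -76.00)
```

Both functions cost time proportional to the number of rows, which is the cost a spatial index is meant to reduce.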

Query: 1 MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVA 60
         M  M++K+L+PTDFSE A  A++  +    ++  EVILLHVIDE  +++     L+ G +
Sbjct: 1 MIFMFRKVLFPTDFSEGAYRAVEVFEKRNKMEVGEVILLHVIDEGTLEE-----LMDGYS 55
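
One simple way to quantify how "related" the two aligned sequences above are is percent identity: the fraction of alignment columns in which both rows carry the same residue. The sketch below takes the two aligned rows from the example, with gaps written as `-`. The function name is hypothetical, and real sequence-search tools score similarity with substitution matrices rather than plain identity, so this is only an illustration of the idea.

```python
def percent_identity(row_a, row_b):
    """Count identical, non-gap columns in two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    return matches, 100.0 * matches / len(row_a)

# The two rows of the alignment shown above.
query = "MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVA"
sbjct = "MIFMFRKVLFPTDFSEGAYRAVEVFEKRNKMEVGEVILLHVIDEGTLEE-----LMDGYS"

matches, pct = percent_identity(query, sbjct)
print(f"{matches} identical columns out of {len(query)} ({pct:.1f}%)")
```

An exact-match or SQL `LIKE` predicate would reject this pair outright, which is why sequence databases need a similarity-based query instead.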

R-tree (chap. 24)

Binary search tree on Y-coordinate

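Each internal node contains a search structure on X-coordinate for all points with Y-coordinates in the corresponding subtree (a layout usually called a range tree; an R-tree instead groups nearby objects under nested bounding rectangles)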
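
The two-level structure described above can be sketched as follows. This is a minimal illustration under several assumptions: points are stored only at the leaves, the tree is built once from a static list, and the per-node "search structure on X" is just a sorted list searched with `bisect`. The class and function names are hypothetical.

```python
import bisect

class YNode:
    """Node of a binary tree on Y; holds its subtree's points sorted by X."""

    def __init__(self, points_by_y):
        # points_by_y: non-empty list of (x, y) tuples already sorted by y
        self.y_min = points_by_y[0][1]
        self.y_max = points_by_y[-1][1]
        # Associated structure: every point of this subtree, sorted by x.
        self.by_x = sorted(points_by_y)
        if len(points_by_y) == 1:
            self.left = self.right = None
        else:
            mid = len(points_by_y) // 2
            self.left = YNode(points_by_y[:mid])
            self.right = YNode(points_by_y[mid:])

def build(points):
    """Build the tree from an unsorted list of (x, y) points."""
    return YNode(sorted(points, key=lambda p: p[1])) if points else None

def range_query(node, x_lo, x_hi, y_lo, y_hi):
    """All points with x_lo <= x <= x_hi and y_lo <= y <= y_hi."""
    if node is None or node.y_max < y_lo or node.y_min > y_hi:
        return []  # subtree lies entirely outside the Y range
    if y_lo <= node.y_min and node.y_max <= y_hi:
        # Subtree lies entirely inside the Y range: binary-search on X only.
        lo = bisect.bisect_left(node.by_x, (x_lo, float("-inf")))
        hi = bisect.bisect_right(node.by_x, (x_hi, float("inf")))
        return node.by_x[lo:hi]
    # Partial overlap in Y: descend into both children.
    return (range_query(node.left, x_lo, x_hi, y_lo, y_hi)
            + range_query(node.right, x_lo, x_hi, y_lo, y_hi))

# Made-up points for illustration.
points = [(1, 5), (2, 1), (3, 8), (4, 3), (6, 6), (7, 2), (8, 7)]
tree = build(points)
hits = sorted(range_query(tree, 2, 6, 2, 6))
```

Each point is copied into one sorted list per tree level, so the sketch trades extra storage for queries that avoid scanning every point.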

Cross Tabulation of sales by item-name and color

The table above is an example of a cross-tabulation (cross-tab), also referred to as a pivot-table.

Values for one of the dimension attributes form the row headers

Values for another dimension attribute form the column headers

Other dimension attributes are listed on top

Values in individual cells are (aggregates of) the values of the measure attribute for the dimension values that specify the cell.
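
A cross-tab of sales by item-name and color can be computed from flat rows with a single grouping pass. The sketch below assumes a hypothetical sales relation with columns (item_name, color, size, number); the rows are invented for illustration, and the cell aggregate is assumed to be a sum.

```python
from collections import defaultdict

# Hypothetical sales rows: (item_name, color, size, number)
sales = [
    ("skirt", "dark", "small", 2),
    ("skirt", "dark", "large", 6),
    ("skirt", "pastel", "small", 11),
    ("dress", "dark", "small", 4),
    ("dress", "white", "large", 3),
    ("shirt", "white", "small", 7),
]

def cross_tab(rows, row_idx, col_idx, measure_idx):
    """Sum the measure for every (row value, column value) pair."""
    cells = defaultdict(int)
    for r in rows:
        cells[(r[row_idx], r[col_idx])] += r[measure_idx]
    row_keys = sorted({k[0] for k in cells})
    col_keys = sorted({k[1] for k in cells})
    return row_keys, col_keys, dict(cells)

def render(row_keys, col_keys, cells):
    """Format the cross-tab as tab-separated text with a total column."""
    lines = ["\t".join(["item-name"] + col_keys + ["total"])]
    for rk in row_keys:
        vals = [cells.get((rk, ck), 0) for ck in col_keys]
        lines.append("\t".join([rk] + [str(v) for v in vals] + [str(sum(vals))]))
    return "\n".join(lines)

row_keys, col_keys, cells = cross_tab(sales, 0, 1, 3)
print(render(row_keys, col_keys, cells))
```

Here the size attribute is aggregated away; it plays the role of the "other dimension attributes" that a cross-tab lists on top.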

Data Cube

A data cube is a multidimensional generalization of a cross-tab

Can have n dimensions; we show 3 below

Cross-tabs can be used as views on a data cube
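
One way to make the generalization concrete is to compute the aggregate for every subset of the dimensions: with n dimensions there are 2^n groupings, and each two-dimensional grouping is one cross-tab view. The sketch below is a naive illustration, assuming hypothetical rows stored as dictionaries, invented sample data, and a sum as the aggregate; it rescans the data once per grouping.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales rows with three dimensions and one measure.
rows = [
    {"item_name": "skirt", "color": "dark", "size": "small", "number": 2},
    {"item_name": "skirt", "color": "dark", "size": "large", "number": 6},
    {"item_name": "skirt", "color": "pastel", "size": "small", "number": 11},
    {"item_name": "dress", "color": "dark", "size": "small", "number": 4},
    {"item_name": "dress", "color": "white", "size": "large", "number": 3},
    {"item_name": "shirt", "color": "white", "size": "small", "number": 7},
]

def data_cube(rows, dims, measure):
    """Return {grouping: {dimension values: summed measure}} for all 2^n groupings."""
    cube = {}
    for k in range(len(dims) + 1):
        for grouping in combinations(dims, k):
            agg = defaultdict(int)
            for row in rows:
                agg[tuple(row[d] for d in grouping)] += row[measure]
            cube[grouping] = dict(agg)
    return cube

cube = data_cube(rows, ("item_name", "color", "size"), "number")

grand_total = cube[()][()]                 # grouping by nothing
by_item_color = cube[("item_name", "color")]  # the cross-tab view
```

The entry `cube[("item_name", "color")]` holds exactly the cells of the item-name by color cross-tab, and `cube[()]` holds the grand total.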

Data warehouses

Brute-force solution to federation:

download all databases

convert them to a common schema

provide a common interface
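
The three steps above can be sketched in miniature. Everything here is hypothetical: the two source repositories, their field names, and the common schema are invented for illustration and do not reflect the actual formats of any real repository.

```python
def from_repo_a(rec):
    """Convert a record from hypothetical source A to the common schema."""
    return {"id": rec["accession"],
            "organism": rec["species"].lower(),
            "sequence": rec["seq"].upper()}

def from_repo_b(rec):
    """Convert a record from hypothetical source B to the common schema."""
    return {"id": rec["stable_id"],
            "organism": rec["organism_name"].lower(),
            "sequence": rec["dna"].upper()}

def build_warehouse(sources):
    """Copy every source record into one store, keyed by (source, id)."""
    warehouse = {}
    for source_name, convert, records in sources:
        for rec in records:
            row = convert(rec)
            row["source"] = source_name
            warehouse[(source_name, row["id"])] = row
    return warehouse

def find_by_organism(warehouse, organism):
    """Common interface: one query that spans all loaded sources."""
    return sorted(row["id"] for row in warehouse.values()
                  if row["organism"] == organism.lower())

# "Downloaded" copies of the two hypothetical sources.
repo_a = [{"accession": "A001", "species": "Homo sapiens", "seq": "acgt"}]
repo_b = [{"stable_id": "B042", "organism_name": "homo sapiens", "dna": "TTGA"},
          {"stable_id": "B043", "organism_name": "Mus musculus", "dna": "ggcc"}]

warehouse = build_warehouse([("repo_a", from_repo_a, repo_a),
                             ("repo_b", from_repo_b, repo_b)])
human_ids = find_by_organism(warehouse, "Homo sapiens")
```

The warehouse holds a full copy of both sources, and that copy goes stale as soon as either source changes.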

Problems:

data storage & duplication

hard to keep up to date

performance (single point of entry/failure)

Examples:

GenBank (US biological data repository)

Ensembl (EU biological data repository)

Data Mining

Searching for patterns in data

Typically done in data warehouses

Association Rules:

When a customer buys X, she also typically buys Y

Use?

Move X and Y together in supermarkets

A customer buys a lot of shirts

Send him a catalogue of shirts

Patterns are not always obvious

Classic example: It was observed that men tend to buy beer and diapers together (may be an urban legend)
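
A rule of the form "when a customer buys X, she also typically buys Y" is usually judged by two standard measures that the notes do not name: support (the fraction of baskets containing both X and Y) and confidence (among baskets containing X, the fraction that also contain Y). The sketch below computes both over a handful of invented baskets; the data and function names are hypothetical.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for b in baskets if items <= b) / len(baskets)

def confidence(baskets, x, y):
    """Estimated P(Y in basket | X in basket); 0.0 if X never occurs."""
    sup_x = support(baskets, x)
    if sup_x == 0:
        return 0.0
    return support(baskets, set(x) | set(y)) / sup_x

# Hypothetical market baskets, invented for illustration.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"diapers", "milk"},
]

sup = support(baskets, {"beer", "diapers"})        # both together
conf = confidence(baskets, {"beer"}, {"diapers"})  # rule: beer -> diapers
print(f"support={sup:.2f} confidence={conf:.2f}")
```

Mining algorithms search for all rules whose support and confidence exceed chosen thresholds instead of testing one hand-picked rule as done here.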

Other types of mining

Classification

Decision Trees

What's next?

Databases for new types of data (e.g. biological or social networks)

Streaming databases (Comcast OnDemand)

Large amounts of data

Security/Privacy