







Notes on special databases, data warehouses, data mining, and information retrieval in the context of database design. Topics include biological data, geographic data, movies, new types of queries, and storage/retrieval issues. Examples use R-trees, OLAP, and data cubes. The notes also cover data federation, data warehouses, and data mining techniques such as association rules.








Course evaluation:
http://www.CourseEvalUM.umd.edu
Review sessions: Thursday & Monday
e-mail me topics to cover, questions, problems, etc.
Biological data
refinement of “like” queries: find sequences that are
“related”
Spatial/geographic data (GIS)
find all Home Depot stores within 15 miles of Baltimore
find a point in Maryland that's farther than 15 miles from
the nearest Lowe's and is densely populated
find all cities within lat/lon square: 39.00 N, 40.00 N,
special/spatial index: R-tree
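The "within 15 miles of Baltimore" query above can be sketched as a distance filter. This is a minimal illustration, not how a GIS database actually executes it: the store names and coordinates are made up, the Baltimore coordinates are approximate, and the linear scan stands in for the candidate pruning that a spatial index such as an R-tree would provide.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

def stores_within(stores, center, radius_miles):
    """Linear scan over all stores; an R-tree would skip most of them."""
    clat, clon = center
    return [name for name, lat, lon in stores
            if haversine_miles(clat, clon, lat, lon) <= radius_miles]

BALTIMORE = (39.29, -76.61)  # approximate city-center coordinates

# Hypothetical store locations, invented for illustration only.
STORES = [
    ("store_a", 39.35, -76.60),  # a few miles north of the center
    ("store_b", 39.29, -76.61),  # at the center
    ("store_c", 38.90, -77.04),  # roughly Washington, DC
]

nearby = stores_within(STORES, BALTIMORE, 15)
```

With a spatial index, the same query would first fetch only the entries whose bounding boxes overlap a box around the query circle, then apply the exact distance test to those candidates.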
Query: 1 MSVMYKKILYPTDFSETAEIALKHVKAFKTLKAEEVILLHVIDEREIKKRDIFSLLLGVA 60
M M++K+L+PTDFSE A A++ + ++ EVILLHVIDE +++ L+ G +
Sbjct: 1 MIFMFRKVLFPTDFSEGAYRAVEVFEKRNKMEVGEVILLHVIDEGTLEE-----LMDGYS 55
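The alignment above looks like the output of a BLAST-style tool, which scores how "related" two sequences are. A minimal sketch of the underlying idea is the Smith-Waterman local alignment score below. The scoring values (match, mismatch, gap) are arbitrary assumptions chosen for illustration; real protein search tools use substitution matrices such as BLOSUM62 and heuristics to avoid the full quadratic computation.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b.

    Uses a simple linear gap penalty and keeps only two rows of the
    dynamic-programming table, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                  # start a new local alignment here
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                cur[j - 1] + gap,   # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best

# Two made-up sequences that share the motif PTDFSE.
score = local_alignment_score("XXPTDFSEYY", "QQPTDFSEZZ")
```

A "related sequences" query would then rank database entries by this score, which is why it is a refinement of a plain "like" pattern match.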
Binary search tree on Y-coordinate
Each internal node contains search structure on X-coordinate
for all points with Y coordinates in the corresponding
subtree
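The structure described above reads like a two-level range tree for rectangle queries (the slide does not name it, so that is an assumption). A minimal sketch: a balanced binary tree over the Y-sorted points, where each node keeps its subtree's points sorted by X so the X-range can be answered by binary search. The class and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

class Node:
    def __init__(self, pts_by_y):
        # pts_by_y: non-empty list of (x, y) tuples sorted by y
        self.ymin = pts_by_y[0][1]
        self.ymax = pts_by_y[-1][1]
        # Secondary search structure: the subtree's points sorted by x.
        self.by_x = sorted(pts_by_y)
        self.xs = [p[0] for p in self.by_x]
        self.left = self.right = None
        if len(pts_by_y) > 1:
            mid = len(pts_by_y) // 2
            self.left = Node(pts_by_y[:mid])
            self.right = Node(pts_by_y[mid:])

def build(points):
    """Build the tree, or return None for an empty point set."""
    return Node(sorted(points, key=lambda p: p[1])) if points else None

def query(node, x1, x2, y1, y2, out=None):
    """Collect all points with x1 <= x <= x2 and y1 <= y <= y2."""
    if out is None:
        out = []
    if node is None or node.ymax < y1 or node.ymin > y2:
        return out  # subtree's Y-range misses the query entirely
    if y1 <= node.ymin and node.ymax <= y2:
        # Whole subtree lies in the Y-range: binary search on X only.
        lo = bisect_left(node.xs, x1)
        hi = bisect_right(node.xs, x2)
        out.extend(node.by_x[lo:hi])
        return out
    query(node.left, x1, x2, y1, y2, out)
    query(node.right, x1, x2, y1, y2, out)
    return out

tree = build([(1, 1), (2, 5), (3, 3), (4, 4), (5, 2)])
hits = sorted(query(tree, 2, 4, 3, 5))
```

The trade-off is space: every point is stored once per tree level, so the structure uses O(n log n) memory in exchange for fast rectangle queries.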
Cross Tabulation of sales by item-name and color
The table above is an example of a cross-tabulation (cross-tab), also
referred to as a pivot-table.
Values for one of the dimension attributes form the row headers
Values for another dimension attribute form the column headers
Other dimension attributes are listed on top
Values in individual cells are (aggregates of) the values of the
dimension attributes that specify the cell.
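A cross-tab of this kind can be computed with a single pass over the sales tuples. The sketch below assumes a hypothetical sales relation with attributes item_name, color, size, and quantity; the rows are made up, and "all" is used as the label for row and column totals.

```python
from collections import defaultdict

def cross_tab(rows, row_attr, col_attr, measure):
    """Sum `measure` per (row_attr, col_attr) cell, plus 'all' totals."""
    table = defaultdict(int)
    for r in rows:
        rv, cv, m = r[row_attr], r[col_attr], r[measure]
        table[(rv, cv)] += m        # individual cell
        table[(rv, "all")] += m     # row total
        table[("all", cv)] += m     # column total
        table[("all", "all")] += m  # grand total
    return dict(table)

# Hypothetical sales tuples, invented for illustration.
SALES = [
    {"item_name": "skirt", "color": "dark", "size": "small", "quantity": 2},
    {"item_name": "skirt", "color": "dark", "size": "large", "quantity": 6},
    {"item_name": "skirt", "color": "pastel", "size": "small", "quantity": 5},
    {"item_name": "dress", "color": "dark", "size": "medium", "quantity": 4},
]

tab = cross_tab(SALES, "item_name", "color", "quantity")
```

Here item_name values become row headers and color values become column headers; the size attribute is aggregated away, which corresponds to an "other dimension attribute listed on top" being set to all.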
Data Cube
A data cube is a multidimensional generalization of a cross-tab
Can have n dimensions; we show 3 below
Cross-tabs can be used as views on a data cube
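One way to picture a data cube is as the set of aggregates over every subset of the dimension attributes. The sketch below is a brute-force illustration under that assumption, with made-up sales rows and hypothetical attribute names; real OLAP systems compute and store these aggregates far more cleverly.

```python
from collections import defaultdict
from itertools import combinations

def data_cube(rows, dims, measure):
    """Aggregate `measure` for every subset of `dims` (2**len(dims) group-bys)."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            agg = defaultdict(int)
            for r in rows:
                key = tuple(r[d] for d in group)
                agg[key] += r[measure]
            cube[group] = dict(agg)
    return cube

# Hypothetical sales tuples, invented for illustration.
SALES = [
    {"item_name": "skirt", "color": "dark", "size": "small", "quantity": 2},
    {"item_name": "skirt", "color": "dark", "size": "large", "quantity": 6},
    {"item_name": "skirt", "color": "pastel", "size": "small", "quantity": 5},
    {"item_name": "dress", "color": "dark", "size": "medium", "quantity": 4},
]

cube = data_cube(SALES, ("item_name", "color", "size"), "quantity")

# A cross-tab is one two-dimensional view of the cube.
by_item_and_color = cube[("item_name", "color")]
```

With three dimensions the cube holds eight group-bys, from the grand total (no grouping attributes) up to the fully detailed one.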
Brute-force solution to federation:
download all databases
convert them to a common schema
provide a common interface
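The three brute-force steps above can be sketched in a few lines. Everything here is hypothetical: the two sources, their field names, and the common schema are invented for illustration and do not reflect the actual schema of any real repository.

```python
# Step 1: "downloaded" copies of two sources with different field names.
SOURCE_A = [{"acc": "A1", "organism": "E. coli", "seq": "ACGT"}]
SOURCE_B = [{"id": "B7", "species": "H. sapiens", "sequence": "GGCA"}]

# Step 2: per-source mapping from original field name to common field name.
FIELD_MAPS = {
    "source_a": {"acc": "id", "organism": "species", "seq": "sequence"},
    "source_b": {"id": "id", "species": "species", "sequence": "sequence"},
}

def to_common(record, source):
    """Rename a record's fields to the common schema and tag its origin."""
    mapping = FIELD_MAPS[source]
    out = {common: record[orig] for orig, common in mapping.items()}
    out["source"] = source
    return out

def load_all():
    """Merge every source into one list of common-schema records."""
    merged = [to_common(r, "source_a") for r in SOURCE_A]
    merged += [to_common(r, "source_b") for r in SOURCE_B]
    return merged

# Step 3: a single query interface over the merged copy.
def find_by_species(records, species):
    return [r for r in records if r["species"] == species]

records = load_all()
ecoli = find_by_species(records, "E. coli")
```

Because the merged copy is a snapshot, any update at a source requires re-running the download and conversion steps.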
Problems:
data storage & duplication
hard to keep up to date
performance (single point of entry/failure)
Examples:
GenBank (US biological data repository)
Ensembl (EU biological data repository)
Searching for patterns in data
Typically done in data warehouses
Association Rules:
When a customer buys X, she also typically buys Y
Use?
Move X and Y together in supermarkets
A customer buys a lot of shirts
Send him a catalogue of shirts
Patterns are not always obvious
Classic example: It was observed that men tend to buy
beer and diapers together (may be an urban legend)
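Association rules of the form "buys X, so typically also buys Y" are commonly scored with two measures, support and confidence. These measures are not named in the notes, so treat them as an added assumption; the baskets below are made up.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Of the transactions containing lhs, the fraction also containing rhs.

    Raises ZeroDivisionError if no transaction contains lhs.
    """
    both = support(transactions, set(lhs) | set(rhs))
    return both / support(transactions, lhs)

# Hypothetical shopping baskets, invented for illustration.
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"diapers", "milk"},
    {"bread", "milk"},
]

sup = support(BASKETS, {"beer", "diapers"})        # 2 of 5 baskets
conf = confidence(BASKETS, {"beer"}, {"diapers"})  # 2 of the 3 beer baskets
```

A mining algorithm would search for all rules whose support and confidence exceed chosen thresholds, rather than testing one hand-picked rule as done here.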
Other types of mining
Classification
Decision Trees
Databases for new types of data (e.g. biological or social
networks)
Streaming databases (Comcast OnDemand)
Large amounts of data
Security/Privacy