Advanced Database Systems-Lecture 08 Slides-Computer Science, Slides of Database Management Systems (DBMS)

This course covers advanced database management system design principles and techniques. Topics: Indexing, Advanced Database System, R-trees, R-tree Lookup, R-tree Insertion, R*-tree, R -tree, Structure of GiST, Key Predicates, Index Operations, Key Compression, GiST Over R-tree, GiST Over RD-tree

Typology: Slides

2011/2012

Uploaded on 01/28/2012



Indexing: Part II

CPS 216

Advanced Database Systems

2

Announcements (February 8)

• Homework #1 due today
• No class this Thursday (February 10)
• Reading assignments this week
  - Generalized search trees (due next Tuesday)
  - “The” Google paper (due next Thursday)

3

R-trees

• B-tree: balanced hierarchy of 1-d ranges
• R-tree: balanced hierarchy of n-d ranges

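[Figure: a B-tree over the keys 30, 100, 120, 150, 180. The upper level splits the key space into (-∞, 100) and [100, ∞); the lower level splits it into (-∞, 30), [30, 100), [100, 120), [120, 150), [150, 180), [180, ∞). Next to it, an R-tree over regions R1 to R8, drawn with nodes (R6, R7), (R8, …), and (R1, R2, R3, R4, R5).]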

R-tree lookup

• Which ranges contain me?
• Problem: search may go down many paths
  - Because regions may overlap
  - No performance guarantee like B-tree

[Figure: R-tree over regions R1 to R8, drawn with nodes (R6, R7), (R8, …), and (R1, R2, R3, R4, R5)]
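The multi-path behavior of lookup can be sketched as follows. This is a minimal illustration, not code from the lecture: the 2-d rectangle layout, the `Node` class, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed layout: a rectangle is (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]

@dataclass
class Node:
    # Hypothetical node type: each entry pairs a bounding rectangle with
    # either a child Node (internal node) or an object id (leaf).
    is_leaf: bool
    entries: List[tuple] = field(default_factory=list)

def contains(r: Rect, p: Tuple[float, float]) -> bool:
    """True if point p lies inside rectangle r (borders included)."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]

def lookup(node: Node, p: Tuple[float, float]) -> List[str]:
    """Return the ids of all leaf rectangles that contain p."""
    hits = []
    for rect, child in node.entries:
        if not contains(rect, p):
            continue  # prune: nothing below this entry can contain p
        if node.is_leaf:
            hits.append(child)
        else:
            hits.extend(lookup(child, p))  # may recurse into several children
    return hits

# Two overlapping internal regions: the point (5, 5) lies in both,
# so the search has to descend into both subtrees.
left = Node(True, [((0, 0, 6, 6), "A"), ((1, 1, 2, 2), "B")])
right = Node(True, [((4, 4, 9, 9), "C")])
root = Node(False, [((0, 0, 6, 6), left), ((4, 4, 9, 9), right)])
print(lookup(root, (5, 5)))  # ['A', 'C']
```

Because the two top-level regions overlap, no single root-to-leaf path answers the query, which is why the B-tree's logarithmic bound does not carry over.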

5

R-tree insertion

Insert R9 into R-tree

• Start from the root
• Pick a region containing R9 and follow the child pointer
  - If none contains R9, pick one and grow it to contain R9
  - Pick the one that requires the least enlargement (why?)

[Figure: the R-tree over R1 to R8 with R9 inserted; R7 is enlarged to R7’ so that it contains R9]
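The "least enlargement" rule can be made concrete with a small sketch. It assumes 2-d rectangles stored as (xmin, ymin, xmax, ymax) and measures enlargement as added area; the helper names are hypothetical, and breaking ties by the smaller region is one common convention rather than something the slide specifies.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed (xmin, ymin, xmax, ymax)

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def enlargement(region: Rect, new: Rect) -> float:
    """Extra area `region` needs in order to contain `new` (0 if it already does)."""
    return area(union(region, new)) - area(region)

def choose_subtree(regions: List[Rect], new: Rect) -> int:
    """Index of the region needing the least enlargement (ties: smaller region)."""
    return min(range(len(regions)),
               key=lambda i: (enlargement(regions[i], new), area(regions[i])))

regions = [(0, 0, 4, 4), (5, 0, 10, 4)]
new = (3, 1, 5, 2)
print([enlargement(r, new) for r in regions])  # [4, 8]
print(choose_subtree(regions, new))            # 0
```

Growing a region as little as possible keeps bounding boxes tight, which in turn limits the overlap that sends later lookups down several paths.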

6

R-tree insertion: split

• If a node is too full, split
• Try to minimize the total area of bounding boxes
  - Exhaustive: try all possible splits
  - Quadratic: “seed” with the most wasteful pair; iteratively assign regions with the strongest “preference”
  - Linear: “seed” with distant regions; iteratively assign the others as in Quadratic

[Figure: the R-tree with R9 inserted and R7 enlarged to R7’, used to illustrate the split]
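A simplified sketch of the Quadratic strategy follows. It is an illustration under assumptions, not the lecture's algorithm verbatim: rectangles are 2-d (xmin, ymin, xmax, ymax), "wasteful" is read as the dead area in the pair's joint bounding box, "preference" is read as the difference in area growth between the two groups, and the `min_fill` parameter and tie handling are choices made here.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed (xmin, ymin, xmax, ymax)

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def pick_seeds(rects: List[Rect]) -> Tuple[int, int]:
    """Indices of the pair whose joint bounding box wastes the most area."""
    def waste(pair):
        i, j = pair
        return area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
    return max(combinations(range(len(rects)), 2), key=waste)

def quadratic_split(rects: List[Rect], min_fill: int) -> List[List[int]]:
    """Split rects into two groups of indices, seeded by pick_seeds."""
    i, j = pick_seeds(rects)
    groups, boxes = [[i], [j]], [rects[i], rects[j]]
    rest = [k for k in range(len(rects)) if k not in (i, j)]

    def growth(k: int, g: int) -> float:
        # Area that group g's bounding box would gain by taking rects[k].
        return area(union(boxes[g], rects[k])) - area(boxes[g])

    while rest:
        # A group that needs every remaining entry to reach min_fill takes them all.
        short = [g for g in (0, 1) if len(groups[g]) + len(rest) <= min_fill]
        if short:
            groups[short[0]].extend(rest)
            break
        # Next entry: the one with the strongest preference for one group.
        k = max(rest, key=lambda k: abs(growth(k, 0) - growth(k, 1)))
        g = 0 if growth(k, 0) <= growth(k, 1) else 1
        groups[g].append(k)
        boxes[g] = union(boxes[g], rects[k])
        rest.remove(k)
    return groups

rects = [(0, 0, 1, 1), (1, 0, 2, 1), (8, 8, 9, 9), (9, 8, 10, 9), (0, 1, 1, 2)]
print(pick_seeds(rects))          # (0, 3)
print(quadratic_split(rects, 2))  # [[0, 1, 4], [3, 2]]
```

The seed search looks at every pair and each assignment step rescans the remaining entries, which is where the quadratic cost comes from.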

Review

• Tree-structured indexes
  - ISAM
  - B-tree and variants
  - R-tree and variants
  - Can we generalize? GiST!

11

Indexing user-defined data types

• Specialized indexes (ABCDEFG trees…)
  - Redundant code: most trees are very similar
  - Concurrency control and recovery especially tricky to get right
• Extensible B-trees and R-trees
  - Examples: B-trees in Berkeley DB, B- and R-trees in Informix
  - User-defined compare() function
→ GiST (Generalized Search Trees)
  - General (covers B-trees, R-trees, etc.)
  - Easy to extend
  - Built-in concurrency control and recovery

12

Structure of GiST

Balanced tree of ⟨p, ptr⟩ pairs

• p is a key predicate that holds for all objects found below ptr
• Every node has between kM and M index entries…
  - k must be no more than ½ (why?)
• Except root, which only needs at least two children
• All leaves are on the same level
→ User only needs to define what key predicates are
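One way to read the “(why?)” on the bound for k, offered as a sketch rather than the lecture's own answer: when a full node overflows, its M + 1 entries are divided between two nodes, and both must hold at least kM entries.

```latex
2\,kM \;\le\; M + 1
\quad\Longrightarrow\quad
k \;\le\; \frac{M+1}{2M} \;=\; \frac{1}{2} + \frac{1}{2M}
```

Choosing k ≤ ½ satisfies this for every node capacity M, so a split can always produce two legal nodes.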

Defining key predicates

• boolean Consistent(entry entry, predicate query)
  - Return true if an object satisfying query might be found under entry
• predicate Union(set entries)
  - Return a predicate that holds for all objects found under entries
• real Penalty(entry entry1, entry entry2)
  - Return a penalty for inserting entry2 into the subtree rooted at entry1
• (set, set) PickSplit(set entries)
  - Given M+1 entries, split them into two sets, each of size at least kM
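The four methods can be written down as an abstract interface. The sketch below is a hypothetical Python rendering: the class name `KeyPredicates` and the snake_case method names are made up for illustration, and the methods take bare keys rather than full index entries to keep the example short.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence, Tuple

class KeyPredicates(ABC):
    """Hypothetical rendering of the four user-defined GiST methods."""

    @abstractmethod
    def consistent(self, key: Any, query: Any) -> bool:
        """True if an object satisfying `query` might be found under `key`."""

    @abstractmethod
    def union(self, keys: Sequence[Any]) -> Any:
        """A predicate that holds for all objects found under `keys`."""

    @abstractmethod
    def penalty(self, key1: Any, key2: Any) -> float:
        """Cost of inserting `key2` into the subtree rooted at `key1`."""

    @abstractmethod
    def pick_split(self, keys: List[Any]) -> Tuple[List[Any], List[Any]]:
        """Split M+1 keys into two groups, each of size at least kM."""

class OnlySearch(KeyPredicates):
    # Deliberately incomplete: defines only one of the four methods.
    def consistent(self, key, query):
        return key == query

try:
    OnlySearch()
except TypeError as err:
    print("cannot instantiate:", type(err).__name__)
```

Declaring the methods abstract means a key type that omits one of them cannot be instantiated at all, so a generic tree would never be handed a partially defined key type.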

14

Index operations

• Search
  - Just follow pointer whenever Consistent() is true
• Insert
  - Descend the tree, following the entry with the least Penalty() at each node
  - If there is room in leaf, insert there; otherwise split according to PickSplit()
  - Propagate changes up using Union()
• Delete
  - Search for entry and delete it
  - Propagate changes up using Union()
  - On underflow
    • If keys are ordered, can borrow/coalesce in B-tree style
    • Otherwise, reinsert the node's remaining entries and delete the node
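Search and the no-split case of Insert can be sketched generically, with the key methods passed in as plain functions. Everything here is an assumption for illustration: the `Entry` and `Node` classes, the function names, and the toy key type (sets of integers, where a key means "every object below falls in this set"). Splitting and deletion are left out.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Entry:
    key: Any  # key predicate p
    ptr: Any  # child Node, or the stored object at a leaf

@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)

def search(node, query, consistent):
    """Follow every pointer whose key is Consistent with the query."""
    out = []
    for e in node.entries:
        if consistent(e.key, query):
            if node.is_leaf:
                out.append(e.ptr)
            else:
                out.extend(search(e.ptr, query, consistent))
    return out

def insert(node, new, penalty, union):
    """Insert `new`, assuming the target leaf has room (no PickSplit here).

    Returns the key that now describes `node`, so the caller can update
    the entry pointing at it.
    """
    if node.is_leaf:
        node.entries.append(new)
    else:
        best = min(node.entries, key=lambda e: penalty(e.key, new.key))
        best.key = insert(best.ptr, new, penalty, union)  # propagate Union upward
    return union([e.key for e in node.entries])

# Toy key methods over sets of integers (hypothetical).
overlaps = lambda key, q: bool(key & q)
set_union = lambda keys: frozenset().union(*keys)
new_items = lambda k1, k2: len(k2 - k1)

leaf1 = Node(True, [Entry(frozenset({1, 2}), "a"), Entry(frozenset({2, 3}), "b")])
leaf2 = Node(True, [Entry(frozenset({7, 8}), "c")])
root = Node(False, [Entry(frozenset({1, 2, 3}), leaf1), Entry(frozenset({7, 8}), leaf2)])

insert(root, Entry(frozenset({8, 9}), "d"), new_items, set_union)
print(search(root, frozenset({9}), overlaps))  # ['d']
```

Neither `search` nor `insert` knows anything about sets; swapping in other key functions changes the index type without touching the tree code.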

15

GiST over R (B-tree)

• Logically, keys represent ranges [x, y)
• Query: find keys that overlap with [a, b)
• Consistent(entry, [a, b)): say entry has key [x, y)
  - x < b and y > a, i.e., overlap
• Union(entries): say entries = {[x_i, y_i)}
  - [min({x_i}), max({y_i}))
• Penalty(entry1, entry2): say they have keys [x_1, y_1) and [x_2, y_2)
  - max(y_2 - y_1, 0) + max(x_1 - x_2, 0), except boundary cases
• PickSplit(entries)
  - Sort entries and split evenly
• Plus a special Compare(entry, entry) for ordered keys
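The range formulas above translate almost line for line into code. The sketch below assumes keys are (x, y) tuples standing for the half-open range [x, y); the function names are chosen for the example, and the boundary cases mentioned for Penalty are ignored.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # assumed encoding of the half-open range [x, y)

def consistent(key: Interval, query: Interval) -> bool:
    """Overlap test: x < b and y > a."""
    x, y = key
    a, b = query
    return x < b and y > a

def union(keys: Sequence[Interval]) -> Interval:
    """[min({x_i}), max({y_i}))"""
    return (min(x for x, _ in keys), max(y for _, y in keys))

def penalty(key1: Interval, key2: Interval) -> float:
    """How far key1 must stretch to cover key2 (boundary cases ignored)."""
    (x1, y1), (x2, y2) = key1, key2
    return max(y2 - y1, 0) + max(x1 - x2, 0)

def pick_split(keys: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Sort the keys and split them evenly."""
    ordered = sorted(keys)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

print(consistent((10, 20), (20, 30)))               # False: [10, 20) and [20, 30) only touch
print(union([(10, 20), (5, 12), (18, 25)]))         # (5, 25)
print(penalty((10, 20), (5, 25)))                   # 10
print(pick_split([(30, 40), (10, 20), (20, 30)]))   # ([(10, 20)], [(20, 30), (30, 40)])
```

The first call shows why the ranges are half-open: adjacent ranges share an endpoint value but no key, so the strict inequalities report no overlap.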

Next

• Hash-based indexing
• Text indexing