Advanced Database Systems-Lecture 12 Slides-Computer Science, Slides of Database Management Systems (DBMS)

Query Processing with Indexes, Selection Using Index, Equality Predicate, Range Predicate, Index versus Table Scan, Index Nested-loop Join, Zig-zag Join using Ordered Indexes, Bitmap Index, Projection Index, Bit-sliced Index, Value-list Index, Technicalities, SUM without Any Index, SUM With a Value-list Index, SUM with a Projection Index, SUM With a Bit-sliced Index, Median, Median with a Projection Index, Median With an Ordered Value-list Index, Median with a Bit-sliced Index, Variant Indexes,

Typology: Slides

2011/2012

Uploaded on 01/28/2012

arold
arold 🇺🇸

4.7

(24)

372 documents

1 / 11

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
Query Processing with Indexes
CPS 216
Advanced Database Systems
2
Announcements (February 24)
More reading assignment for next week
Buffer management (due next Wednesday)
Homework #2 due next Thursday
Course project proposal due in 1½ weeks
Midterm in two weeks
Christos Faloutsos (CMU) talk
“Data Mining Using Fractals and Power Laws”
4-5pm, Monday, February 28
130A North Building (telecast from UNC)
3
Review
Many different ways of processing the same query
Scan (e.g., nested-loop join)
Sort (e.g., sort-merge join)
Hash (e.g., hash join)
)Index
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Advanced Database Systems-Lecture 12 Slides-Computer Science and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Query Processing with Indexes

CPS 216

Advanced Database Systems

2

Announcements (February 24)

™ More reading assignment for next week

ƒ Buffer management (due next Wednesday)

™ Homework #2 due next Thursday

™ Course project proposal due in 1½ weeks

™ Midterm in two weeks

™ Christos Faloutsos (CMU) talk

ƒ “Data Mining Using Fractals and Power Laws” ƒ 4-5pm, Monday, February 28 ƒ 130A North Building (telecast from UNC)

3

Review

™ Many different ways of processing the same query

ƒ Scan (e.g., nested-loop join) ƒ Sort (e.g., sort-merge join) ƒ Hash (e.g., hash join) )Index

Selection using index

™ Equality predicate: σ A = v ( R )

ƒ Use an ISAM, B+^ -tree, or hash index on R ( A )

™ Range predicate: σ A > v ( R )

ƒ Use an ordered index (e.g., ISAM or B+^ -tree) on R ( A ) ƒ Hash index is not applicable

™ Indexes other than those on R ( A ) may be useful

ƒ Example: B+^ -tree index on R ( A , B ) ƒ How about B+^ -tree index on R ( B , A )?

5

Index versus table scan

Situations where index clearly wins:

™ Index-only queries which do not require retrieving

actual tuples

ƒ Example: π AA > v ( R ))

™ Primary index clustered according to search key

ƒ One lookup leads to all result tuples in their entirety

6

Index versus table scan (cont’d)

BUT(!):

™ Consider σ A > v ( R ) and a secondary, non-clustered

index on R ( A )

ƒ Need to follow pointers to get the actual result tuples ƒ Say that 20% of R satisfies A > v

  • Could happen even for equality predicates ƒ I/O’s for index-based selection: lookup + 20% | R | ƒ I/O’s for scan-based selection: B ( R ) ƒ Table scan wins if a block contains more than 5 tuples

More indexes ahead!

™ Bitmap index

ƒ Generalized value-list index

™ Projection index

™ Bit-sliced index

11

Search key values × tuples

™ Looks familiar?

ƒ Keywords × documents

1 1 0 … 0 0 0 0 … 0 0 0 1 … 1 0 0 0 … 0 0 0 0 … 0 … … … … …

Tuples

8

10

9

26 108

Search key values

1 means tuple has the particular search key value 0 means otherwise

0 1 2 n – 1

12

Bitmap index

™ Value-list index—stores the matrix by rows

ƒ Traditionally list contains pointers to tuples ƒ B+^ -tree: tuples with same search key values ƒ Inverted list: documents with same keywords

™ If there are not many search key values, and there

are lots of 1’s in each row, pointer list is not space-

efficient

ƒ How about a bitmap? ƒ Still a B+^ -tree, except leaves have a different format

Technicalities

™ How do we go from a bitmap index (0 to n – 1) to

the actual tuple?

) One more level of indirection solves everything

) Or, given a bitmap index, directly calculate the

physical block number and the slot number within

the block for the tuple

™ In either case, certain block/slot may be invalid

ƒ Because of deletion, or variable-length tuples ƒ Keep an existence bitmap: bit set to 1 if tuple exists

14

Bitmap versus traditional value-list

™ Operations on bitmaps are faster than pointer lists

ƒ Bitmap AND: bit-wise AND ƒ Value-list AND: sort-merge join

™ Bitmap is more efficient when the matrix is

sufficiently dense; otherwise, pointer list is more

efficient

ƒ Smaller means more in memory and fewer I/O’s

™ Generalized value-list index: with both bitmap and

pointer list as alternatives

15

TID A B … 0 8 … … 1 8 … … 2 26 … … 3 108 … … … … … … n -1 10 … …

Projection index

™ Just store π A ( R ) and use it as an index!

Could be implicit and not explicitly stored

TID A B … 0 8 … … 1 8 … … 2 26 … … 3 108 … … … … … … n -1 10 … …

Projection index

SUM without any index

™ For each tuple in B f , go fetch the actual tuple, and

add dollar_sales to a running sum

™ I/O’s: number of Sales blocks with B f tuples

ƒ Assuming we fetch them in sorted order

20

SUM with a value-list index

™ Assume a value-list index on Sales ( dollar_sales )

™ Idea: the index stores dollar_sales values and their counts (in a pretty compact form)

™ sum = 0; Scan Sales ( dollar_sales ) index; for each indexed value v with value-list B (^) v : sum += v × count-1-bits( B (^) v AND B (^) f );

™ I/Os: number of blocks taken by the value-list index

™ Bitmaps can possibly speed up AND and reduce the size of the index

21

SUM with a projection index

™ Assume a project index on Sales ( dollar_sales )

™ Idea: merge join B f and the projection index, add

joining tuples’ dollar_sales to a running sum

ƒ Assuming both B (^) f and the index are sorted on TID

™ I/O’s: number of blocks taken by the projection

index

ƒ Compared with a value-list index, the projection index may be more compact (no empty space or pointers), but it does store duplicate dollar_sales values

™ Also: simpler algorithm, fewer CPU operations

SUM with a bit-sliced index

™ Assume a bit-sliced index on Sales ( dollar_sales ), with slices B (^) k – 1 , …, B 1 , B 0

™ sum = 0; for i = 0 to k – 1: sum += 2 i^ × count-1-bits( B (^) i AND B (^) f );

™ I/O’s: number of blocks taken by the bit-sliced index

™ Conceptually a bit-sliced index contains the same information as a projection index ƒ But the bit-sliced index does not keep TID ƒ Bitmap AND is faster

23

Summary of SUM

™ Best: bit-sliced index

ƒ Index is small ƒ B (^) f can be applied fast!

™ Good: projection index

™ Not bad: value-list index

ƒ Full-fledged index carries a bigger overhead

  • The fact that we have counts of values helped
  • But we did not really need values to be ordered

24

MEDIAN

SELECT MEDIAN( dollar_sales )

FROM Sales

WHERE condition ;

™ Same deal: already found B f (a bitmap or a sorted

list of TID’s that point to Sales tuples that satisfy

condition )

™ Need to find the dollar_sales value that is greater

than or equal to ½ × count-1-bits( B f ) dollar_sales

values among B f tuples

MEDIAN with a bit-sliced index

™ median = 0; B (^) current = B (^) f ; // which tuples we are considering sofar = 0; // number of tuples whose values are less // than what we are considering for i = k – 1 to 0: if (sofar + count-1-bits( B (^) current AND NOT( Bi )) · ½ × count-1-bits( B (^) f )): B (^) current = B (^) current AND B (^) i ; sofar += count-1-bits( B (^) current AND NOT( Bi ); median += 2 i ; else: B (^) current = B (^) current AND NOT( B (^) i );

™ I/O’s: still need to scan the entire index

29

Summary of MEDIAN

™ Best: ordered value-list index

ƒ It helps to be ordered!

™ Pretty good: bit-sliced index

ƒ Could beat ordered value-list index if B (^) f is “clustered”

  • Only need to retrieve the corresponding segment

30

More variant indexes

“Improved Query Performance with Variant Indexes,”

by O’Neil and Quass. SIGMOD , 1997

™ MIN/MAX, and range query using bit-sliced index

™ Join indexes for star schema

ƒ Traditional: one for each combination of foreign columns ƒ Bitmap: one for each foreign column

) Precomputed query results (materialized views)?

Variant vs. traditional indexes

™ What is the more glaring problem of these variant

indexes that makes them not as widely applicable as

the B +^ -tree?

™ How did the paper get away with that?