Advanced Database Systems-Lecture 12 Slides-Computer Science, Slides of Database Management Systems (DBMS)

Duke University Database Management Systems (DBMS)

Query Processing with Indexes, Selection Using Index, Equality Predicate, Range Predicate, Index versus Table Scan, Index Nested-loop Join, Zig-zag Join using Ordered Indexes, Bitmap Index, Projection Index, Bit-sliced Index, Value-list Index, Technicalities, SUM without Any Index, SUM With a Value-list Index, SUM with a Projection Index, SUM With a Bit-sliced Index, Median, Median with a Projection Index, Median With an Ordered Value-list Index, Median with a Bit-sliced Index, Variant Indexes,

Typology: Slides

2011/2012

Uploaded on 01/28/2012

arold 🇺🇸


Query Processing with Indexes

CPS 216

Advanced Database Systems

Announcements (February 24)

More reading assignment for next week

Buffer management (due next Wednesday)

Homework #2 due next Thursday

Course project proposal due in 1½ weeks

Midterm in two weeks

Christos Faloutsos (CMU) talk

“Data Mining Using Fractals and Power Laws”

4-5pm, Monday, February 28

130A North Building (telecast from UNC)

Review

Many different ways of processing the same query

Scan (e.g., nested-loop join)

Sort (e.g., sort-merge join)

Hash (e.g., hash join)

Index


Selection using index

Equality predicate: $\sigma_{A = v}(R)$

Use an ISAM, B+-tree, or hash index on R(A)

Range predicate: $\sigma_{A > v}(R)$

Use an ordered index (e.g., ISAM or B+-tree) on R(A)

Hash index is not applicable

Indexes other than those on R(A) may be useful

Example: B+-tree index on R(A, B)

How about a B+-tree index on R(B, A)?
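The question about attribute order can be explored with a small sketch. Here a sorted Python list stands in for the leaf level of a B+-tree, and `bisect` stands in for the tree search; the table contents and function names are made up for illustration. With entries ordered on (A, B), the tuples with A > v are contiguous; with entries ordered on (B, A), they are scattered and every entry has to be examined.

```python
import bisect

# Hypothetical table R(A, B); each row is (A, B).
rows = [(5, "x"), (1, "y"), (9, "z"), (5, "w"), (7, "y")]

# Leaf level of an index on (A, B): entries sorted by A, then B.
index_ab = sorted(rows)
keys_a = [a for a, _ in index_ab]

# Leaf level of an index on (B, A): entries sorted by B, then A.
index_ba = sorted((b, a) for a, b in rows)


def select_a_greater_ab(v):
    """sigma_{A > v}(R) via the (A, B) index: one search, then a contiguous scan."""
    start = bisect.bisect_right(keys_a, v)
    return index_ab[start:]


def select_a_greater_ba(v):
    """Same selection via the (B, A) index: qualifying entries are not contiguous,
    so the whole leaf level is scanned (still an index-only scan)."""
    return sorted((a, b) for b, a in index_ba if a > v)
```

Both functions return the same rows; the difference is how much of the index each one has to touch.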

Index versus table scan

Situations where index clearly wins:

Index-only queries which do not require retrieving actual tuples

Example: $\pi_A(\sigma_{A > v}(R))$

Primary index clustered according to search key

One lookup leads to all result tuples in their entirety

Index versus table scan (cont’d)

BUT(!):

Consider $\sigma_{A > v}(R)$ and a secondary, non-clustered index on R(A)

Need to follow pointers to get the actual result tuples

Say that 20% of R satisfies A > v

Could happen even for equality predicates

I/O’s for index-based selection: lookup + 20% |R|

I/O’s for scan-based selection: B(R)

Table scan wins if a block contains more than 5 tuples
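The break-even point can be checked with a little arithmetic. The sketch below assumes the worst case for the non-clustered index (one I/O per qualifying tuple); the table size and the 3-I/O lookup cost are made-up numbers, not values from the slides.

```python
import math


def index_selection_ios(num_tuples, percent_selected, lookup_ios):
    """I/Os for selection through a secondary, non-clustered index:
    the index lookup plus one I/O per qualifying tuple (worst case)."""
    return lookup_ios + num_tuples * percent_selected // 100


def scan_selection_ios(num_tuples, tuples_per_block):
    """I/Os for a full table scan: B(R), the number of blocks of R."""
    return math.ceil(num_tuples / tuples_per_block)


# Assumed numbers: |R| = 100,000 tuples, 20% selected, 3 I/Os for the lookup.
index_cost = index_selection_ios(100_000, 20, 3)   # 20,003 I/Os
scan_cost_10 = scan_selection_ios(100_000, 10)     # 10 tuples per block
scan_cost_4 = scan_selection_ios(100_000, 4)       # 4 tuples per block
```

With 10 tuples per block the scan needs 10,000 I/Os and beats the index; with 4 tuples per block it needs 25,000 and loses. The crossover is at about 5 tuples per block because 1 / 20% = 5.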

More indexes ahead!

Bitmap index

Generalized value-list index

Projection index

Bit-sliced index

Search key values × tuples

Looks familiar?

Keywords × documents

[Figure: a 0/1 matrix with one row per search key value (e.g., 26, 108) and one column per tuple (0, 1, 2, …, n – 1)]

1 means the tuple has the particular search key value; 0 means otherwise

Bitmap index

Value-list index: stores the matrix by rows

Traditionally the list contains pointers to tuples

B+-tree: tuples with same search key values

Inverted list: documents with same keywords

If there are not many search key values, and there are lots of 1’s in each row, pointer list is not space-efficient

How about a bitmap?

Still a B+-tree, except leaves have a different format
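The two leaf formats can be compared on a tiny column. This is a sketch only: a Python dict stands in for the B+-tree, a Python int serves as a bitmap (bit t is set when tuple t has the value), and the sample column reuses the A values shown in the slides' example table.

```python
from collections import defaultdict

# Column A of a hypothetical table; the list position is the tuple ID (TID).
column_a = [8, 8, 26, 108, 10]


def build_pointer_lists(column):
    """Traditional value-list index: search key value -> sorted list of TIDs."""
    lists = defaultdict(list)
    for tid, value in enumerate(column):
        lists[value].append(tid)
    return dict(lists)


def build_bitmaps(column):
    """Bitmap index: search key value -> bitmap with bit `tid` set to 1."""
    bitmaps = defaultdict(int)
    for tid, value in enumerate(column):
        bitmaps[value] |= 1 << tid
    return dict(bitmaps)


pointer_lists = build_pointer_lists(column_a)
bitmaps = build_bitmaps(column_a)
```

Each row of the matrix is stored either as a list of the column positions holding a 1 or as the row of bits itself.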

Technicalities

How do we go from a bitmap index (0 to n – 1) to the actual tuple?

One more level of indirection solves everything

Or, given a bitmap index, directly calculate the physical block number and the slot number within the block for the tuple

In either case, certain block/slot may be invalid

Because of deletion, or variable-length tuples

Keep an existence bitmap: bit set to 1 if tuple exists
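A minimal sketch of the direct calculation, assuming every block reserves the same fixed number of slots (an assumption made here; the slides do not give a layout). The existence bitmap is ANDed in before any tuple is fetched, so deleted or unused slots are skipped. Bitmaps are Python ints and the function names are hypothetical.

```python
def tid_to_location(tid, slots_per_block):
    """Map a bitmap position to (block number, slot number within the block)."""
    return divmod(tid, slots_per_block)


def live_tids(bitmap, existence):
    """Positions set in `bitmap` whose tuples actually exist."""
    valid = bitmap & existence
    tids = []
    tid = 0
    while valid:
        if valid & 1:
            tids.append(tid)
        valid >>= 1
        tid += 1
    return tids


def locations(bitmap, existence, slots_per_block):
    """Block/slot pairs to fetch for all existing tuples in `bitmap`."""
    return [tid_to_location(t, slots_per_block) for t in live_tids(bitmap, existence)]
```

For example, with 4 slots per block, position 13 maps to block 3, slot 1.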

Bitmap versus traditional value-list

Operations on bitmaps are faster than pointer lists

Bitmap AND: bit-wise AND

Value-list AND: sort-merge join

Bitmap is more efficient when the matrix is sufficiently dense; otherwise, pointer list is more efficient

Smaller means more in memory and fewer I/O’s

Generalized value-list index: with both bitmap and pointer list as alternatives
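The two AND operations and the density trade-off can be sketched as follows. Bitmaps are Python ints, pointer lists are sorted lists of TIDs, and the 32-bit pointer size is an assumption chosen only to make the space comparison concrete.

```python
def bitmap_and(b1, b2):
    """Bitmap AND: a single bit-wise operation."""
    return b1 & b2


def pointer_list_and(tids1, tids2):
    """Value-list AND: merge two sorted TID lists, keeping the common TIDs."""
    result = []
    i = j = 0
    while i < len(tids1) and j < len(tids2):
        if tids1[i] == tids2[j]:
            result.append(tids1[i])
            i += 1
            j += 1
        elif tids1[i] < tids2[j]:
            i += 1
        else:
            j += 1
    return result


def bitmap_is_smaller(num_tuples, num_ones, pointer_bits=32):
    """A row costs num_tuples bits as a bitmap, or num_ones pointers as a list."""
    return num_tuples < num_ones * pointer_bits
```

Under the 32-bit assumption, the bitmap is the smaller representation once more than about 1 in 32 tuples has the value; a generalized value-list index could make this choice per search key value.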


Projection index

Just store π A ( R ) and use it as an index!

Could be implicit and not explicitly stored

TID    A     B    …
0      8     …    …
1      8     …    …
2      26    …    …
3      108   …    …
…      …     …    …
n – 1  10    …    …

Projection index: the A column

SUM without any index

For each tuple in B_f (the Sales tuples that satisfy the condition, given as a bitmap or a sorted list of TID’s), go fetch the actual tuple, and add dollar_sales to a running sum

I/O’s: number of Sales blocks with B_f tuples

Assuming we fetch them in sorted order

SUM with a value-list index

Assume a value-list index on Sales(dollar_sales)

Idea: the index stores dollar_sales values and their counts (in a pretty compact form)

sum = 0;
Scan Sales(dollar_sales) index;
for each indexed value v with value-list B_v:
    sum += v × count-1-bits(B_v AND B_f);

I/O’s: number of blocks taken by the value-list index

Bitmaps can possibly speed up AND and reduce the size of the index
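The loop above can be written out in Python. This sketch assumes the bitmap form of the value-list index, represented as a dict from value to a Python int bitmap; the dict, the helper names, and the sample data are illustrative stand-ins, not the actual index format.

```python
def popcount(bitmap):
    """count-1-bits: number of 1 bits in a bitmap stored as a Python int."""
    return bin(bitmap).count("1")


def sum_with_value_list(value_bitmaps, b_f):
    """SUM over the tuples in b_f using one bitmap B_v per indexed value v."""
    total = 0
    for v, b_v in value_bitmaps.items():
        total += v * popcount(b_v & b_f)
    return total


# dollar_sales column [8, 8, 26, 108, 10] indexed by value (bit t = tuple t).
value_bitmaps = {8: 0b00011, 26: 0b00100, 108: 0b01000, 10: 0b10000}

# B_f selects tuples 1, 2, and 4.
b_f = 0b10110
```

No Sales tuple is fetched: each value contributes v times the number of selected tuples that carry it.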

SUM with a projection index

Assume a projection index on Sales(dollar_sales)

Idea: merge join B_f and the projection index, add joining tuples’ dollar_sales to a running sum

Assuming both B_f and the index are sorted on TID

I/O’s: number of blocks taken by the projection index

Compared with a value-list index, the projection index may be more compact (no empty space or pointers), but it does store duplicate dollar_sales values

Also: simpler algorithm, fewer CPU operations
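A sketch of the merge join, assuming B_f is given as a sorted list of TID's and the projection index as a list of (TID, dollar_sales) pairs in TID order. Both representations and the function name are chosen here for illustration.

```python
def sum_with_projection(projection, b_f_tids):
    """Merge join a TID-sorted projection index with a sorted TID list,
    adding the dollar_sales of every joining tuple to a running sum."""
    total = 0
    i = 0
    for tid, dollar_sales in projection:
        # Advance in B_f past TIDs smaller than the current index entry.
        while i < len(b_f_tids) and b_f_tids[i] < tid:
            i += 1
        if i == len(b_f_tids):
            break  # B_f is exhausted; no later entry can join.
        if b_f_tids[i] == tid:
            total += dollar_sales
    return total


# Projection index for dollar_sales values [8, 8, 26, 108, 10].
projection = list(enumerate([8, 8, 26, 108, 10]))
```

Each input is read once from front to back, which is why the I/O cost is bounded by the size of the projection index.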

SUM with a bit-sliced index

Assume a bit-sliced index on Sales(dollar_sales), with slices B_{k–1}, …, B_1, B_0

sum = 0;
for i = 0 to k – 1:
    sum += 2^i × count-1-bits(B_i AND B_f);

I/O’s: number of blocks taken by the bit-sliced index

Conceptually a bit-sliced index contains the same information as a projection index

But the bit-sliced index does not keep TID

Bitmap AND is faster
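The same computation as runnable Python, under the assumption that dollar_sales values are non-negative integers that fit in k bits. Slice B_i is a Python int whose bit t is bit i of tuple t's value; `build_bit_slices` is a hypothetical helper added only to make the example self-contained.

```python
def popcount(bitmap):
    """count-1-bits: number of 1 bits in a bitmap stored as a Python int."""
    return bin(bitmap).count("1")


def build_bit_slices(column, k):
    """Return [B_0, ..., B_{k-1}] for a column of k-bit non-negative integers."""
    slices = [0] * k
    for tid, value in enumerate(column):
        for i in range(k):
            if (value >> i) & 1:
                slices[i] |= 1 << tid
    return slices


def sum_with_bit_slices(slices, b_f):
    """SUM over the tuples in b_f: slice i contributes 2^i per selected 1 bit."""
    total = 0
    for i, b_i in enumerate(slices):
        total += (1 << i) * popcount(b_i & b_f)
    return total
```

For the column [8, 8, 26, 108, 10] with B_f selecting tuples 1, 2, and 4, the slices for bits 1, 3, and 4 contribute 4, 24, and 16, giving 44 = 8 + 26 + 10.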

Summary of SUM

Best: bit-sliced index

Index is small

B_f can be applied fast!

Good: projection index

Not bad: value-list index

Full-fledged index carries a bigger overhead

The fact that we have counts of values helped
But we did not really need values to be ordered

MEDIAN

SELECT MEDIAN( dollar_sales )

FROM Sales

WHERE condition ;

Same deal: already found B_f (a bitmap or a sorted list of TID’s that point to Sales tuples that satisfy condition)

Need to find the dollar_sales value that is greater than or equal to ½ × count-1-bits(B_f) dollar_sales values among B_f tuples

MEDIAN with a bit-sliced index

median = 0;
B_current = B_f; // which tuples we are considering
sofar = 0; // number of tuples whose values are less than what we are considering
for i = k – 1 down to 0:
    if (sofar + count-1-bits(B_current AND NOT(B_i)) ≤ ½ × count-1-bits(B_f)):
        sofar += count-1-bits(B_current AND NOT(B_i));
        B_current = B_current AND B_i;
        median += 2^i;
    else:
        B_current = B_current AND NOT(B_i);

I/O’s: still need to scan the entire index
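A runnable sketch of the bit-by-bit search, assuming non-negative integer values, Python ints as bitmaps, and a non-empty B_f. It reads the comparison in the test as "≤", so for an even number of selected tuples it returns the upper of the two middle values; that reading and the helper names are assumptions of this sketch.

```python
def popcount(bitmap):
    """count-1-bits: number of 1 bits in a bitmap stored as a Python int."""
    return bin(bitmap).count("1")


def build_bit_slices(column, k):
    """Return [B_0, ..., B_{k-1}] for a column of k-bit non-negative integers."""
    slices = [0] * k
    for tid, value in enumerate(column):
        for i in range(k):
            if (value >> i) & 1:
                slices[i] |= 1 << tid
    return slices


def median_with_bit_slices(slices, b_f):
    """Find the median of the tuples in b_f, one bit at a time from the top."""
    n = popcount(b_f)
    if n == 0:
        raise ValueError("B_f selects no tuples")
    median = 0
    current = b_f   # tuples still consistent with the bits chosen so far
    sofar = 0       # tuples known to be smaller than every tuple in `current`
    for i in range(len(slices) - 1, -1, -1):
        zeros = popcount(current & ~slices[i])   # candidates with bit i = 0
        if 2 * (sofar + zeros) <= n:
            # Too few smaller tuples: the median has bit i = 1.
            sofar += zeros
            current &= slices[i]
            median += 1 << i
        else:
            # The median is among the candidates with bit i = 0.
            current &= ~slices[i]
    return median
```

Note that `sofar` is updated before `current` is narrowed; after narrowing, no remaining candidate has bit i = 0, so the count would be zero.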

Summary of MEDIAN

Best: ordered value-list index

It helps to be ordered!

Pretty good: bit-sliced index

Could beat ordered value-list index if B_f is “clustered”

Only need to retrieve the corresponding segment

More variant indexes

“Improved Query Performance with Variant Indexes,” by O’Neil and Quass. SIGMOD, 1997

MIN/MAX, and range query using bit-sliced index

Join indexes for star schema

Traditional: one for each combination of foreign columns

Bitmap: one for each foreign column


What is the more glaring problem of these variant indexes?

How did the paper get away with that?