Relational Operator Evaluation: Joining and Indexing Strategies - Prof. Kristen R. Lefevre, Study notes of Database Management Systems (DBMS)

Various strategies for joining and indexing large relational databases using different algorithms and index types. Topics include selection, projection, join algorithms, and set operations. The document also provides cost analyses for different scenarios and discusses the trade-offs between various indexing strategies.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-z2l
koofers-user-z2l 🇺🇸

10 documents

1 / 14

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
RelationalOperatorEvaluation
EECS484
Winter09
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe

Partial preview of the text

Download Relational Operator Evaluation: Joining and Indexing Strategies - Prof. Kristen R. Lefevre and more Study notes Database Management Systems (DBMS) in PDF only on Docsity!

Relational

Operator

Evaluation

EECS

Winter

Basics

Cost

page

I/Os

with

disk

Writing

back

result

common,

so

may

be

ignored

When

to

use

a

B+tree

index

Consider

A

relation

with

1M

tuples

tuples

on

a

page

(key,

rid)

pairs

on

a

page

Clustered

Non-Clustered

1% Selection

10% Selection

# data pages

= 1M/100 = 10K pages

# leaf idx pgs

= 1M / (500 * 0.67) ~ 3K pages

NC + Sort Rids

SELECTION

No

disjunction

File

scan

Single

index

Multiple

index?

– intersect

rids

Disjunctions

present

File

scan

required

for

any

clause?

Multiple

index?

– union

rids

PROJECTION

Hash

based

Hash

relation

into

B

partitions

Read

in

partition

For

each,

re

hash,

eliminate

duplicates,

write

back

Indexes

If

all

attributes

present,

‘index

only’

JOIN

Nested

Loops

join

foreach tuple r in

R

do foreach tuple s in

S

do if r.sid

s.sid then add <r, s> to result Cost

|R|

||R||

|S|

Should

R

be the Smaller/Larger relation?

Page

Nested

Loops

join

|R|

|R|

|S|

JOIN

Sort

Merge

Join

Sort

R

and

S

Increment

pointers

till

matching

tuples

found

Take

care

of

duplicates

Cost

|R|

log

|R|

|S|

log

|S|

(|R|+|S|)

Refinement:

3(|R|+|S|)

JOIN

Hash

Join

Build

hashes

on

R,

S

Read

in

smaller

partition

and

rehash

Probe

other

relation

Cost

3(|R|+|S|)

Sort

vs Hash

Sort

skew

resistant,

result

sorted

If

B

between

sqrt(|R|)

and

sqrt(|S|),

hash

economical

Consider the join

R

JOIN

(R.a=S.b)

S

Ignore the cost of writing out the result. Relation

R

contains

tuples and has

tuples per page. Relation

S

contains

tuples and also has

tuples per page. Attribute b of relation

S

is the primary key for

S.

Both relations are stored as simple heap files. Neither relation has any indexes built on it. 52 buffer pages are available.

What is the cost of joining

R

and

S

using a page

oriented simple nested loops join? What is the minimum number of buffer pages required for this cost to remain unchanged?

What is the cost of joining

R

and

S

using a block nested loops join? What is the minimum number of buffer pages required for this cost to remain unchanged?

What is the cost of joining

R

and

S

using a sort ‐merge join?

What is the cost of joining

R

and

S

using a hash join?

What would be the lowest possible

I/O

cost for joining

R

and

S

using any join algorithm, and how much buffer space would be needed to achieve this cost?

How many tuples does the join of

R

and

S

produce, at most?

Table

‘Executives’

has

attributes

ename,

title,

dname,

and

address;

all

are

string

fields

of

the

same

length.

The

ename

attribute

is

a

candidate

key.

The

relation

contains

pages.

There

are

buffer

pages.

Index

record

is

th

a

data

record

for

single

key

and

a

data

record

for

key

indexes

Consider

the

following

query:

SELECT

E.title,

E.ename

FROM

Executives

E

WHERE

E.title=‘CFO’

Assume

that

only

of

Executives

tuples

meet

the

selection

condition.

What

is

the

cost

of

the

best

plan,

if

you

have

(a)

a

clustered

B+

tree

index

on

title.

(b)

an

unclustered

B+

tree

index

on

title.

(c)

a

clustered

B+

tree

index

on

ename.

(d)

a

clustered

B+

tree

index

on

address.

(e)

a

clustered

B+

tree

index

on

ename,

title