Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

CS 347: Notes on Database System Design - Parallel Query Processing, Slides of Distributed Database Management Systems

Dhirubhai Ambani Institute of Information and Communication Technology Distributed Database Management Systems

Notes on parallel query processing in database systems. It covers topics such as rule-based fragmentation, localization, optimization, sorting, join algorithms, and privacy-preserving join. The notes also discuss various parallel operations like duplicate elimination and aggregates.

Typology: Slides

2011/2012

Uploaded on 07/16/2012

sambandam 🇮🇳

4.3

(37)

154 documents

1 / 19

This page cannot be seen from the preview

Don't miss anything!

4/11/2012

CS 347 Notes 03 7

Eliminate redundancy

E.g.: in conditions:

(S.A=1) (S.A>5) False

(S.A<10) (S.A<5) S.A<5

CS 347 Notes 03 8

E.g.: Common sub-expressions

S cond cond TS cond T

RR R

CS 347 Notes 03 9

Algebraic rewriting

E.g.: Push conditions down

cond3

cond

cond1 cond2

RS RS

CS 347 Notes 03 10

• After decomposition:

– One or more algebraic query trees

on relations

• Localization:

– Replace relations by corresponding

fragments

CS 347 Notes 03 11

Localization steps

(1) Start with query

(2) Replace relations by fragments

(3) Push : up (use CS245 rules)

,: down

(4) Simplify – eliminate unnecessary

operations

CS 347 Notes 03 12

Notation for fragment

[R: cond]

fragment conditions its tuples satisfy

docsity.com

Discover Slides of Distributed Database Management Systems Dhirubhai Ambani Institute of Information and Communication Technology

Partial preview of the text

Download CS 347: Notes on Database System Design - Parallel Query Processing and more Slides Distributed Database Management Systems in PDF only on Docsity!

CS 347 Notes 03 7

Eliminate redundancy

E.g.: in conditions: (S.A=1)  (S.A>5)  False (S.A<10)  (S.A<5)  S.A<

CS 347 Notes 03 8

E.g.: Common sub-expressions

U U

S cond cond T S cond T

R R R

CS 347 Notes 03 9

Algebraic rewriting

E.g.: Push conditions down

cond

cond1 cond

R S R S

CS 347 Notes 03 10

After decomposition:
- One or more algebraic query trees on relations
Localization:
- Replace relations by corresponding fragments

CS 347 Notes 03 11

Localization steps

(1) Start with query (2) Replace relations by fragments

(3) Push : up (use CS245 rules)

, : down

(4) Simplify – eliminate unnecessary operations

CS 347 Notes 03 12

Notation for fragment

[R: cond]

fragment conditions its tuples satisfy

CS 347 Notes 03 13

Example A

(1) E=

R

CS 347 Notes 03 14

(2) E=

[R 1 : E < 10] [R 2 : E  10]

CS 347 Notes 03 15

E=3 E=

[R 1 : E < 10] [R 2 : E  10]

CS 347 Notes 03 16

E=3 E=

[R 1 : E < 10] [R 2 : E  10]

 Ø

CS 347 Notes 03 17

(4) E=

[R 1 : E < 10]

CS 347 Notes 03 18

Rule 1

C1[R: c2]  C1[R: c1  c2]

[R: False]  Ø

A

B

CS 347 Notes 03 25

Rule 2

[R: C 1 ] [S: C 2 ] 

[R S: C 1  C 2  R.A = S.A]

CS 347 Notes 03 26

 In step 4 of Example B:

[R 1 : A<5] [S 2 : A  5]

[R 1 S 2 : R 1 .A < 5  S 2 .A  5 

R 1 .A = S 2 .A ]

[R 1 S 2 : False]  Ø A

CS 347 Notes 03 27

Localization with derived fragmentation

Example C (2)

R 1 : R2: S 1 :K=R.K S 2 :K=R.K

A<10 A  10 R.A<10 R.A 10

CS 347 Notes 03 28

[R 1 ][S 1 ] [R 1 ][S 2 ] [R 2 ][S 1 ] [R 2 ][S 2 ]

K K^ K^ K

CS 347 Notes 03 29

[R 1 :A<10] S 1 :K=R.K [R 2 :A10] S 2 :K=R.K

 R.A<10  R.A 10

K K

CS 347 Notes 03 30

 In step 4 of Example C:

[R 1 :A<10] [S 2 :K=R.K  R.A10]

 [R 1 S2: R 1 .A<10  S 2 .K=R.K 

R.A 10  R 1 .K= S 2 .K]

 [R 1 S 2 :False ] (K is key of R, R 1 )

 Ø

CS 347 Notes 03 31

[R 1 :A<10] S 1 :K=R.K [R 2 :A10] S 2 :K=R.K

 R.A<10  R.A 10

K K

(4) simplified more: 

K K

R (^1) S 1 R 2 S 2

CS 347 Notes 03 32

Localization with vertical fragmentation

Example D (1) A R 1 (K, A, B) R R 2 (K, C, D)

CS 347 Notes 03 33

(2) A

R 1 R 2

(K, A, B) (K, C, D)

CS 347 Notes 03 34

(3) A

K,A K,A

R 1 R 2

(K, A, B) (K, C, D)

K not really needed

CS 347 Notes 03 35

(4) A

R 1

(K, A, B)

CS 347 Notes 03 36

Rule 3

Given vertical fragmentation of R: R i = Ai (R), Ai  A
Then for any B  A:

B (R) = B [ Ri | B  Ai  Ø ]

CS 347 Notes 03 43

Parallel/distributed sort

Input: (a) relation R on single site/disk (b) R fragmented/partitioned by sort attribute (c) R fragmented/partitioned by other attribute

CS 347 Notes 03 44

Output (a) sorted R on single site/disk (b) fragments/partitions sorted

F 1 F 2 F 3

CS 347 Notes 03 45

Basic sort

R(K,…), sort on K
Fragmented on K Vector: ko, k 1 , … kn

7 3

ko k 1

CS 347 Notes 03 46

Algorithm: each fragment sorted independently
If necessary, ship results

CS 347 Notes 03 47

 Same idea on different

architectures:

Shared nothing:

Shared memory: sorts F1 sorts F

P

M

P

M

Net F 1 F 2

P1 P

F 1 M F^2

CS 347 Notes 03 48

Range partitioning sort

R(K,….), sort on K
R located at one or more site/disk, not fragmented on K

CS 347 Notes 03 49

Algorithm: (a) Range partition on K (b) Basic sort Ra

R’ 1

R’ 2

R’ 3

Local sort

R 1

R 2

R 3

Result

CS 347 Notes 03 50

Selecting a good partition vector

R a R b R c

CS 347 Notes 03 51

Example

Each site sends to coordinator:
- Min sort key
- Max sort key
- Number of tuples
Coordinator computes vector and distributes to sites (also decides # of sites for local sorts)

CS 347 Notes 03 52

Sample scenario: Coordinator receives: SA : Min=5 Max=10 # = 10 tuples SB: Min=7 Max=17 # = 10 tuples

CS 347 Notes 03 53

Sample scenario:

Coordinator receives: SA : Min=5 Max=10 # = 10 tuples SB: Min=7 Max=17 # = 10 tuples

Expected tuples:

ko?

[assuming we want to sort at 2 sites] CS 347 Notes 03 54

Expected tuples:

ko?

[assuming we want to sort at 2 sites]

CS 347 Notes 03 61

Parallel/distributed Join

Input: Relations R, S May or may not be partitioned Output: R S Result at one or more sites

CS 347 Notes 03 62

Partitioned Join (Equi-join)

Ra (^) S 1

S

S 3

R 1

R

R 3

Local join

Result

f(A) f(A)

CS 347 Notes 03 63

Notes:

Same partition function f is used for both R and S (applied to join attribute)
f can be range or hash partitioning
Local join can be of any type (use any CS245 optimization)
Various scheduling options e.g., (a) partition R; partition S; join (b) partition R; build local hash table for R; partition S and join

CS 347 Notes 03 64

More notes:

We already know why part-join works:

R1 R2 R3 S1 S2 S3 R1 S1 R2 S2 R3 S

Useful to give this type of join a name, because we may want to partition data to make partition-join possible (especially in parallel DB system)

 

 

CS 347 Notes 03 65

Even more notes:

Selecting good partition function f very important: - Number of fragments - Hash function - Partition vector

CS 347 Notes 03 66

Good partition vector
- Goal: | Ri |+| Si | the same
- Can use coordinator to select

CS 347 Notes 03 67

Asymmetric fragment + replicate join

Ra (^) S

S

R 1

R 2

R 3

Local join

Result

f partition union

CS 347 Notes 03 68

Notes:

Can use any partition function f for R (even round robin)
Can do any join — not just equi-join e.g.: R S R.A < S.B

CS 347 Notes 03 69

General fragment and replicate join

f partition n copies of each fragment -> 3 fragments

R

CS 347 Notes 03 70

 S is partitioned in similar fashion

Result

All nxm pairings of

R,S fragments

R1 S

R2 S

R3 S

R1 S

R2 S

R3 S

CS 347 Notes 03 71

Notes:

Asymmetric F+R join is special case of general F+R
Asymmetric F+R may be good if S small
Works for non-equi-joins

CS 347 Notes 03 72

Semi-join
- Goal: reduce communication traffic
- R S  (R S) S or

R (S R) or

(R S) (S R)

A (^) A

CS 347 Notes 03 79

Similar comparisons for other semi-joins
Remember: only taking into account transmission cost

CS 347 Notes 03 80

Trick: Encode A S (or A R ) as a bit vector

key in S

<----one bit/possible key------->

CS 347 Notes 03 81

 Three way joins with semi-joins

Goal: R S T

CS 347 Notes 03 82

 Three way joins with semi-joins

Goal: R S T

Option 1: R’ S’ T where R’ = R S; S’ = S T

CS 347 Notes 03 83

 Three way joins with semi-joins

Goal: R S T

Option 1: R’ S’ T where R’ = R S; S’ = S T

Option 2: R’’ S’ T where R’’ = R S’; S’ = S T

CS 347 Notes 03 84

 Many options!  Number of semi-join options is exponential in # of relations in join

CS 347 Notes 03 85

Privacy Preserving Join

Site 1 has R(A,B)
Site 2 has S(A,C)
Want to compute R S
Site 1 should NOT discover any S info not in the join
Site 2 should NOT discover any R info not in the join

R S

site 1 (^) site 2

CS 347 Notes 03 86

Semi-Join Does Not Work

If Site 1 sends A R to Site 2, site 2 leans all keys of R!

A R = (a1, a2, a3, a4)

site 1

R A B a1 b a2 b a3 b a4 b

site 2

S A C a1 c a3 c a5 c a7 c

CS 347 Notes 03 87

Fix: Send hashed keys

Site 1 hashes each value of A before sending
Site 2 hashes (same function) its own A values to see what tuples match

A R = (h(a1), h(a2), h(a3), h(a4))

site 1

R A B a1 b a2 b a3 b a4 b

site 2

S A C a1 c a3 c a5 c a7 c

Site 2 sees it has h(a1),h(a3)

(a1, c1), (a3, c3) CS 347 Notes 03 88

What is problem?

A R = (h(a1), h(a2), h(a3), h(a4))

site 1

R A B a1 b a2 b a3 b a4 b

site 2

S A C a1 c a3 c a5 c a7 c

Site 2 sees it has h(a1),h(a3)

(a1, c1), (a3, c3)

CS 347 Notes 03 89

What is problem?

Dictionary attack! Site 2 takes all keys, a1, a2, a3... and checks if h(a1), h(a2), h(a3) matches what Site 1 sent...

A R = (h(a1), h(a2), h(a3), h(a4))

site 1

R A B a1 b a2 b a3 b a4 b

site 2

S A C a1 c a3 c a5 c a7 c

Site 2 sees it has h(a1),h(a3)

(a1, c1), (a3, c3)

CS 347 Notes 03 90

Adversary Model

Honest but Curious
- dictionary attack is possible (cheating is internal and can’t be caught)
- sending incorrect keys not possible (cheater could be caught)

CS 347 Notes 03 97

Example:

dept sal

1 toy 10 2 toy 20 3 sales 15

dept sal

4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30

sum (sal) group by dept

dept sal

1 toy 10 2 toy 20 5 toy 20 6 mgmt 15 8 mgmt 30

dept sal

3 sales 15 4 sales 5 7 sales 10

R a

R b

CS 347 Notes 03 98

Example:

dept sal

1 toy 10 2 toy 20 3 sales 15

dept sal

4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30

sum (sal) group by dept

dept sal

1 toy 10 2 toy 20 5 toy 20 6 mgmt 15 8 mgmt 30

dept sal

3 sales 15 4 sales 5 7 sales 10

dept sum toy 50 mgmt 45

dept sum sales 30

sum

R a

R b

CS 347 Notes 03 99

Example:

dept sal

1 toy 10 2 toy 20 3 sales 15

dept sal

4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30

sum (sal) group by dept

R a

R b

less data!

CS 347 Notes 03 100

Example:

dept sal

1 toy 10 2 toy 20 3 sales 15

dept sal

4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30

sum (sal) group by dept

R a

R b

dept sum toy 30 toy 20 mgmt 45

dept sum sales 15 sales 15

sum

less data!

CS 347 Notes 03 101

Example:

dept sal

1 toy 10 2 toy 20 3 sales 15

dept sal

4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30

sum (sal) group by dept

dept sum toy 50 mgmt 45

dept sum sales 30

sum

R a

R b

dept sum toy 30 toy 20 mgmt 45

dept sum sales 15 sales 15

sum

less

data! Preview: Map Reduce

CS 347 Notes 03 102

data A

data B

data C

CS 347 Notes 03 103

 Enhancements for aggregates

Perform aggregate during partition to reduce data transmitted
Does not work for all aggregate functions… Which ones?

CS 347 Notes 03 104

 Selection

Range or hash partition
Straightforward

 But what about indexes?

CS 347 Notes 03 105

 Indexing

Can think of partition vector as root of distributed index:

ko k 1

Localindexes

Site 1 Site 2 Site 3

CS 347 Notes 03 106

Index on non-partition attribute

Index sites

Tuple sites

ko k 1

CS 347 Notes 03 107

Notes:

If index is not too big, it may be better to keep whole and make copies...
If updates are frequent, can partition update work... (Question: how do we handle split of B-Tree pages?)

CS 347 Notes 03 108

Extensible or linear hashing R f R R R4 <- add

CS 347 Notes 03 115

Summary

As we consider query plans for optimization, we must consider various tricks:

for individual operations
for scheduling operations

CS 347: Notes on Database System Design - Parallel Query Processing, Slides of Distributed Database Management Systems

Related documents

Partial preview of the text

Download CS 347: Notes on Database System Design - Parallel Query Processing and more Slides Distributed Database Management Systems in PDF only on Docsity!

Eliminate redundancy

E.g.: Common sub-expressions

U U

S cond cond T S cond T

R R R

Algebraic rewriting

cond

cond

cond1 cond

R S R S

Localization steps

(3) Push : up (use CS245 rules)

, : down

Notation for fragment

Example A

(1) E=

R

(2) E=

[R 1 : E < 10] [R 2 : E  10]

E=3 E=

[R 1 : E < 10] [R 2 : E  10]

E=3 E=

[R 1 : E < 10] [R 2 : E  10]

 Ø

(4) E=

[R 1 : E < 10]

Rule 1

C1[R: c2]  C1[R: c1  c2]

A

B

Rule 2

[R: C 1 ] [S: C 2 ] 

[R S: C 1  C 2  R.A = S.A]

 In step 4 of Example B:

[R 1 : A<5] [S 2 : A  5]

[R 1 S 2 : R 1 .A < 5  S 2 .A  5 

R 1 .A = S 2 .A ]

Localization with derived fragmentation

R 1 : R2: S 1 :K=R.K S 2 :K=R.K

A<10 A  10 R.A<10 R.A 10

[R 1 ][S 1 ] [R 1 ][S 2 ] [R 2 ][S 1 ] [R 2 ][S 2 ]

[R 1 :A<10] S 1 :K=R.K [R 2 :A10] S 2 :K=R.K

 R.A<10  R.A 10

 In step 4 of Example C:

[R 1 :A<10] [S 2 :K=R.K  R.A10]

 [R 1 S2: R 1 .A<10  S 2 .K=R.K 

R.A 10  R 1 .K= S 2 .K]

[R 1 :A<10] S 1 :K=R.K [R 2 :A10] S 2 :K=R.K

 R.A<10  R.A 10

(4) simplified more: 

(2) A

R 1 R 2

(K, A, B) (K, C, D)

(3) A

K,A K,A

R 1 R 2

(K, A, B) (K, C, D)

(4) A

R 1

(K, A, B)

Rule 3

B (R) = B [ Ri | B  Ai  Ø ]

Parallel/distributed sort

F 1 F 2 F 3

Basic sort

 Same idea on different

architectures:

P

M

P

M

P1 P

F 1 M F^2

Range partitioning sort

R’ 1

R’ 2