CS 347: Notes on Database System Design - Parallel Query Processing, Slides of Distributed Database Management Systems

Notes on parallel query processing in database systems. It covers topics such as rule-based fragmentation, localization, optimization, sorting, join algorithms, and privacy-preserving join. The notes also discuss various parallel operations like duplicate elimination and aggregates.

Typology: Slides

2011/2012

Uploaded on 07/16/2012

sambandam
sambandam 🇮🇳

4.3

(37)

154 documents

1 / 19

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
4/11/2012
2
CS 347 Notes 03 7
Eliminate redundancy
E.g.: in conditions:
(S.A=1) (S.A>5) False
(S.A<10) (S.A<5) S.A<5
CS 347 Notes 03 8
E.g.: Common sub-expressions
UU
S cond cond TS cond T
RR R
CS 347 Notes 03 9
Algebraic rewriting
E.g.: Push conditions down
cond3
cond
cond1 cond2
RS RS
CS 347 Notes 03 10
After decomposition:
One or more algebraic query trees
on relations
Localization:
Replace relations by corresponding
fragments
CS 347 Notes 03 11
Localization steps
(1) Start with query
(2) Replace relations by fragments
(3) Push : up (use CS245 rules)
,: down
(4) Simplify – eliminate unnecessary
operations
CS 347 Notes 03 12
Notation for fragment
[R: cond]
fragment conditions its tuples satisfy
docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13

Partial preview of the text

Download CS 347: Notes on Database System Design - Parallel Query Processing and more Slides Distributed Database Management Systems in PDF only on Docsity!

CS 347 Notes 03 7

Eliminate redundancy

E.g.: in conditions: (S.A=1)  (S.A>5)  False (S.A<10)  (S.A<5)  S.A<

CS 347 Notes 03 8

E.g.: Common sub-expressions

U U

S cond cond T S cond T

R R R

CS 347 Notes 03 9

Algebraic rewriting

E.g.: Push conditions down

cond

cond

cond1 cond

R S R S

CS 347 Notes 03 10

  • After decomposition:
    • One or more algebraic query trees on relations
  • Localization:
    • Replace relations by corresponding fragments

CS 347 Notes 03 11

Localization steps

(1) Start with query (2) Replace relations by fragments

(3) Push : up (use CS245 rules)

, : down

(4) Simplify – eliminate unnecessary operations

CS 347 Notes 03 12

Notation for fragment

[R: cond]

fragment conditions its tuples satisfy

CS 347 Notes 03 13

Example A

(1) E=

R

CS 347 Notes 03 14

(2) E=

[R 1 : E < 10] [R 2 : E  10]

CS 347 Notes 03 15

E=3 E=

[R 1 : E < 10] [R 2 : E  10]

CS 347 Notes 03 16

E=3 E=

[R 1 : E < 10] [R 2 : E  10]

 Ø

CS 347 Notes 03 17

(4) E=

[R 1 : E < 10]

CS 347 Notes 03 18

Rule 1

C1[R: c2]  C1[R: c1  c2]

[R: False]  Ø

A

B

5

CS 347 Notes 03 25

Rule 2

[R: C 1 ] [S: C 2 ] 

[R S: C 1  C 2  R.A = S.A]

A

A

CS 347 Notes 03 26

 In step 4 of Example B:

[R 1 : A<5] [S 2 : A  5]

[R 1 S 2 : R 1 .A < 5  S 2 .A  5 

R 1 .A = S 2 .A ]

[R 1 S 2 : False]  Ø A

A

A

CS 347 Notes 03 27

Localization with derived fragmentation

Example C (2)

R 1 : R2: S 1 :K=R.K S 2 :K=R.K

A<10 A  10 R.A<10 R.A 10

K

CS 347 Notes 03 28

[R 1 ][S 1 ] [R 1 ][S 2 ] [R 2 ][S 1 ] [R 2 ][S 2 ]

K K^ K^ K

CS 347 Notes 03 29

[R 1 :A<10] S 1 :K=R.K [R 2 :A10] S 2 :K=R.K

 R.A<10  R.A 10

K K

CS 347 Notes 03 30

 In step 4 of Example C:

[R 1 :A<10] [S 2 :K=R.K  R.A10]

 [R 1 S2: R 1 .A<10  S 2 .K=R.K 

R.A 10  R 1 .K= S 2 .K]

 [R 1 S 2 :False ] (K is key of R, R 1 )

 Ø

K

K

K

6

CS 347 Notes 03 31

[R 1 :A<10] S 1 :K=R.K [R 2 :A10] S 2 :K=R.K

 R.A<10  R.A 10

K K

(4) simplified more: 

K K

R (^1) S 1 R 2 S 2

CS 347 Notes 03 32

  • Localization with vertical fragmentation

Example D (1) A R 1 (K, A, B) R R 2 (K, C, D)

CS 347 Notes 03 33

(2) A

R 1 R 2

(K, A, B) (K, C, D)

K

CS 347 Notes 03 34

(3) A

K,A K,A

R 1 R 2

(K, A, B) (K, C, D)

K not really needed

CS 347 Notes 03 35

(4) A

R 1

(K, A, B)

CS 347 Notes 03 36

Rule 3

  • Given vertical fragmentation of R: R i = Ai (R), Ai  A
  • Then for any B  A:

B (R) = B [ Ri | B  Ai  Ø ]

i

8

CS 347 Notes 03 43

Parallel/distributed sort

Input: (a) relation R on single site/disk (b) R fragmented/partitioned by sort attribute (c) R fragmented/partitioned by other attribute

CS 347 Notes 03 44

Output (a) sorted R on single site/disk (b) fragments/partitions sorted

F 1 F 2 F 3

CS 347 Notes 03 45

Basic sort

  • R(K,…), sort on K
  • Fragmented on K Vector: ko, k 1 , … kn

7 3

ko k 1

CS 347 Notes 03 46

  • Algorithm: each fragment sorted independently
  • If necessary, ship results

CS 347 Notes 03 47

 Same idea on different

architectures:

Shared nothing:

Shared memory: sorts F1 sorts F

P

M

P

M

Net F 1 F 2

P1 P

F 1 M F^2

CS 347 Notes 03 48

Range partitioning sort

  • R(K,….), sort on K
  • R located at one or more site/disk, not fragmented on K

9

CS 347 Notes 03 49

  • Algorithm: (a) Range partition on K (b) Basic sort Ra

Rb

R’ 1

R’ 2

R’ 3

ko

k

Local sort

Local sort

Local sort

R 1

R 2

R 3

Result

CS 347 Notes 03 50

  • Selecting a good partition vector

R a R b R c

CS 347 Notes 03 51

Example

  • Each site sends to coordinator:
    • Min sort key
    • Max sort key
    • Number of tuples
  • Coordinator computes vector and distributes to sites (also decides # of sites for local sorts)

CS 347 Notes 03 52

  • Sample scenario: Coordinator receives: SA : Min=5 Max=10 # = 10 tuples SB: Min=7 Max=17 # = 10 tuples

CS 347 Notes 03 53

  • Sample scenario:

Coordinator receives: SA : Min=5 Max=10 # = 10 tuples SB: Min=7 Max=17 # = 10 tuples

Expected tuples:

ko?

[assuming we want to sort at 2 sites] CS 347 Notes 03 54

Expected tuples:

ko?

[assuming we want to sort at 2 sites]

11

CS 347 Notes 03 61

  • Parallel/distributed Join

Input: Relations R, S May or may not be partitioned Output: R S Result at one or more sites

CS 347 Notes 03 62

Partitioned Join (Equi-join)

Ra (^) S 1

S

S 3

Rb

R 1

R

R 3

Sa

Sb

Sc

Local join

Result

f(A) f(A)

CS 347 Notes 03 63

Notes:

  • Same partition function f is used for both R and S (applied to join attribute)
  • f can be range or hash partitioning
  • Local join can be of any type (use any CS245 optimization)
  • Various scheduling options e.g., (a) partition R; partition S; join (b) partition R; build local hash table for R; partition S and join

CS 347 Notes 03 64

More notes:

  • We already know why part-join works:

R1 R2 R3 S1 S2 S3 R1 S1 R2 S2 R3 S

  • Useful to give this type of join a name, because we may want to partition data to make partition-join possible (especially in parallel DB system)

 

 

CS 347 Notes 03 65

Even more notes:

  • Selecting good partition function f very important: - Number of fragments - Hash function - Partition vector

CS 347 Notes 03 66

  • Good partition vector
    • Goal: | Ri |+| Si | the same
    • Can use coordinator to select

12

CS 347 Notes 03 67

Asymmetric fragment + replicate join

Ra (^) S

S

S

Rb

R 1

R 2

R 3

Sa

Sb

Local join

Result

f partition union

CS 347 Notes 03 68

Notes:

  • Can use any partition function f for R (even round robin)
  • Can do any join — not just equi-join e.g.: R S R.A < S.B

CS 347 Notes 03 69

General fragment and replicate join

f partition n copies of each fragment -> 3 fragments

Ra

Rb

R

R

R

R

R

R

CS 347 Notes 03 70

 S is partitioned in similar fashion

Result

All nxm pairings of

R,S fragments

R1 S

R2 S

R3 S

R1 S

R2 S

R3 S

CS 347 Notes 03 71

Notes:

  • Asymmetric F+R join is special case of general F+R
  • Asymmetric F+R may be good if S small
  • Works for non-equi-joins

CS 347 Notes 03 72

  • Semi-join
    • Goal: reduce communication traffic
    • R S  (R S) S or

R (S R) or

(R S) (S R)

A

A

A

A

A

A

A (^) A

14

CS 347 Notes 03 79

  • Similar comparisons for other semi-joins
  • Remember: only taking into account transmission cost

CS 347 Notes 03 80

  • Trick: Encode A S (or A R ) as a bit vector

key in S

<----one bit/possible key------->

CS 347 Notes 03 81

 Three way joins with semi-joins

Goal: R S T

CS 347 Notes 03 82

 Three way joins with semi-joins

Goal: R S T

Option 1: R’ S’ T where R’ = R S; S’ = S T

CS 347 Notes 03 83

 Three way joins with semi-joins

Goal: R S T

Option 1: R’ S’ T where R’ = R S; S’ = S T

Option 2: R’’ S’ T where R’’ = R S’; S’ = S T

CS 347 Notes 03 84

 Many options!  Number of semi-join options is exponential in # of relations in join

15

CS 347 Notes 03 85

Privacy Preserving Join

  • Site 1 has R(A,B)
  • Site 2 has S(A,C)
  • Want to compute R S
  • Site 1 should NOT discover any S info not in the join
  • Site 2 should NOT discover any R info not in the join

R S

site 1 (^) site 2

CS 347 Notes 03 86

Semi-Join Does Not Work

  • If Site 1 sends A R to Site 2, site 2 leans all keys of R!

A R = (a1, a2, a3, a4)

site 1

R A B a1 b a2 b a3 b a4 b

site 2

S A C a1 c a3 c a5 c a7 c

CS 347 Notes 03 87

Fix: Send hashed keys

  • Site 1 hashes each value of A before sending
  • Site 2 hashes (same function) its own A values to see what tuples match

A R = (h(a1), h(a2), h(a3), h(a4))

site 1

R A B a1 b a2 b a3 b a4 b

site 2

S A C a1 c a3 c a5 c a7 c

Site 2 sees it has h(a1),h(a3)

(a1, c1), (a3, c3) CS 347 Notes 03 88

What is problem?

A R = (h(a1), h(a2), h(a3), h(a4))

site 1

R A B a1 b a2 b a3 b a4 b

site 2

S A C a1 c a3 c a5 c a7 c

Site 2 sees it has h(a1),h(a3)

(a1, c1), (a3, c3)

CS 347 Notes 03 89

What is problem?

  • Dictionary attack! Site 2 takes all keys, a1, a2, a3... and checks if h(a1), h(a2), h(a3) matches what Site 1 sent...

A R = (h(a1), h(a2), h(a3), h(a4))

site 1

R A B a1 b a2 b a3 b a4 b

site 2

S A C a1 c a3 c a5 c a7 c

Site 2 sees it has h(a1),h(a3)

(a1, c1), (a3, c3)

CS 347 Notes 03 90

Adversary Model

  • Honest but Curious
    • dictionary attack is possible (cheating is internal and can’t be caught)
    • sending incorrect keys not possible (cheater could be caught)

17

CS 347 Notes 03 97

Example:

dept sal

1 toy 10 2 toy 20 3 sales 15

dept sal

4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30

  • sum (sal) group by dept

dept sal

1 toy 10 2 toy 20 5 toy 20 6 mgmt 15 8 mgmt 30

dept sal

3 sales 15 4 sales 5 7 sales 10

R a

R b

CS 347 Notes 03 98

Example:

dept sal

1 toy 10 2 toy 20 3 sales 15

dept sal

4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30

  • sum (sal) group by dept

dept sal

1 toy 10 2 toy 20 5 toy 20 6 mgmt 15 8 mgmt 30

dept sal

3 sales 15 4 sales 5 7 sales 10

dept sum toy 50 mgmt 45

dept sum sales 30

sum

sum

R a

R b

CS 347 Notes 03 99

Example:

dept sal

1 toy 10 2 toy 20 3 sales 15

dept sal

4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30

  • sum (sal) group by dept

R a

R b

less data!

CS 347 Notes 03 100

Example:

dept sal

1 toy 10 2 toy 20 3 sales 15

dept sal

4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30

  • sum (sal) group by dept

R a

R b

dept sum toy 30 toy 20 mgmt 45

dept sum sales 15 sales 15

sum

sum

less data!

CS 347 Notes 03 101

Example:

dept sal

1 toy 10 2 toy 20 3 sales 15

dept sal

4 sales 5 5 toy 20 6 mgmt 15 7 sales 10 8 mgmt 30

  • sum (sal) group by dept

dept sum toy 50 mgmt 45

dept sum sales 30

sum

sum

R a

R b

dept sum toy 30 toy 20 mgmt 45

dept sum sales 15 sales 15

sum

sum

less

data! Preview: Map Reduce

CS 347 Notes 03 102

data A

data A

data A

data B

data B

data C

data C

18

CS 347 Notes 03 103

 Enhancements for aggregates

  • Perform aggregate during partition to reduce data transmitted
  • Does not work for all aggregate functions… Which ones?

CS 347 Notes 03 104

 Selection

  • Range or hash partition
  • Straightforward

 But what about indexes?

CS 347 Notes 03 105

 Indexing

  • Can think of partition vector as root of distributed index:

ko k 1

Localindexes

Site 1 Site 2 Site 3

CS 347 Notes 03 106

  • Index on non-partition attribute

Index sites

Tuple sites

ko k 1

CS 347 Notes 03 107

Notes:

  • If index is not too big, it may be better to keep whole and make copies...
  • If updates are frequent, can partition update work... (Question: how do we handle split of B-Tree pages?)

CS 347 Notes 03 108

  • Extensible or linear hashing R f R R R4 <- add

20

CS 347 Notes 03 115

Summary

As we consider query plans for optimization, we must consider various tricks:

  • for individual operations
  • for scheduling operations