Distributed DB Design-Distributed and Parallel Data Management-Lecture Slides, Slides of Distributed Database Management Systems

Prof. Jitesh Bhaskar delivered this lecture for Distributed and Parallel Data Management course at Dhirubhai Ambani Institute of Information and Communication Technology. Its main point are: Distributed, Database, Design, Fragmentation, Allocation, Queries, Round, Hash, Range, Partitioning

Typology: Slides

2011/2012

Uploaded on 07/16/2012

sambandam
sambandam 🇮🇳

4.3

(37)

154 documents

1 / 13

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
4/9/2012
2
CS 347 Notes 02 7
# NM Loc Sal
E5
7
8
Sa 10
Sally Sb 25
Tom Sa 15
Joe
# NM Loc Sal # NM Loc Sal
5
8Sa 10
Tom Sa 15
Joe 7Sb 25Sally
..
..
..
..
F
At Sa At Sb
CS 347 Notes 02 8
F= { F1, F2 }
F1= loc=Sa E F2= loc=Sb E
CS 347 Notes 02 9
F= { F1, F2 }
F1= loc=Sa E F2= loc=Sb E
called primary horizontal fragmentation
CS 347 Notes 02 10
Fragmentation
Horizontal Primary
depends on local attributes
R Derived
depends on foreign relation
•Vertical
R
CS 347 Notes 02 11
Fragmentation
Horizontal Primary
depends on local attributes
R Derived
depends on foreign relation
•Vertical
RFragmentation also called
Sharding
CS 347 Notes 02 12
Three common horizontal
partitioning techniques
Round robin
Hash partitioning
Range partitioning
docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd

Partial preview of the text

Download Distributed DB Design-Distributed and Parallel Data Management-Lecture Slides and more Slides Distributed Database Management Systems in PDF only on Docsity!

CS 347 Notes 02 7

NM Loc Sal

E (^5) 7 8

Sa 10 Sally Sb 25 Tom Sa 15

Joe

NM Loc Sal (^) # NM Loc Sal

5 8

Sa 10 Tom Sa 15

Joe 7 Sally Sb 25

F

At Sa At Sb CS 347 Notes 02 8

F = { F 1 , F 2 }

F 1 =  loc=Sa E F 2 =  loc=Sb E

CS 347 Notes 02 9

F = { F 1 , F 2 }

F 1 =  loc=Sa E F 2 =  loc=Sb E

 called primary horizontal fragmentation

CS 347 Notes 02 10

Fragmentation

  • Horizontal Primary depends on local attributes R Derived depends on foreign relation
  • Vertical R

CS 347 Notes 02 11

Fragmentation

  • Horizontal Primary depends on local attributes R Derived depends on foreign relation
  • Vertical R Fragmentation also called Sharding

CS 347 Notes 02 12

Three common horizontal

partitioning techniques

  • Round robin
  • Hash partitioning
  • Range partitioning

CS 347 Notes 02 13

  • Round robin

R D 0 D 1 D 2 t1 t t2 t t3 t t4 t ... t

  • Evenly distributes data
  • Good for scanning full relation
  • Not good for point or range queries CS 347 Notes 02 14
    • Hash partitioning

R D 0 D 1 D 2 t1h(k 1 )=2 t t2h(k 2 )=0 t t3h(k 3 )=0 t t4h(k 4 )=1 t ...

  • Good for point queries on key; also for joins
  • Not good for range queries; point queries not on key
  • If hash function good, even distribution

CS 347 Notes 02 15

  • Range partitioning

R D 0 D 1 D 2 t1: A=5 t t2: A=8 t t3: A=2 t t4: A=3 t ...

  • Good for some range queries on A
  • Need to select good vector: else unbalance  data skew  execution skew

partitioning vector

V 0 V 1

CS 347 Notes 02 16

Which are good fragmentations?

Example:

F = { F 1 , F 2 }

F 1 =  sal<10 E F 2 =  sal>20 E

CS 347 Notes 02 17

Which are good fragmentations?

Example:

F = { F 1 , F 2 }

F 1 =  sal<10 E F 2 =  sal>20 E

 Problem: Some tuples lost!

CS 347 Notes 02 18

Which are good fragmentations?

Second example:

F = { F 3 , F 4 }

F 3 =  sal<10 E F 4 =  sal>5 E

CS 347 Notes 02 25

Example of generation

  • Say queries use predicates: A<10, A>5, Loc = SA , Loc = SB
  • Next: - generate “minterm” predicates
    • eliminate useless ones

CS 347 Notes 02 26

Minterm predicates (part I)

(1) A<10  A>5  Loc=SA  Loc=SB (2) A<10  A>5  Loc=SA  ¬(Loc=SB ) (3) A<10  A>5  ¬(Loc=SA )  Loc=SB (4) A<10  A>5  ¬(Loc=SA )  ¬(Loc=SB ) (5) A<10  ¬(A>5)  Loc=SA  Loc=SB (6) A<10  ¬(A>5)  Loc=SA  ¬(Loc=SB ) (7) A<10  ¬(A>5)  ¬(Loc=SA )  Loc=SB (8) A<10  ¬(A>5)  ¬(Loc=SA )  ¬(Loc=SB )

CS 347 Notes 02 27

Minterm predicates (part I)

(1) A<10  A>5  Loc=SA  Loc=SB (2) A<10  A>5  Loc=SA  ¬(Loc=SB ) (3) A<10  A>5  ¬(Loc=SA )  Loc=SB (4) A<10  A>5  ¬(Loc=SA )  ¬(Loc=SB ) (5) A<10  ¬(A>5)  Loc=SA  Loc=SB (6) A<10  ¬(A>5)  Loc=SA  ¬(Loc=SB ) (7) A<10  ¬(A>5)  ¬(Loc=SA )  Loc=SB (8) A<10  ¬(A>5)  ¬(Loc=SA )  ¬(Loc=SB )

CS 347 Notes 02 28

Minterm predicates (part I)

(1) A<10  A>5  Loc=SA  Loc=SB (2) A<10  A>5  Loc=SA  ¬(Loc=SB ) (3) A<10  A>5  ¬(Loc=SA )  Loc=SB (4) A<10  A>5  ¬(Loc=SA )  ¬(Loc=SB ) (5) A<10  ¬(A>5)  Loc=SA  Loc=SB (6) A<10  ¬(A>5)  Loc=SA  ¬(Loc=SB ) (7) A<10  ¬(A>5)  ¬(Loc=SA )  Loc=SB (8) A<10  ¬(A>5)  ¬(Loc=SA )  ¬(Loc=SB )

A  5

5 < A < 10

CS 347 Notes 02 29

Minterm predicates (part II)

(9) ¬(A<10)  A>5  Loc=SA  Loc=SB (10) ¬(A<10)  A>5  Loc=SA ¬(Loc=SB ) (11) ¬(A<10)  A>5 ¬(Loc=SA )  Loc=SB (12) ¬(A<10)  A>5 ¬(Loc=SA ) ¬(Loc=SB ) (13) ¬(A<10) ¬(A>5)  Loc=SA  Loc=SB (14) ¬(A<10) ¬(A>5)  Loc=SA ¬(Loc=SB ) (15) ¬(A<10) ¬(A>5) ¬(Loc=SA )  Loc=SB (16) ¬(A<10) ¬(A>5) ¬(Loc=SA ) ¬(Loc=SB )

CS 347 Notes 02 30

Minterm predicates (part II)

(9) ¬(A<10)  A>5  Loc=SA  Loc=SB (10) ¬(A<10)  A>5  Loc=SA ¬(Loc=SB ) (11) ¬(A<10)  A>5 ¬(Loc=SA )  Loc=SB (12) ¬(A<10)  A>5 ¬(Loc=SA ) ¬(Loc=SB ) (13) ¬(A<10) ¬(A>5)  Loc=SA  Loc=SB (14) ¬(A<10) ¬(A>5)  Loc=SA ¬(Loc=SB ) (15) ¬(A<10) ¬(A>5) ¬(Loc=SA )  Loc=SB (16) ¬(A<10) ¬(A>5) ¬(Loc=SA ) ¬(Loc=SB ) A  10

CS 347 Notes 02 31

Final fragments:

F 2: 5 < A < 10  Loc=SA F 3: 5 < A < 10  Loc=SB F 6: A  5  Loc=SA F 7: A  5  Loc=SB F 10: A  10  Loc=SA F 11: A  10  Loc=SB

CS 347 Notes 02 32

Note: elimination of useless fragments

depends on application semantics:

e.g.: if LOC could be  SA ,  SB , we need to add fragments F 4: 5

CS 347 Notes 02 43

E 1

(at Sa) (^) (at Sb )

E 2

NM Loc Sal

5 Joe Sa 10 8 Tom Sa 15 …

NM Loc Sal

7 Sally Sb 25 12 Fred Sb 15 …

Description

5 work on 347 hw 7 go to moon 5 build table 12 rest …

J

CS 347 Notes 02 44

E 1

(at Sa) (^) (at Sb )

E 2

NM Loc Sal

5 Joe Sa 10 8 Tom Sa 15 …

NM Loc Sal

7 Sally Sb 25 12 Fred Sb 15 …

J 1 J^2

J 1 = J E 1 J 2 = J E 2

Des

5 work on 347 hw 5 build table …

Des

7 go to moon 12 rest …

CS 347 Notes 02 45

Derived horizontal fragmentation

R, F = { F 1 , F 2 , ... F n}

S, D = {D 1 , D 2 , …Dn} where Di =S F i

Convention: R is owner S is member

F could be primary or derived

CS 347 Notes 02 46

  • Checking completeness and

disjointness of derived fragmentation

 But no #= 33 in E 1 nor in E 2!

Des

… 33 build chair …

Example: Say J is:

This J tuple will not be in J 1 nor J 2 Fragmentation not complete

CS 347 Notes 02 47

Need to enforce referential integrity constraint: join attr(#) of member relation  joint attr(#) of owner relation

To get completeness

CS 347 Notes 02 48

NM Loc Sal

5 Joe Sa 10 …

NM Loc Sal

5 Fred Sb 20 …

Example:

E 1 E^2

Description

5 day off …

Description

5 day off …

Description

5 day off …

J 1

J

J 2

Fragmentation is not disjoint!

CS 347 Notes 02 49

Join attribute(#) should be key of owner relation

To get disjointness

CS 347 Notes 02 50

Summary: horizontal fragmentation

  • Type: primary, derived
  • Properties: completeness, disjointness

CS 347 Notes 02 51

Vertical fragmentation

E 1

NM Loc Sal

5 Joe Sa 10 7 Sally Sb 25 8 Fred Sa 15 …

NM Loc

5 Joe Sa 7 Sally Sb 8 Fred Sa …

Sal

5 10 7 25 8 15 …

E

E 2

Example:

CS 347 Notes 02 52

R[T]  R 1 [T 1 ] T i  T

R n [Tn]

 Just like normalization of relations

CS 347 Notes 02 53

Properties: R[T]  R i [Ti ]

(1) Completeness

U Ti = T all i

CS 347 Notes 02 54

(2) Disjointness Ti  Tj =  for all i,j ij

E(#,LOC,SAL)

E 1 (#,LOC)

E 2 (SAL)

CS 347 Notes 02 61

Allocation

Example: E(#,NM,LOC,SAL)  F 1 = loc=Sa E ; F 2 = loc=Sb E Qa: select … where loc=Sa... Qb: select … where loc=Sb…

Site a Site b

Where do F 1 ,F 2 go?

? CS 347 Notes 02 62

Issues

  • Where do queries originate
  • What is communication cost? and size of answers, relations,…
  • What is storage capacity, cost at sites? and size of fragments?
  • What is processing power at sites?

CS 347 Notes 02 63

  • What is query processing strategy?
    • How are joins done?
    • Where are answers collected?

More Issues

CS 347 Notes 02 64

  • Cost of updating copies?
  • Writes and concurrency control?
  • ...

Do we replicate fragments?

CS 347 Notes 02 65

Optimization problem:

  • What is best placement of fragments and/or best number of copies to: - minimize query response time - maximize throughput - minimize “some cost” - ...
  • Subject to constraints?
    • Available storage
    • Available bandwidth, power,…
    • Keep 90% of response time below X
    • ... CS 347 Notes 02 66

Optimization problem:

  • What is best placement of fragments and/or best number of copies to: - minimize query response time - maximize throughput - minimize “some cost” - ...
  • Subject to constraints?
    • Available storage
    • Available bandwidth, power,…
    • Keep 90% of response time below X
    • ...

This is an incredibly hard problem

CS 347 Notes 02 67

Example: Single fragment F

Read cost:  [t i  MIN Cij ]

i: Originating site of request t i: Read traffic at Si Cij : Retrieval cost Accessing fragment F at Sj from Si

i=1 j

m

CS 347 Notes 02 68

Scenario - Read cost

.

..

.

.

.

i C=inf

c i,

c i,1 (^) c i,

Stream of read requests for F ti REQ/SEC C=inf

C=inf F

F F

CS 347 Notes 02 69

Write cost

  Xj ui C’ ij

i: Originating site of request j: Site being updated Xj : 0 if F not stored at Sj 1 if F stored at Sj ui: Write traffic at Si C’ ij : Write cost Updating F at Sj from Si

i=1 j=

m m

CS 347 Notes 02 70

Scenario - write cost

Updates ui updates/sec

.. .. .. i

F

F F

CS 347 Notes 02 71

Storage cost:

 Xi d i

Xi: 0 if F not stored at Si 1 if F stored at Si d i: storage cost at Si

i=

m

CS 347 Notes 02 72

Target function:

min  [t i MIN Cij +  Xj  ui  C’ ij ]

  •  Xi  d i

i=1 j j=

i=

m m

m

Another Example Rule

  • Rule 2:
  • IF TABLE_NAME = "Users" AND FIELD STR('home location') = 'France‘ THEN SET 'MIN_COPIES' = 3 AND SET 'EXCL LIST' = 'USWest, USEast‘ CONSTRAINT PRI = 1

CS 347 Notes 02 79 CS 347 Notes 02 80

Summary

  • Description of fragmentation
  • Good fragmentations
  • Design of fragmentation
  • Allocation