CS 347: Distributed Transactions - Failure, Node, Network, Scenarios, and Process Models, Slides of Distributed Database Management Systems

Notes on distributed transactions, covering failure models, node models (fail-stop and byzantine), network models (reliable and partitionable), scenarios, and process models (cohorts and transaction servers).

Typology: Slides

2011/2012

Uploaded on 07/16/2012

sambandam

CS 347 Notes06 7

Failure model:

Events: desired or undesired; undesired events: expected or unexpected

CS 347 Notes06 8

Node models

(1) Fail-stop nodes

Timeline: perfect → halted → recovery → perfect

  • Volatile memory lost
  • Stable storage ok

CS 347 Notes06 9

Node models

(2) Byzantine nodes

Timeline for nodes A, B, C: perfect → arbitrary failure → recovery → perfect

At any given time, at most some fraction f of nodes have failed (typically f < 1/2 or f < 1/3)

CS 347 Notes06 10

Network models

(1) Reliable network

  • in order messages
  • no spontaneous messages
  • timeout T_D

I.e., no lost messages, except for node failures

If no ack arrives within T_D sec., the destination is down (not paused)
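
The timeout rule can be sketched as a small failure detector. This is only an illustration under the reliable-network assumption: the class name, the message ids, and the injectable `clock` parameter are hypothetical, not part of the notes.

```python
import time


class TimeoutFailureDetector:
    """Declares a destination down when no ack arrives within t_d seconds.

    Only sound under the reliable-network model: a missing ack can then
    mean nothing other than a failed destination.
    """

    def __init__(self, t_d, clock=time.monotonic):
        self.t_d = t_d
        self.clock = clock          # injectable so the sketch is testable
        self.pending = {}           # msg_id -> (destination, send time)
        self.down = set()

    def on_send(self, msg_id, destination):
        self.pending[msg_id] = (destination, self.clock())

    def on_ack(self, msg_id):
        self.pending.pop(msg_id, None)

    def check(self):
        """Mark destinations whose ack is overdue and return the down set."""
        now = self.clock()
        for msg_id, (dest, sent) in list(self.pending.items()):
            if now - sent > self.t_d:
                self.down.add(dest)
                del self.pending[msg_id]
        return self.down
```

In a partitionable network the same check would be unsafe, since a slow or cut-off node is indistinguishable from a failed one.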

CS 347 Notes06 11

Variation of reliable net:

  • Persistent messages
    • If destination down, net will eventually deliver message
    • Simplifies node recovery, but leads to inefficiencies (hides too much)
    • Not considered here

CS 347 Notes06 12

Network models

(2) Partitionable network

  • In order messages
  • No spontaneous messages
  • No timeout; nodes can have different views of failures

CS 347 Notes06 13

Scenarios

  • Reliable network
    • Fail-stop nodes
      • No data replication (1)
      • Data replication (2)
  • Partitionable network
    • Fail-stop nodes (3)

CS 347 Notes06 15

No Data Replication

  • Reliable network, fail-stop nodes
  • Basic idea: node P controls X

net P^ Item X

  • Single control point simplifies concurrency control, recovery
  • Not an availability hit: if P down, X unavailable too! CS 347 Notes06 16

“P controls X” means

  • P does concurrency control for X
  • P does recovery for X

CS 347 Notes06 17

Say transaction T wants to access X:

P_T is the process that represents T at this node

[Diagram: P_T sends req to the local DBMS, which contains the lock mgr, the LOG, and X]

CS 347 Notes06 18

Process models

(A) Cohorts

[Diagram: USER and cohort processes T1, T2, T3, each accessing its own local DBMS. Arrow legend: spawn process, communication, data access]

CS 347 Notes06 25

-> Example: after participant fails:

Log:
  T1, X, undo/redo info
  T1, Y, ... info ...
  T1, “W” state

CS 347 Notes06 26

At recovery:

  • T 1 is in “W” state
  • Obtain X,Y write locks (no read locks!)
  • Wait for message from coordinator (or ask coordinator for outcome)

CS 347 Notes06 27

Other examples:

  • No “W” record on log → abort T1
  • See “C” record on log → finish T1
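
The recovery cases above can be sketched as one pass over the participant's log. The record layout `(txn, kind, item)` and the action names are assumptions made for illustration. Treating an “A” record like a missing “W” record is also an assumption, since the notes do not show that case.

```python
def recovery_actions(log):
    """Decide what a recovering participant does for each transaction.

    log: list of (txn, kind, item) records, oldest first (hypothetical
    layout). kind is "update" (undo/redo info for item), "W", "C" or "A".
    Returns txn -> (action, items needing write locks).
    """
    state = {}      # txn -> last state record seen, or None
    written = {}    # txn -> items that have undo/redo records
    for txn, kind, item in log:
        state.setdefault(txn, None)
        written.setdefault(txn, set())
        if kind == "update":
            written[txn].add(item)
        else:
            state[txn] = kind

    actions = {}
    for txn, st in state.items():
        if st == "C":
            actions[txn] = ("finish", set())
        elif st == "W":
            # In doubt: re-acquire write locks only, then wait for (or ask)
            # the coordinator for the outcome.
            actions[txn] = ("wait_for_coordinator", written[txn])
        else:
            # No "W" record (or an "A" record, by assumption): abort.
            actions[txn] = ("abort", set())
    return actions
```

For the log on the earlier slide (updates to X and Y, then a “W” record), this returns `("wait_for_coordinator", {"X", "Y"})` for T1.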

CS 347 Notes06 28

Next:

  • Add timeouts to cope with messages lost during crashes
  • Add finish (“F”) state for coordinator: all done, can forget outcome

CS 347 Notes06 29

Coordinator

States: I, W, C, A, F. Transitions (input / output):

  • I → W: go / exec*
  • W → C: ok* / commit*
  • W → A: nok / abort*
  • C → F: c-ok*
  • A → F: nok*

t=timeout cping=coord. ping

CS 347 Notes06 30

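Coordinator

States: I, W, C, A, F. Transitions (input / output):

  • I → W: go / exec*
  • W → C: ok* / commit*
  • W → A: nok / abort*
  • W → A: t / abort*
  • C → F: c-ok*
  • A → F: nok*
  • Self-loops at C: ping / commit; t / cping
  • Self-loops at A: ping / abort; t / cping
  • ping / - (a ping that gets no reply)

t=timeout cping=coord. ping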
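
A minimal sketch of the coordinator as an event-driven state machine. It assumes that “*” on an input means "received from all participants" and on an output means "sent to all participants", and that a participant's nok vote also counts as its acknowledgement of the abort. The class, its `outbox` list, and the event strings are illustrative names only.

```python
class Coordinator:
    """2PC coordinator with timeouts, pings and a finish state (sketch)."""

    def __init__(self, participants):
        self.participants = set(participants)
        self.state = "I"
        self.votes = set()      # participants that voted ok
        self.acks = set()       # participants that acknowledged the outcome
        self.outbox = []        # (destination, message) pairs "sent" so far

    def _broadcast(self, msg):
        for p in sorted(self.participants):
            self.outbox.append((p, msg))

    def handle(self, event, sender=None):
        s = self.state
        if s == "I" and event == "go":
            self._broadcast("exec")
            self.state = "W"
        elif s == "W" and event == "ok":
            self.votes.add(sender)
            if self.votes == self.participants:      # ok*
                self._broadcast("commit")
                self.state = "C"
        elif s == "W" and event in ("nok", "t"):
            if event == "nok":
                self.acks.add(sender)   # assumption: the voter already aborted
            self._broadcast("abort")
            self.state = "A"
        elif s in ("C", "A"):
            ack = "c-ok" if s == "C" else "nok"
            if event in (ack, "done"):
                self.acks.add(sender)
                if self.acks == self.participants:   # c-ok* or nok*
                    self.state = "F"                 # can forget the outcome
            elif event == "t":
                for p in sorted(self.participants - self.acks):
                    self.outbox.append((p, "cping"))
            elif event == "ping":
                reply = "commit" if s == "C" else "abort"
                self.outbox.append((sender, reply))
        # Any other (state, event) pair is ignored.
```

Every transition in such a machine would have to be logged before its messages are sent; the sketch leaves logging out.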

CS 347 Notes06 31

Participant

States: I, W, C, A. Transitions (input / output):

  • I → W: exec / ok
  • I → A: exec / nok
  • W → C: commit / c-ok
  • W → A: abort / nok

CS 347 Notes06 32

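Participant

States: I, W, C, A. Transitions (input / output):

  • I → W: exec / ok
  • I → A: exec / nok
  • W → C: commit / c-ok
  • W → A: abort / nok
  • Self-loops at W: cping / -; t / ping
  • Self-loop at C: cping / done
  • Self-loop at A: cping / done

“done” message counts as either c-ok or n-ok for coordinator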
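
The participant diagram can be read as a pure transition function. This is a sketch under that reading. The `can_commit` flag is a hypothetical stand-in for whether local execution succeeded, and a reply of `None` stands for the “-” output.

```python
def participant_step(state, event, can_commit=True):
    """Return (new_state, reply) for one participant transition.

    States: "I", "W", "C", "A". Events: "exec", "commit", "abort",
    "cping" (coordinator ping) and "t" (local timeout).
    """
    if state == "I" and event == "exec":
        return ("W", "ok") if can_commit else ("A", "nok")
    if state == "W":
        if event == "commit":
            return ("C", "c-ok")
        if event == "abort":
            return ("A", "nok")
        if event == "cping":
            return ("W", None)       # still in doubt, nothing to report
        if event == "t":
            return ("W", "ping")     # ask the coordinator for the outcome
    if state in ("C", "A") and event == "cping":
        return (state, "done")       # counts as c-ok or n-ok at the coordinator
    return (state, None)             # anything else is ignored
```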

CS 347 Notes06 33

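Participant

States: I, W, C, A. Transitions (input / output):

  • I → W: exec / ok
  • I → A: exec / nok
  • W → C: commit / c-ok
  • W → A: abort / nok
  • Self-loops at W: cping / -; t / ping
  • Self-loop at C: cping / done
  • Self-loop at A: cping / done

Annotation on the diagram: equivalent to finish state

“done” message counts as either c-ok or n-ok for coordinator

CS 347 Notes06 34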

Presumed abort protocol

  • “F” and “A” states combined in coordinator
  • Saves persistent space (forget quicker)
  • Presumed commit is analogous
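
A sketch of the bookkeeping that presumed abort saves, assuming the coordinator keeps outcome state only for committed transactions that still await acknowledgements. The class and method names are hypothetical.

```python
class PresumedAbortOutcomes:
    """Coordinator-side outcome table under presumed abort (sketch)."""

    def __init__(self):
        # txn -> participants that have not yet acknowledged the commit
        self.committed = {}

    def record_commit(self, txn, participants):
        self.committed[txn] = set(participants)

    def record_abort(self, txn):
        # Nothing to remember: "A" and "F" are the same state.
        self.committed.pop(txn, None)

    def on_commit_ack(self, txn, participant):
        pending = self.committed.get(txn)
        if pending is not None:
            pending.discard(participant)
            if not pending:
                del self.committed[txn]   # all acked, forget the commit too

    def on_ping(self, txn):
        # No information about txn means it is presumed aborted.
        return "commit" if txn in self.committed else "abort"
```

A commit entry is dropped only after every participant has acknowledged, so no participant still in “W” can ask about a forgotten commit and be told "abort".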

CS 347 Notes06 35

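Presumed abort: coordinator (participant unchanged)

States: I, W, C, A/F. Transitions (input / output):

  • I → W: go / exec*
  • W → C: ok* / commit*
  • W → A/F: nok, t / abort*
  • C → A/F: c-ok* / -
  • Self-loops at C: ping / commit; t / cping
  • Self-loop at A/F: ping / abort
  • ping / - (a ping that gets no reply)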

CS 347 Notes06 36

Remember:

all state transitions must be logged

Example: tracking who has sent “OK” msgs

Log at coord:
  T1, start, part={a,b}
  T1, OK from a, RCV

  • After failure, we know we are still waiting for OK from node b
  • Alternative: do not log receipts of “OK”s; abort T1

CS 347 Notes06 43

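3PC

Coordinator: states I, W, P, C, A. Transitions (input / output):

  • I → W: go / exec*
  • W → P: ok* / pre*
  • W → A: nok / abort*
  • P → C: ack* / commit*

Participant: states I, W, P, C, A. Transitions (input / output):

  • I → W: exec / ok
  • I → A: exec / nok
  • W → P: pre / ack
  • W → A: abort
  • P → C: commit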

CS 347 Notes06 44

3PC recovery rules: termination protocol

  • Survivors try to complete transaction, based on their current states
  • Goal:
    • If dead nodes committed or aborted, then survivors should not contradict!
    • Else, survivors can do as they please...


CS 347 Notes06 45

  • Let {S1, S2, …, Sn} be the survivor sites
  • If one or more Si = COMMIT → COMMIT T
  • If one or more Si = ABORT → ABORT T
  • If one or more Si = PREPARE → T could not have aborted → COMMIT T
  • If no Si = PREPARE (or COMMIT) → T could not have committed → ABORT T
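
The four rules can be written as a single decision function. This is a sketch that assumes every listed state belongs to a survivor and that states are abbreviated "I", "W", "P", "C", "A"; the function name is illustrative.

```python
def terminate(survivor_states):
    """Apply the 3PC termination rules to the survivors' current states."""
    states = set(survivor_states)
    if not states:
        raise ValueError("need at least one survivor")
    if "C" in states:
        return "commit"
    if "A" in states:
        return "abort"
    if "P" in states:
        return "commit"   # someone prepared, so T could not have aborted
    return "abort"        # nobody prepared, so T could not have committed
```

For instance, survivors in states P, W, W decide to commit, while survivors in states I, W, W decide to abort.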

CS 347 Notes06 46

Example: node states P, W, W

CS 347 Notes06 47

Example: node states I, W, W

CS 347 Notes06 48

Example: node states P, P, C

CS 347 Notes06 49

Example: node states P, W, A

CS 347 Notes06 50

Once survivors make a decision, they must select a new coordinator to continue 3PC

[Diagram: survivor states over time]

  • Time 1: P, W, W (decide to commit)
  • Time 2: P, P, P
  • Time 3: C, C, P
  • Time 4: C, C, C

CS 347 Notes06 51

Note: when survivors continue 3PC, failed nodes do not count

E.g., “OK*” → when OK’s received from non-failed nodes

[Diagram: transition W → P on ok* / pre*]

CS 347 Notes06 52

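Note: 3PC unsafe with partitions!

[Diagram: a partition separates nodes in states W, W, W from nodes in states P, P; the first group decides abort, the second decides commit]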

CS 347 Notes06 53

Node recovery:

  • After node N recovers from failure:
    • do not participate in termination protocol (why?)

[Diagram: node states W, W, W, P, with one transition → A]

CS 347 Notes06 54

Node recovery:

  • After node N recovers from failure:
    • do not participate in termination protocol (why?)

[Diagram: node states W, W, W, P, with one transition → A; later on...]

CS 347 Notes06 61

Example (1): nodes Coord, P1, P2, P3, P4; P2 → “W”, P3 → “W”, P4 → “W”

  • Nodes P2, P3, P4 enter “W” state and fail
  • When they recover, coord. and P1 are down
  • Each node has 1 vote, V=5, Maj=3

CS 347 Notes06 62

Example (1): nodes Coord, P1, P2, P3, P4; P2 → “W”, P3 → “W”, P4 → “W”

  • Nodes P2, P3, P4 enter “W” state and fail
  • When they recover, coord. and P1 are down
  • Each node has 1 vote, V=5, Maj=3
  • Since P2, P3, P4 have a majority, they know coord. could not have gone to “P” without at least one of their votes
  • Therefore, T can be aborted!

CS 347 Notes06 63

Example (2): nodes Coord, P1, P2, P3, P4; P3 → “P”, P4 → “W”

  • Each node has 1 vote; V=5, Maj=3
  • Nodes fail after entering states shown; P3, P4 recover

CS 347 Notes06 64

Example (2): nodes Coord, P1, P2, P3, P4; P3 → “P”, P4 → “W”

  • Each node has 1 vote; V=5, Maj=3
  • Nodes fail after entering states shown; P3, P4 recover
  • Termination rule says we can try to commit, but P3, P4 do not have enough votes, so they do nothing!
  • P3, P4 doing nothing is good because later on, coord., P1, P2 could abort T

CS 347 Notes06 65

Summary: Majority rule ensures that any decision (e.g., preparing, committing) will be known to any future group making a decision

[Diagram: overlapping groups for decision #1 and decision #2]

CS 347 Notes06 66

Important Detail for Majority 3PC

  • Example:

[Diagram: node states W, W, W, P, with one transition → A]

CS 347 Notes06 67

Important Detail for Majority 3PC

  • Example:

[Diagram: node states W, W, W, P, with transitions → A and → P → C]

CS 347 Notes06 68

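Need “Prepare To Abort” State

Coordinator: states I, W, PC, PA, C, A. Transitions (input / output):

  • I → W: go / exec*
  • W → PC: ok* / preC*
  • W → PA: nok / preA*
  • PC → C: ackC* / commit*
  • PA → A: ackA* / abort*

Participant: states I, W, PC, PA, C, A. Transitions (input / output):

  • I → W: exec / ok
  • I → A: exec / nok
  • W → PC: preC / ackC
  • W → PA: preA / ackA
  • PC → C: commit
  • PA → A: abort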

CS 347 Notes06 69

Example Revisited

[Diagram: node states W, W, W, PC, with one transition → PA]

CS 347 Notes06 70

Example Revisited

[Diagram: node states W, W, W, PC, with transitions → PA and → PC → C]

OK to commit since transaction could not have aborted

CS 347 Notes06 71

Example Revisited - II

[Diagram: node states W, W, W, PC, with two transitions → PA]

CS 347 Notes06 72

Example Revisited - II

[Diagram: node states W, W, W, PC, with three transitions → PA]

No decision: transaction could have aborted or could have committed... Block!

CS 347 Notes06 79

  • Need to “combine” WFGs (wait-for graphs) to discover global deadlock, e.g., at a central detection node

[Diagram: local wait-for graphs between transactions at three nodes]
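
A sketch of what a central detection node might do: take the union of the local wait-for graphs and search the result for a cycle. The input format (one dict per node, mapping a transaction to the set of transactions it waits for) is an assumption made for illustration.

```python
def find_global_deadlock(local_wfgs):
    """Combine local wait-for graphs and return one cycle, or None.

    local_wfgs: iterable of dicts, txn -> set of txns it waits for.
    """
    graph = {}
    for wfg in local_wfgs:
        for txn, waits_for in wfg.items():
            graph.setdefault(txn, set()).update(waits_for)
            for other in waits_for:
                graph.setdefault(other, set())

    WHITE, GRAY, BLACK = 0, 1, 2     # unvisited, on current path, finished
    color = {txn: WHITE for txn in graph}

    def dfs(txn, path):
        color[txn] = GRAY
        path.append(txn)
        for other in sorted(graph[txn]):
            if color[other] == GRAY:             # back edge closes a cycle
                return path[path.index(other):]
            if color[other] == WHITE:
                cycle = dfs(other, path)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(graph):
        if color[txn] == WHITE:
            cycle = dfs(txn, [])
            if cycle:
                return cycle
    return None
```

Two nodes that each report a single acyclic edge (T1 waits for T2 at one node, T2 waits for T1 at the other) yield a cycle only after the graphs are combined.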

CS 347 Notes06 80

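Problem: False deadlocks

[Diagram: wait-for edges between transactions at Time 1, Time 2, and Time 3; info is sent to the central site at different times, so the graph assembled at the central site can show a cycle that never existed at any single moment]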

CS 347 Notes06 81

  • Many deadlock solutions
    • Distributed vs. centralized
    • Detection vs. prevention
      • timeouts
      • wait-die
      • wound-wait
  • Covered in CS
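
The two prevention schemes named in the list can be sketched as decision rules on transaction timestamps. This assumes the usual convention that a smaller timestamp means an older transaction; the function names and return strings are illustrative.

```python
def wait_die(requester_ts, holder_ts):
    """Wait-die: an older requester waits; a younger requester dies."""
    return "wait" if requester_ts < holder_ts else "abort requester"


def wound_wait(requester_ts, holder_ts):
    """Wound-wait: an older requester wounds the holder; a younger one waits."""
    return "abort holder" if requester_ts < holder_ts else "wait"
```

In both schemes waiting is allowed in one age direction only (old waits for young in wait-die, young waits for old in wound-wait), so no cycle of waits can form.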