Chandy-Lamport Distributed System: Global Snapshots and Consistent Cuts, Study notes of Distributed Programming and Computing

These notes cover the concept of global snapshots in distributed systems, as introduced by Chandy and Lamport. A global snapshot captures the state of each process and each communication channel in a distributed system and can be used to detect stable properties. The notes explain consistent cuts, which ensure that messages in transit are accounted for, and present an algorithm for taking distributed global snapshots. They also discuss why message transmission must be FIFO and the challenges of taking global snapshots in non-FIFO systems.

Typology: Study notes

2022/2023

Uploaded on 01/05/2024

oluremi-afolabi
Distributed Computing Concepts - Global State in Distributed Systems
Prof. Nalini Venkatasubramanian
230 Distributed Systems - Week 3
- includes slides/examples from Indy Gupta (UIUC), Coulouris (book) and Kshemkalyani & Singhal (book slides)

Why Global State?

  • Distributed applications/services execute concurrently on multiple machines.
  • A snapshot of the distributed application, i.e. a global picture, is useful for:

    ○ Checkpointing: can restart the distributed application on failure
    ○ Garbage collection of objects: objects at servers that don’t have any other objects (at any servers) with pointers to them
    ○ Deadlock detection: useful in database transaction systems
    ○ Termination of computation: useful in batch computing systems like Folding@Home, SETI@Home

Simulate A Global State

● The notions of global time and global state are closely related.
● But merely synchronizing clocks and taking local snapshots is not enough.
● Need to account for messages in transit.
● A process can (without freezing the whole computation) compute the best possible approximation of a global state [Chandy & Lamport 85]:
  ○ A global state that could have occurred
  ○ No process in the system can decide whether the state did really occur
  ○ Guarantees stable properties (i.e. once they become true, they remain true)
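To see why local snapshots alone are not enough, consider a minimal sketch with two hypothetical processes that each hold an account balance. The process names, the starting balances, and the transfer amount are assumptions made for illustration only; they do not come from the slides.

```python
from collections import deque

# Hypothetical two-account example: p1 and p2 each start with 50 units.
balances = {"p1": 50, "p2": 50}
channel_p1_to_p2 = deque()  # FIFO channel carrying transfer amounts

# p1 sends 20 units to p2; the message is still in the channel.
balances["p1"] -= 20
channel_p1_to_p2.append(20)

# Naive snapshot: record only the local states at the same instant.
naive_total = sum(balances.values())

# Snapshot that also records the channel contents.
full_total = sum(balances.values()) + sum(channel_p1_to_p2)

print(naive_total)  # 80: the 20 units in transit are missing
print(full_total)   # 100: matches the initial total
```

Even with perfectly synchronized clocks, the naive snapshot loses the 20 units that were in the channel when the local states were recorded.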

[Figures: Event Diagram, Poset Diagram, and Rubber Band Transformation of processes P over Time, with events e11, e12, e21, e22, ...]
Cuts (Summary)

[Figure: processes P over Time, with the instant of local observation marked on each process line and example values starting from an initial value]

  • ideal (vertical) cut: not attainable
  • consistent cut: equivalent to a vertical cut (rubber band transformation)
  • inconsistent cut: can’t be made vertical (message from the future)

Consistent Cuts

● Some Theorems

● For a consistent cut consisting of cut events c_1, …, c_n, no pair of cut events is causally related, i.e. ∀ c_i, c_j : ¬(c_i < c_j) ∧ ¬(c_j < c_i), where < denotes the causal order

● For any time diagram with a consistent cut consisting of cut events c_1, …, c_n, there is an equivalent time diagram where c_1, …, c_n occur simultaneously, i.e. where the cut line forms a straight vertical line

● All cut events of a consistent cut can occur simultaneously
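The "message from the future" condition can be checked mechanically. The sketch below assumes a representation that is not in the slides: a cut is a mapping from each process to the number of its events that lie before the cut, and each message is a tuple of sender, send-event index, receiver, and receive-event index (1-based). The function name `is_consistent` is likewise hypothetical.

```python
def is_consistent(cut, messages):
    """Return True if no message is received inside the cut but sent outside it.

    cut: dict mapping process name -> number of its events included in the cut.
    messages: list of (sender, send_idx, receiver, recv_idx), 1-based indices.
    """
    for sender, send_idx, receiver, recv_idx in messages:
        received_in_cut = recv_idx <= cut[receiver]
        sent_in_cut = send_idx <= cut[sender]
        if received_in_cut and not sent_in_cut:
            return False  # message from the future
    return True

# One message: sent as p1's 2nd event, received as p2's 3rd event.
messages = [("p1", 2, "p2", 3)]

print(is_consistent({"p1": 2, "p2": 3}, messages))  # True: send and receive both included
print(is_consistent({"p1": 2, "p2": 2}, messages))  # True: message is in transit
print(is_consistent({"p1": 1, "p2": 3}, messages))  # False: received but not yet sent
```

The second case is still consistent: a message that has been sent but not received simply belongs to the channel state.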

Distributed Global Snapshot: Requirements

  • Snapshot should not interfere with normal application actions, and it should not require the application to stop sending messages
  • Each process is able to record its own state
    ○ Process state: application-defined state or, in the worst case, its heap, registers, program counter, code, etc. (essentially the coredump)
  • Global state is collected in a distributed manner
  • Any process may initiate the snapshot
    ○ Assume just one snapshot run for now

System Model for Global Snapshots

● The system consists of a collection of n processes p_1, p_2, ..., p_n that are connected by channels.
● There is no globally shared memory and no physical global clock; processes communicate by passing messages through communication channels.
● C_ij denotes the channel from process p_i to process p_j, and its state is denoted by SC_ij.
● The actions performed by a process are modeled as three types of events: internal events, message send events, and message receive events.
● For a message m_ij that is sent by process p_i to process p_j, let send(m_ij) and rec(m_ij) denote its send and receive events.
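As a rough illustration of this model, the sketch below represents each process by its send and receive events and derives a channel state from them. It assumes the usual reading of SC_ij as the messages sent by p_i to p_j that p_j has not yet received; that definition, the `Process` class, and the `channel_state` function are assumptions of this sketch, not part of the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Process:
    name: str
    sent: list = field(default_factory=list)      # (message id, destination) per send event
    received: list = field(default_factory=list)  # message id per receive event

def channel_state(p_i, p_j):
    """Messages sent by p_i to p_j that have no matching receive event at p_j."""
    return [m for (m, dest) in p_i.sent
            if dest == p_j.name and m not in p_j.received]

# p1 has sent m1 and m2 to p2; p2 has received only m1 so far.
p1 = Process("p1", sent=[("m1", "p2"), ("m2", "p2")])
p2 = Process("p2", received=["m1"])

print(channel_state(p1, p2))  # ['m2']
```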

Chandy-Lamport Distributed Snapshot Algorithm

● Assumes FIFO communication in channels
● Uses a control message, called a marker, to separate messages in the channels.
● After a site has recorded its snapshot, it sends a marker along all of its outgoing channels before sending out any more messages.
● The marker separates the messages in the channel into those to be included in the snapshot and those not to be recorded in the snapshot.
● A process must record its snapshot no later than when it receives a marker on any of its incoming channels.
● The algorithm terminates after each process has received a marker on all of its incoming channels.
● All the local snapshots get disseminated to all other processes, and all the processes can determine the global state.

Chandy-Lamport Distributed Snapshot Algorithm (continued)

Marker receiving rule for Process Pi
On receiving a marker over incoming channel c:
    if Pi has not yet recorded its state:
        Pi records its process state now
        Pi records the state of c as the empty set
        Pi turns on recording of messages arriving over its other incoming channels
    else:
        Pi records the state of c as the set of messages received over c since it saved its state

Marker sending rule for Process Pi
After Pi has recorded its state, for each outgoing channel c:
    Pi sends one marker message over c (before it sends any other message over c)
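The two marker rules can be exercised in a small single-threaded simulation. This is only a sketch under stated assumptions: the `Process` and `Network` classes, the `MARKER` constant, the integer "balance" used as process state, and the three-process scenario are all hypothetical and chosen for illustration. Channels are modeled as FIFO queues, as the algorithm requires.

```python
from collections import deque

MARKER = "MARKER"  # control message; the name is an assumption of this sketch

class Process:
    def __init__(self, name, balance):
        self.name = name
        self.state = balance          # application state (an integer balance here)
        self.recorded_state = None    # local snapshot; None until recorded
        self.channel_state = {}       # sender -> messages recorded on that incoming channel
        self.recording = set()        # senders whose incoming channel is still being recorded

class Network:
    def __init__(self, names, balance):
        self.procs = {n: Process(n, balance) for n in names}
        # One FIFO queue per directed channel (src, dst).
        self.chan = {(a, b): deque() for a in names for b in names if a != b}

    def send(self, src, dst, amount):
        """Application message: move `amount` from src to dst."""
        self.procs[src].state -= amount
        self.chan[(src, dst)].append(amount)

    def record(self, name):
        """Record local state, start recording all incoming channels,
        and send one marker on every outgoing channel (marker sending rule)."""
        p = self.procs[name]
        p.recorded_state = p.state
        p.recording = {a for (a, b) in self.chan if b == name}
        p.channel_state = {a: [] for a in p.recording}
        for (a, b), queue in self.chan.items():
            if a == name:
                queue.append(MARKER)

    def deliver(self, src, dst):
        """Deliver the oldest message on channel (src, dst) (marker receiving rule)."""
        msg = self.chan[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded_state is None:
                self.record(dst)      # first marker: record state now
            # Stop recording this channel: it is empty on the first marker,
            # otherwise it holds the messages received since the state was saved.
            p.recording.discard(src)
        else:
            p.state += msg
            if src in p.recording:
                p.channel_state[src].append(msg)

net = Network(["p1", "p2", "p3"], 100)
net.send("p1", "p2", 10)       # sent before the snapshot starts
net.record("p1")               # p1 initiates: records its state, sends markers
net.send("p2", "p1", 20)       # p2 has not seen a marker yet
while any(net.chan.values()):  # drain every channel in FIFO order
    for (src, dst), queue in net.chan.items():
        if queue:
            net.deliver(src, dst)

recorded = sum(p.recorded_state for p in net.procs.values())
in_transit = sum(sum(msgs) for p in net.procs.values()
                 for msgs in p.channel_state.values())
print(recorded, in_transit, recorded + in_transit)  # 280 20 300
```

In this run the 20 units sent from p2 to p1 are captured as the state of the channel from p2 to p1, so the recorded process states plus the recorded channel states add up to the initial total of 300. The snapshot is complete once every `recording` set is empty, which corresponds to each process having received a marker on all of its incoming channels.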

Snapshot Example

[Figure: time diagram of the processes over Time, with event labels A B C D E, E F G, and H I J; legend: Message, Instruction or Step]

From: Indranil Gupta (CS425 - Distributed Systems course, UIUC)

P1 is Initiator:
  • Record local state S1
  • Send out markers
  • Turn on recording on channels C_21, C_31
