































These notes cover global snapshots in distributed systems, as introduced by Chandy and Lamport. A global snapshot captures the state of each process and each communication channel in a distributed system, and is used to detect stable properties. The notes explain consistent cuts, which account for messages in transit, and present an algorithm for taking distributed global snapshots. They also discuss why FIFO message transmission matters and the challenges of taking global snapshots in non-FIFO systems.
































- Includes slides/examples from Indy Gupta (UIUC), Coulouris (book) and Kshemkalyani & Singhal (book slides)
Why Global State?
○ Checkpointing: can restart distributed application on failure
○ Garbage collection of objects: objects at servers that don’t have any other objects (at any servers) with pointers to them
○ Deadlock detection: useful in database transaction systems
○ Termination of computation: useful in batch computing systems like Folding@Home, SETI@Home
● The notions of global time and global state are closely related
● But merely synchronizing clocks and taking local snapshots is not enough
● Need to account for messages in transit
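To see why local snapshots alone are not enough, here is a minimal sketch with a hypothetical two-process bank. The process names, amounts, and the helper functions `send` and `deliver` are illustrative assumptions, not part of the original slides. It shows that summing local states while a message is in transit under-counts the total.

```python
# Hypothetical two-process bank; names and amounts are illustrative only.
balances = {"P1": 100, "P2": 50}
in_transit = []  # messages that were sent but not yet delivered


def send(src, dst, amount):
    """Debit the sender immediately; the credit travels as a message."""
    balances[src] -= amount
    in_transit.append((dst, amount))


def deliver():
    """Deliver the oldest in-transit message to its destination."""
    dst, amount = in_transit.pop(0)
    balances[dst] += amount


send("P1", "P2", 30)

# Local states only: the 30 units inside the channel are invisible.
naive_total = sum(balances.values())

# Local states plus channel contents: the total is conserved.
full_total = naive_total + sum(amount for _, amount in in_transit)

deliver()
after_delivery_total = sum(balances.values())
```

In this sketch the naive total is 120, while the total including the channel is 150, the same as after delivery. A global snapshot therefore has to record channel states as well as process states.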
● A process can (without freezing the whole computation) compute the best possible approximation of a global state [Chandy & Lamport 85]
● A global state that could have occurred
● No process in the system can decide whether the state did really occur
● Guarantee stable properties (i.e. once they become true, they remain true)
[Figure: time diagram marking each process's instant of local observation against time, with three cuts. Ideal (vertical) cut: not attainable. Consistent cut: equivalent to a vertical cut (rubber band transformation). Inconsistent cut: can’t be made vertical (message from the future).]
● For a consistent cut consisting of cut events $c_1, \ldots, c_n$, no pair of cut events is causally related, i.e. $\forall c_i, c_j:\ \neg(c_i < c_j) \land \neg(c_j < c_i)$
● For any time diagram with a consistent cut consisting of cut events $c_1, \ldots, c_n$, there is an equivalent time diagram where $c_1, \ldots, c_n$ occur simultaneously, i.e. where the cut line forms a straight vertical line
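The consistency condition can be checked mechanically. The sketch below uses the equivalent message-based reading: a cut is inconsistent exactly when it contains the receive of some message but not its send (a "message from the future"). The representation is an assumption made for illustration: a cut is a map from process name to the number of local events it includes, and each message is a tuple of sender, send index, receiver, and receive index.

```python
from typing import Dict, List, Tuple

# (sender, send_index, receiver, recv_index); indices are 1-based positions
# of the send/receive events in each process's local history (assumed model).
Message = Tuple[str, int, str, int]


def is_consistent(cut: Dict[str, int], messages: List[Message]) -> bool:
    """Return True if no message is received inside the cut but sent outside it."""
    for sender, send_idx, receiver, recv_idx in messages:
        received_in_cut = recv_idx <= cut[receiver]
        sent_in_cut = send_idx <= cut[sender]
        if received_in_cut and not sent_in_cut:
            return False  # message from the future
    return True


# One message: sent as P1's 2nd event, received as P2's 1st event.
messages = [("P1", 2, "P2", 1)]

both_in_cut = is_consistent({"P1": 2, "P2": 1}, messages)       # send and receive included
in_transit = is_consistent({"P1": 2, "P2": 0}, messages)        # sent, not yet received
from_the_future = is_consistent({"P1": 1, "P2": 1}, messages)   # received, never sent
```

A cut that includes the send but not the receive is still consistent; the message simply counts as in transit and belongs to the channel state.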
Distributed Global Snapshot: Requirements
● Should not require application to stop sending messages
○ Process state: application-defined state or, in the worst case:
○ its heap, registers, program counter, code, etc. (essentially the coredump)
○ Assume just one snapshot run for now
System Model for Global Snapshots
Chandy-Lamport Distributed Snapshot Algorithm
● After a site has recorded its snapshot, it sends a marker along all of its outgoing channels before sending out any more messages.
● The marker separates the messages in the channel into those to be included in the snapshot and those not to be recorded in the snapshot.
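The marker rule can be illustrated with a small single-threaded simulation. This is a sketch under stated assumptions, not the slides' own code: the `Process` class, the `connect` helper, the two-process bank scenario, and the use of `deque` as a FIFO channel are all hypothetical. The marker-receiving behaviour is not spelled out in this excerpt; the sketch follows the standard Chandy-Lamport description (record state on the first marker, otherwise close recording on that channel).

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.out = {}           # peer name -> FIFO channel to that peer
        self.inc = {}           # peer name -> FIFO channel from that peer
        self.recorded = None    # recorded local state (None until recorded)
        self.chan_state = {}    # peer name -> messages recorded on that channel
        self.recording = set()  # incoming channels still being recorded

    def send(self, peer, amount):
        self.balance -= amount
        self.out[peer].append(amount)

    def record(self):
        # Record local state, then send a marker on every outgoing channel
        # before any further application message is sent.
        self.recorded = self.balance
        self.recording = set(self.inc)
        self.chan_state = {peer: [] for peer in self.inc}
        for channel in self.out.values():
            channel.append(MARKER)

    def receive(self, peer):
        msg = self.inc[peer].popleft()
        if msg == MARKER:
            if self.recorded is None:
                self.record()             # first marker: take local snapshot
            self.recording.discard(peer)  # this channel's state is now final
        else:
            self.balance += msg
            if peer in self.recording:
                self.chan_state[peer].append(msg)  # message was in transit


def connect(a, b):
    ab, ba = deque(), deque()
    a.out[b.name], b.inc[a.name] = ab, ab
    b.out[a.name], a.inc[b.name] = ba, ba


p1, p2 = Process("P1", 100), Process("P2", 50)
connect(p1, p2)

p2.send("P1", 20)  # in transit when the snapshot starts
p1.record()        # P1 initiates the snapshot
p1.send("P2", 10)  # sent after the marker: not part of the snapshot
p1.receive("P2")   # the 20 arrives and is recorded as channel state
p2.receive("P1")   # marker arrives first (FIFO): P2 records its state
p2.receive("P1")   # the 10 arrives after the marker: not recorded
p1.receive("P2")   # P2's marker closes the channel P2 -> P1

snapshot_total = (p1.recorded + p2.recorded
                  + sum(p1.chan_state["P2"]) + sum(p2.chan_state["P1"]))
```

In this run the snapshot records 100 for P1, 30 for P2, and the message of 20 on the channel from P2 to P1, so the recorded total matches the system total of 150. The sketch relies on FIFO channels: if the 10 could overtake the marker, P2 would record a state that already includes a message sent after P1's snapshot.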
From: Indranil Gupta (CS425 - Distributed Systems course, UIUC)