Distributed Snapshots: Chandy-Lamport Protocol

72 points by federicoponzi 2 years ago · 4 comments

jeffreygoesto 2 years ago

I found this [0] a very accessible explanation as well.

[0] https://blog.acolyer.org/2015/04/22/distributed-snapshots-de...

scrubs 2 years ago

Unusually well-written article for distributed work involving TLA+. Thanks. I liked it and learned something. Bookmarked.

wg0 2 years ago

Noob question: do Raft and Paxos solve a different problem?

yencabulator 2 years ago

Those are about distributed consensus: making sure participants come to the same conclusion about something and nobody has the wrong answer.
Distributed snapshots try to do as little work as possible to get a consistent view of the distributed computation, without forcing the heavy cost of consensus on it. For example, if node A is sending a message to node B, we don't care which of these we capture:
- 1: A before it sends the message, B before it receives the message
- 2: A after it has sent the message, the message itself in flight on the channel, and B before it receives the message
- 3: A after it has sent the message, B after it has received the message
No matter which of those states we restore, the computation will continue correctly.
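
To make the three cases concrete, here is a minimal Python sketch. It is not taken from the article and is not an implementation of Chandy-Lamport itself; the state names and helper functions are made up for illustration. It enumerates every combination of local states for one message from A to B and keeps only the consistent ones, meaning no snapshot shows a message as received without also showing it as sent.

```python
from itertools import product

# Hypothetical model: a single message "m" travels from node A to node B.
A_STATES = ["before_send", "after_send"]
B_STATES = ["before_recv", "after_recv"]

def channel_contents(a_state, b_state):
    """Messages in flight on the A->B channel implied by the two local states."""
    sent = a_state == "after_send"
    received = b_state == "after_recv"
    return ["m"] if sent and not received else []

def is_consistent(a_state, b_state):
    """A cut is consistent if every received message was also sent."""
    sent = a_state == "after_send"
    received = b_state == "after_recv"
    return sent or not received

def consistent_snapshots():
    """Return (A state, channel contents, B state) for every consistent cut."""
    return [
        (a, channel_contents(a, b), b)
        for a, b in product(A_STATES, B_STATES)
        if is_consistent(a, b)
    ]

for snapshot in consistent_snapshots():
    print(snapshot)
```

Of the four possible combinations, only "A before send, B after receive" is rejected, since restoring it would leave B holding a message that A never sent. The remaining three correspond to cases 1 to 3 above, with the in-flight message recorded as channel state in case 2.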