Checkpoint Consistency (Yuk…)
Also User’s Responsibility.
General Case:
- Very Difficult, Must Determine Global State.
- Requires Full Synchronization (Chandy/Lamport).
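
To show why the general case is hard, here is a minimal single-process simulation in the style of the Chandy/Lamport snapshot algorithm. It is a sketch under stated assumptions, not a production implementation: channels are reliable and FIFO, and the names `Process`, `run_demo`, and the money-transfer workload are hypothetical, chosen only for illustration. The recorded process states plus the recorded in-flight messages should add up to the original total.

```python
from collections import deque

MARKER = "MARKER"  # control message separating pre- and post-snapshot traffic


class Process:
    """Hypothetical task holding a balance; channels are assumed FIFO."""

    def __init__(self, pid, balance, peers):
        self.pid = pid
        self.balance = balance
        self.peers = peers
        self.recorded_state = None   # local state captured by the snapshot
        self.channel_state = {}      # src pid -> messages caught in flight
        self.recording = set()       # incoming channels still being recorded

    def start_snapshot(self, net):
        # Record local state, then send a marker on every outgoing channel.
        self.recorded_state = self.balance
        self.recording = set(self.peers)
        self.channel_state = {p: [] for p in self.peers}
        for p in self.peers:
            net[(self.pid, p)].append(MARKER)

    def receive(self, src, msg, net):
        if msg == MARKER:
            if self.recorded_state is None:
                self.start_snapshot(net)   # first marker seen: record now
            self.recording.discard(src)    # channel from src is complete
        else:
            self.balance += msg
            if src in self.recording:      # sent before src's marker
                self.channel_state[src].append(msg)


def run_demo():
    pids = [0, 1, 2]
    procs = {p: Process(p, 100, [q for q in pids if q != p]) for p in pids}
    net = {(s, d): deque() for s in pids for d in pids if s != d}

    def send(src, dst, amount):
        procs[src].balance -= amount
        net[(src, dst)].append(amount)

    send(0, 1, 10)
    send(1, 2, 20)
    procs[0].start_snapshot(net)   # snapshot starts with messages in flight
    send(2, 0, 5)

    # Deliver everything, one message per channel per pass (FIFO order).
    while any(net.values()):
        for (src, dst), chan in net.items():
            if chan:
                procs[dst].receive(src, chan.popleft(), net)

    recorded = sum(p.recorded_state for p in procs.values())
    in_flight = sum(sum(msgs) for p in procs.values()
                    for msgs in p.channel_state.values())
    return recorded + in_flight


print(run_demo())  # snapshot total equals the initial 3 * 100
```

Even in this toy form, every task needs marker handling on every channel, which is the bookkeeping the pragmatic case avoids.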
Pragmatic Case - Iterative Computations…
- Already Loosely Synchronized by Message Cycle.
- Failure at Any Point → Roll Back to Last Iteration.
- Checkpoint at “Same” Place for Each Task.
- All Sent Msgs Recvd / Before Next Iter Msgs Sent…
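
The pragmatic case above can be sketched as a small simulation. This is an illustrative sketch only: the ring exchange, the function names `exchange_and_update` and `run`, and the `checkpoint_every` and `fail_at` parameters are assumptions made for the example, not part of any particular message-passing library. Checkpoints are taken only at iteration boundaries, where every sent message has been received and no next-iteration message has been sent, so no in-flight messages need saving.

```python
def exchange_and_update(values):
    """One iteration: each task sends its value to its right neighbour,
    receives from its left neighbour, and adds what it received."""
    n = len(values)
    inbox = [values[(i - 1) % n] for i in range(n)]  # all sends delivered
    return [(values[i] + inbox[i]) % 1000 for i in range(n)]


def run(initial, iterations, checkpoint_every=2, fail_at=None):
    """Run the iteration loop, optionally injecting one failure.

    Returns (final values, number of iterations actually executed)."""
    values = list(initial)
    saved_iter, saved_values = 0, list(values)   # checkpoint of initial state
    it, steps_executed, failed = 0, 0, False

    while it < iterations:
        if it == fail_at and not failed:
            # Simulated failure: discard current state and roll every task
            # back to the same checkpointed iteration.
            failed = True
            it, values = saved_iter, list(saved_values)
            continue

        values = exchange_and_update(values)
        it += 1
        steps_executed += 1

        # Iteration boundary: no messages in flight, so saving the
        # per-task values alone is a consistent checkpoint.
        if it % checkpoint_every == 0:
            saved_iter, saved_values = it, list(values)

    return values, steps_executed


clean_values, clean_steps = run([1, 2, 3, 4], 6)
recovered_values, recovered_steps = run([1, 2, 3, 4], 6, fail_at=3)
print(clean_values == recovered_values)   # same answer with or without failure
print(clean_steps, recovered_steps)       # recovery redoes the lost iteration
```

Checkpointing every iteration shortens the rollback but costs more I/O; a larger `checkpoint_every` trades recomputation for fewer saves.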