Efficient and Flexible Fault Tolerance and Migration of Scientific Simulations Using CUMULVS
Motivation
(Collaborative User Migration, User Library for Visualization and Steering)
CUMULVS Approach
Why Instrument (Non-Transparent)?
Why Does the User Need to Help?
Identifying Program State
Checkpoint Consistency (Yuk…)
And For Your Trouble...
Rollback versus Restart…
Run-Time System Architecture
Checkpointing API
Example Instrumentation: CUMULVS Initialization
Example Instrumentation: Data Field Description
Example Instrumentation: Restart from a Checkpoint
Example Instrumentation: Periodic Handling - Restart
Example Instrumentation: Periodic Handling - Rollback
Example Instrumentation: Finished Checkpointing
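The instrumentation slides walk through initialization, field description, restart from a checkpoint, periodic handling for restart and for rollback, and finishing. The actual CUMULVS checkpointing calls are not reproduced in this outline, so what follows is only a minimal, library-free sketch of the same pattern in C. It assumes a single process and a plain binary checkpoint file. The names `ckpt_field`, `ckpt_save`, `ckpt_restore`, `init_state`, `run`, `N`, and `CKPT_EVERY` are hypothetical and are not part of the CUMULVS API.

```c
#include <stdio.h>
#include <stddef.h>

#define N 8            /* hypothetical size of the toy state array */
#define CKPT_EVERY 10  /* hypothetical checkpoint interval (iterations) */

/* Hypothetical field descriptor: one entry per piece of program state
   the user declares (the analogue of a data field description). */
typedef struct {
    const char *name;  /* label, for documentation only */
    void *data;        /* address of the state in memory */
    size_t bytes;      /* size of the state in bytes */
} ckpt_field;

/* Write the iteration counter followed by every declared field.
   Returns 0 on success, -1 on any I/O error. */
static int ckpt_save(const char *path, const ckpt_field *f, int n, int iter)
{
    FILE *fp = fopen(path, "wb");
    if (!fp) return -1;
    int ok = fwrite(&iter, sizeof iter, 1, fp) == 1;
    for (int i = 0; ok && i < n; i++)
        ok = fwrite(f[i].data, 1, f[i].bytes, fp) == f[i].bytes;
    if (fclose(fp) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Read the state back into the same addresses. Returns the saved
   iteration, or -1 if no complete checkpoint could be read. */
static int ckpt_restore(const char *path, const ckpt_field *f, int n)
{
    FILE *fp = fopen(path, "rb");
    if (!fp) return -1;
    int iter;
    int ok = fread(&iter, sizeof iter, 1, fp) == 1;
    for (int i = 0; ok && i < n; i++)
        ok = fread(f[i].data, 1, f[i].bytes, fp) == f[i].bytes;
    fclose(fp);
    return ok ? iter : -1;
}

/* Fresh start: zero the state and the iteration counter. */
static void init_state(double *x, int *iter)
{
    for (int i = 0; i < N; i++) x[i] = 0.0;
    *iter = 0;
}

/* Toy simulation returning x[0] after `steps` iterations. If fail_at >= 0,
   one fault is simulated at that iteration and handled by rollback. */
static double run(const char *path, int steps, int fail_at)
{
    double x[N];
    int iter;
    ckpt_field fields[] = { { "x", x, sizeof x } };  /* field description */
    int nf = (int)(sizeof fields / sizeof fields[0]);

    /* Restart: resume from the last checkpoint if one exists. */
    iter = ckpt_restore(path, fields, nf);
    if (iter < 0) init_state(x, &iter);

    while (iter < steps) {
        if (iter == fail_at) {
            fail_at = -1;   /* the fault happens only once */
            x[0] = -999.0;  /* simulate corrupted in-memory state */
            /* Rollback: restore the last checkpoint in place,
               without restarting the program. */
            iter = ckpt_restore(path, fields, nf);
            if (iter < 0) init_state(x, &iter);
            continue;
        }
        for (int i = 0; i < N; i++) x[i] += 1.0;  /* one "time step" */
        iter++;
        /* Periodic handling: save every CKPT_EVERY iterations
           (the return value is ignored in this sketch). */
        if (iter % CKPT_EVERY == 0) ckpt_save(path, fields, nf, iter);
    }
    remove(path);  /* finished checkpointing: discard the file */
    return x[0];
}
```

For instance, `run("sim.ckpt", 40, 25)` simulates a fault at iteration 25, rolls back to the checkpoint taken at iteration 20, and still returns 40.0. A real implementation would also have to keep checkpoints consistent across all parallel tasks and guard against a crash in the middle of a write (for example by writing to a temporary file and then renaming it); both concerns are outside this single-process sketch.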
Case Study I - Seismic Simulation: Finite Difference Approximation
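Case Study I is built on a finite difference approximation. As a generic illustration only (the seismic code's actual equations, grid, and dimensionality are not given in this outline), the sketch below takes one explicit leapfrog step of the 1-D wave equation u_tt = c^2 u_xx using second-order central differences in time and space. The function name `wave_step` and the parameter `r2` = (c*dt/dx)^2 are hypothetical; fixed (zero) boundaries are an assumption for simplicity.

```c
#include <stddef.h>

/* One leapfrog step of u_tt = c^2 u_xx on n grid points:
     next[i] = 2 cur[i] - prev[i] + r2 (cur[i+1] - 2 cur[i] + cur[i-1])
   where r2 = (c*dt/dx)^2. The scheme is stable only for r2 <= 1
   (CFL condition). Both end points are held at zero. */
static void wave_step(size_t n, const double *prev, const double *cur,
                      double *next, double r2)
{
    next[0] = 0.0;
    next[n - 1] = 0.0;
    for (size_t i = 1; i + 1 < n; i++)
        next[i] = 2.0 * cur[i] - prev[i]
                + r2 * (cur[i + 1] - 2.0 * cur[i] + cur[i - 1]);
}
```

With r2 = 1, a unit spike in `cur` (and an all-zero `prev`) splits into two unit spikes one cell to either side after a single step. Because each update reads two time levels, both `prev` and `cur` are part of the program state that a checkpoint of such a solver must capture.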
Case Study II - Air Flow Over Wing: Computational Fluid Dynamics (CFD)
Instrumentation Cost
Checkpointing Overhead
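A simple cost model helps reason about checkpointing overhead. Assume (hypothetically) that each iteration costs t_step seconds of computation, that a checkpoint is taken every k iterations, that each checkpoint costs t_ckpt seconds, that checkpointing does not overlap with computation, and that the run length N is a multiple of k. The total time T and the relative overhead O are then:

```latex
T = N\,t_{\mathrm{step}} + \frac{N}{k}\,t_{\mathrm{ckpt}},
\qquad
O = \frac{T - N\,t_{\mathrm{step}}}{N\,t_{\mathrm{step}}}
  = \frac{t_{\mathrm{ckpt}}}{k\,t_{\mathrm{step}}}
```

For made-up values t_ckpt = 2 s, t_step = 0.5 s, and k = 100, O = 2 / 50 = 0.04, a 4% slowdown. Checkpointing less often lowers O but increases the amount of work lost when a failure forces a rollback or restart.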
Summary
Email: kohl@msr.epm.ornl.gov
CUMULVS Home Page: http://www.epm.ornl.gov/cs/cumulvs.html