Participating Institutions: Oak Ridge National Laboratory
stdchk: Checkpoint Storage System for HPC Applications
Checkpointing is an indispensable technique for providing fault tolerance to long-running, high-throughput applications such as those running on desktop grids. In these environments, a dedicated checkpoint storage system can offer several benefits: it can reduce the load on the traditional file system, deliver high performance through specialization, and optimize checkpoint data management by taking application semantics into account. Such a storage system can present a unifying abstraction for checkpoint operations while hiding the fact that no dedicated resources are available to store the checkpoint data. Our work presents a dedicated checkpoint storage system for desktop grid environments. Our solution aggregates disk space scavenged from participating desktops into an inexpensive storage space and offers a traditional file system interface for easy integration with checkpointing applications.
NEW! @ ORNL: