| 7:30 - 8:15AM : | Breakfast |
| 8:15 - 8:30AM : | Welcome |
| 8:30 - 9:00AM : |
Resilience: Sacrificing Previous Convictions About Physical Laws
John T. Daly, Los Alamos National Laboratory |
| 9:00 - 9:30AM : |
Failure in Supercomputers and Supercomputer Storage
Garth Gibson, Carnegie Mellon University / Panasas, Inc. |
| 9:30 - 10:00AM : |
System-level Checkpoint/Restart with BLCR
Paul Hargrove, Lawrence Berkeley National Laboratory |
| 10:00 - 10:30AM : | Coffee Break |
| 10:30 - 11:00AM : |
Process-Level Fault Tolerance for Job Healing in HPC Environments
Stephen L. Scott, Oak Ridge National Laboratory |
| 11:00 - 11:30AM : |
A Coordinated Infrastructure for Fault Tolerant Systems (CIFTS)
Rinku Gupta, Argonne National Laboratory |
| 11:30AM - 12:00PM : |
Towards Support for Fault Tolerance in the MPI Standard
Greg Koenig, Oak Ridge National Laboratory |
| 12:00 - 1:30PM : | Lunch Break |
| 1:30 - 2:00PM : |
Studying Systems as Artifacts
Adam J. Oliner, Stanford University |
| 2:00 - 2:30PM : |
Combining System Characterization and Novel Execution Models to Achieve Scalable Robust Computing
Jim Brandt, Sandia National Laboratories |
| 2:30 - 3:00PM : |
Root Cause Analysis
Jon Stearley, Sandia National Laboratories |
| 3:00 - 3:30PM : | Coffee Break |
| 3:30 - 4:00PM : |
Accurate Prediction of Soft Error Vulnerability of Scientific Applications
Greg Bronevetsky, Lawrence Livermore National Laboratory |
| 4:00 - 4:30PM : |
Modular Redundancy in HPC Systems: Why, Where, When and How?
Christian Engelmann, Oak Ridge National Laboratory |
| 4:30 - 5:00PM : |
Making Resilience a Reality Through a Resilience Consortium
James Elliott, Louisiana Tech University |
| 5:00 - 5:30PM : | Discussion & Closing |