Workshop on Resiliency in High Performance Computing (RESILIENCE 2008)
http://xcr.cenit.latech.edu/resilience2008/
HAPCW
Co-Chairs
Stephen L. Scott, Oak Ridge National Laboratory, and Chokchai (Box) Leangsuksun, Louisiana Tech University
Publication Chair
Hong Ong, Oak Ridge National Laboratory
Title | Authors |
Welcome | Stephen Scott and Box Leangsuksun |
| Christian Engelmann, Stephen L. Scott, Chokchai (Box) Leangsuksun, and Xubin He |
Performance and Availability Tradeoffs in Replicated File Systems | Jiaying Zhang and Peter Honeyman |
A Technique for Lock-less Mirroring in Parallel File Systems | Bradley W. Settlemyer and Walter B. Ligon III |
14:00-16:00 |
| J.T. Daly, L.A. Pritchett-Sheats, and S.E. Michalak |
| William M. Jones, John T. Daly, and Nathan A. DeBardeleben |
Fault Tolerance and Recovery of Scientific Workflows on Computational Grids | Gopi Kandaswamy, Anirban Mandal, and Daniel A. Reed |
| Thomas Ropars and Christine Morin |
Reliability-aware Approach: An Incremental Checkpoint/Restart Model in HPC Environments | Nichamon Naksinehaboon, Yudan Liu, Chokchai (Box) Leangsuksun, Raja Nassar, Mihaela Paun, and Stephen L. Scott |
Using Probabilistic Characterization to Reduce Runtime Faults in HPC Systems | Ann Gentile, Jim Brandt, Philippe Pebay, David Thompson, Matthew Wong, Bert Debusschere, and Jackson Mayo |
| Jon Stearley |