Participating Institutions:

Oak Ridge National Laboratory
North Carolina State University

Transient Data Recovery in HPC Centers

Research Publications.

  1. C. Wang, Z. Zhang, X. Ma, S. S. Vazhkudai, F. Mueller, "Improving the Availability of Supercomputer Job Input Data Using Temporal Replication", Proceedings of Int'l Supercomputing Conference (ISC-09), Hamburg, Germany, June 2009. pdf
  2. C. Wang, Z. Zhang, S. S. Vazhkudai, X. Ma, F. Mueller, "On-the-fly Recovery of Job Input Data in Supercomputers", Proceedings of 37th Int'l Conference on Parallel Processing (ICPP-08), Portland, Oregon, September 2008. pdf
  3. Z. Zhang, C. Wang, S. Vazhkudai, X. Ma, G. Pike, F. Mueller, J.W. Cobb, "Optimizing Center Performance through Coordinated Data Staging, Scheduling and Recovery", Proceedings of Supercomputing 2007 (SC07): Int'l Conference on High Performance Computing, Networking, Storage and Analysis, Reno, Nevada, November 2007. pdf slides
  4. S. Vazhkudai, X. Ma, "Recovering Transient Data: Automated On-demand Data Reconstruction and Offloading on Supercomputers", ACM SIGOPS Operating Systems Review: Special Issue on File and Storage Systems, Vol. 41, No. 1, pp. 14-18, January 2007. pdf
  5. S. Vazhkudai, X. Ma, M. Vilayannur, "Data Availability for Service Availability: Automated On-demand Data Reconstruction and Offloading on Supercomputers", ORNL Tech Report 003174, September 2006.


  1. Sudharshan Vazhkudai, "IO Virtualization: Robust Storage Management in the Machine-Room and Beyond", Virtualization in HPC, Nashville, TN, September 2006. Talk


Job Opportunities.