Christian Engelmann

Research and Development Staff Member
System Research Team, Computer Science Research Group
Computer Science and Mathematics Division, Oak Ridge National Laboratory
P.O. Box 2008, Oak Ridge, TN 37831-6173, USA

+1 (865) 574-3132 / +1 (865) 576-5491 / engelmannc@ornl.gov / www.csm.ornl.gov/~engelman

Christian Engelmann's work focuses on software research and development for next-generation extreme-scale high-performance computing (HPC) systems. As part of the System Research Team at Oak Ridge National Laboratory (ORNL), and in collaboration with other laboratories and universities, his research aims to provide high-level reliability, availability, and serviceability (RAS) for next-generation supercomputers, improving their resiliency (and ultimately their efficiency) through novel high-availability and fault-tolerance system software solutions. He also works on core system software technologies that enable "plug-and-play" supercomputing, which offers transparent portability of software to eliminate most of the software modifications caused by diverse supercomputing platforms and system upgrades.

Other, past research by Christian Engelmann included work on a pluggable lightweight heterogeneous Distributed Virtual Machine (DVM) environment, where clusters of personal computers, workstations, and supercomputers can be aggregated to form one giant DVM (in the spirit of its widely-used predecessor, Parallel Virtual Machine (PVM)). Further past work was part of a Cooperative Research and Development Agreement (CRADA) with IBM that focused on a new generation of scientific algorithms (super-scalable algorithms) to address the challenges in scalability and fault-tolerance for extreme-scale supercomputers, such as the IBM Blue Gene/L system.

Select Publications

  1. A. Nagarajan, F. Mueller, C. Engelmann, and S. L. Scott. Proactive fault tolerance for HPC with Xen virtualization. In Proceedings of the 21st ACM International Conference on Supercomputing (ICS) 2007, Seattle, WA, USA, June 16-20, 2007.
  2. C. Wang, F. Mueller, C. Engelmann, and S. L. Scott. A job pause service under LAM/MPI+BLCR for transparent fault tolerance. In Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS) 2007, Long Beach, CA, USA, March 26-30, 2007.
  3. X. He, L. Ou, M. Kosa, S. L. Scott, and C. Engelmann. A unified cache for high performance cluster storage systems. International Journal of High Performance Computing and Networking (IJHPCN), 5(1), pages 97-109, 2007.
  4. C. Engelmann, S. L. Scott, C. Leangsuksun, and X. He. Symmetric active/active high availability for high-performance computing system services. Journal of Computers (JCP), 1(8), pages 43-54, 2006.
  5. R. Baumann, C. Engelmann, and G. A. Geist. A parallel plug-in programming paradigm. In Lecture Notes in Computer Science: Proceedings of the International Conference on High Performance Computing and Communications (HPCC) 2006, volume 4208, pages 823-832, Munich, Germany, September 13-15, 2006.
  6. J. Varma, C. Wang, F. Mueller, C. Engelmann, and S. L. Scott. Scalable, fault-tolerant membership for MPI tasks on HPC systems. In Proceedings of the 20th ACM International Conference on Supercomputing (ICS) 2006, pages 219-228, Cairns, Australia, June 28-30, 2006.
  7. C. Engelmann and G. A. Geist. RMIX: A dynamic, heterogeneous, reconfigurable communication framework. In Lecture Notes in Computer Science: Proceedings of the 6th International Conference on Computational Science (ICCS) 2006, Part II, volume 3992, pages 573-580, Reading, UK, May 28-31, 2006.
  8. C. Engelmann, S. L. Scott, D. E. Bernholdt, N. R. Gottumukkala, C. Leangsuksun, J. Varma, C. Wang, F. Mueller, A. G. Shet, and P. Sadayappan. MOLAR: Adaptive runtime support for high-end computing operating and runtime systems. ACM SIGOPS Operating Systems Review (OSR), 40(2), pages 63-72, 2006.
  9. C. Engelmann and G. A. Geist. Super-scalable algorithms for computing on 100,000 processors. In Lecture Notes in Computer Science: Proceedings of the 5th International Conference on Computational Science (ICCS) 2005, Part I, volume 3514, pages 313-320, Atlanta, GA, USA, May 22-25, 2005.

Please contact engelmannc@ornl.gov with questions or comments regarding this page.
Copyright © 2001-2007, Christian Engelmann. All Rights Reserved.
Last Modified: Thursday, 01-May-2008 11:56:46 EDT
http://www.csm.ornl.gov/~engelman/index.html