Home | Projects | Publications [BibTeX] | Opportunities | Resume

Whitepapers ( Abstract, Publication, Citation

    2009

  1. Nathan DeBardeleben, James Laros, John T. Daly, Stephen L. Scott, Christian Engelmann, and Bill Harrod. High-End Computing Resilience: Analysis of Issues Facing the HEC Community and Path-Forward for Research and Development. Whitepaper, 2009.

Journal Publications ( Abstract, Publication, Citation, DOI)

    2010

  1. Stephen L. Scott, Geoffroy R. Vallée, Thomas Naughton, Anand Tikotekar, Christian Engelmann, and Hong H. Ong. System-Level Virtualization Research at Oak Ridge National Laboratory. Future Generation Computer Systems (FGCS), volume 26, number 3, pages 304-307, 2010. Elsevier B.V, Amsterdam, The Netherlands. ISSN 0167-739X.
  2. 2009

  3. Xubin He, Li Ou, Christian Engelmann, Xin Chen, and Stephen L. Scott. Symmetric Active/Active Metadata Service for High Availability Parallel File Systems. Journal of Parallel and Distributed Computing (JPDC), volume 69, number 12, pages 961-973, 2009. Elsevier B.V, Amsterdam, The Netherlands. ISSN 0743-7315.
  4. 2007

  5. Xubin (Ben) He, Li Ou, Martha J. Kosa, Stephen L. Scott, and Christian Engelmann. A Unified Multiple-Level Cache for High Performance Cluster Storage Systems. International Journal of High Performance Computing and Networking (IJHPCN), volume 5, number 1-2, pages 97-109, 2007. Inderscience Publishers, Geneve, Switzerland. ISSN 1740-0562.
  6. 2006

  7. Christian Engelmann, Stephen L. Scott, Chokchai (Box) Leangsuksun, and Xubin (Ben) He. Symmetric Active/Active High Availability for High-Performance Computing System Services. Journal of Computers (JCP), volume 1, number 8, pages 43-54, 2006. Academy Publisher, Oulu, Finland. ISSN 1796-203X.
  8. Christian Engelmann, Stephen L. Scott, David E. Bernholdt, Narasimha R. Gottumukkala, Chokchai (Box) Leangsuksun, Jyothish Varma, Chao Wang, Frank Mueller, Aniruddha G. Shet, and Ponnuswamy (Saday) Sadayappan. MOLAR: Adaptive Runtime Support for High-End Computing Operating and Runtime Systems. ACM SIGOPS Operating Systems Review (OSR), volume 40, number 2, pages 63-72, 2006. ACM Press, New York, NY, USA. ISSN 0163-5980.

Conference Publications ( Abstract, Publication, Presentation, Citation, DOI)

    2011

  1. Christian Engelmann and Swen Böhm. Redundant Execution of HPC Applications with MR-MPI. In Proceedings of the 10th IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN) 2011, pages 31-38, Innsbruck, Austria, February 15-17, 2011. ACTA Press, Calgary, AB, Canada. ISBN 978-0-88986-864-9.
  2. 2010

  3. Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Hybrid Checkpointing for MPI Jobs in HPC Environments. In Proceedings of the 16th IEEE International Conference on Parallel and Distributed Systems (ICPADS) 2010, pages 524-533, Shanghai, China, December 8-10, 2010. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-4307-9.
  4. Min Li, Sudharshan S. Vazhkudai, Ali R. Butt, Fei Meng, Xiaosong Ma, Youngjae Kim, Christian Engelmann, and Galen Shipman. Functional Partitioning to Optimize End-to-End Performance on Many-Core Architectures. In Proceedings of the 23rd IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC) 2010, pages 1-12, New Orleans, LA, USA, November 13-19, 2010. ACM Press, New York, NY, USA. ISBN 978-1-4244-7559-9.
  5. Christian Engelmann and Frank Lauer. Facilitating Co-Design for Extreme-Scale Systems Through Lightweight Simulation. In Proceedings of the 12th IEEE International Conference on Cluster Computing (Cluster) 2010: 1st Workshop on Application/Architecture Co-design for Extreme-scale Computing (AACEC), pages 1-8, Hersonissos, Crete, Greece, September 20-24, 2010. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-1-4244-8395-2.
  6. Swen Böhm, Christian Engelmann, and Stephen L. Scott. Aggregation of Real-Time System Monitoring Data for Analyzing Large-Scale Parallel and Distributed Computing Environments. In Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications (HPCC) 2010, pages 72-78, Melbourne, Australia, September 1-3, 2010. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-4214-0.
  7. Antonina Litvinova, Christian Engelmann, and Stephen L. Scott. A Proactive Fault Tolerance Framework for High-Performance Computing. In Proceedings of the 9th IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN) 2010, Innsbruck, Austria, February 16-18, 2010. ACTA Press, Calgary, AB, Canada. ISBN 978-0-88986-783-3.
  8. 2009

  9. George Ostrouchov, Thomas Naughton, Christian Engelmann, Geoffroy R. Vallée, and Stephen L. Scott. Nonparametric Multivariate Anomaly Analysis in Support of HPC Resilience. In Proceedings of the 5th IEEE International Conference on e-Science (e-Science) 2009: Workshop on Computational Science, pages 80-85, Oxford, UK, December 9-11, 2009. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-1-4244-5946-9.
  10. Thomas Naughton, Wesley Bland, Geoffroy R. Vallée, Christian Engelmann, and Stephen L. Scott. Fault Injection Framework for System Resilience Evaluation - Fake Faults for Finding Future Failures. In Proceedings of the 18th International Symposium on High Performance Distributed Computing (HPDC) 2009: 2nd Workshop on Resiliency in High Performance Computing (Resilience) 2009, pages 23-28, Munich, Germany, June 9, 2009. ACM Press, New York, NY, USA. ISBN 978-1-60558-587-1.
  11. Anand Tikotekar, Hong H. Ong, Sadaf Alam, Geoffroy R. Vallée, Thomas Naughton, Christian Engelmann, and Stephen L. Scott. Performance Comparison of Two Virtual Machine Scenarios Using an HPC Application - A Case study Using Molecular Dynamics Simulations. In Proceedings of the 3rd Workshop on System-level Virtualization for High Performance Computing (HPCVirt) 2009, in conjunction with the 4th ACM SIGOPS European Conference on Computer Systems (EuroSys) 2009, pages 33-40, Nuremberg, Germany, March 30, 2009. ACM Press, New York, NY, USA. ISBN 978-1-60558-465-2.
  12. Narate Taerat, Nichamon Naksinehaboon, Clayton Chandler, James Elliott, Chokchai (Box) Leangsuksun, George Ostrouchov, Stephen L. Scott, and Christian Engelmann. Blue Gene/L Log Analysis and Time to Interrupt Estimation. In Proceedings of the 4th International Conference on Availability, Reliability and Security (ARES) 2009, pages 173-180, Fukuoka, Japan, March 16-19, 2009. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-1-4244-3572-2.
  13. Christian Engelmann, Hong H. Ong, and Stephen L. Scott. Evaluating the Shared Root File System Approach for Diskless High-Performance Computing Systems. In Proceedings of the 10th LCI International Conference on High-Performance Clustered Computing (LCI) 2009, Boulder, CO, USA, March 9-12, 2009.
  14. Christian Engelmann, Geoffroy R. Vallée, Thomas Naughton, and Stephen L. Scott. Proactive Fault Tolerance Using Preemptive Migration. In Proceedings of the 17th Euromicro International Conference on Parallel, Distributed, and network-based Processing (PDP) 2009, pages 252-257, Weimar, Germany, February 18-20, 2009. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-3544-9. ISSN 1066-6192.
  15. Alessandro Valentini, Christian Di Biagio, Fabrizio Batino, Guido Pennella, Fabrizio Palma, and Christian Engelmann. High Performance Computing with Harness over InfiniBand. In Proceedings of the 17th Euromicro International Conference on Parallel, Distributed, and network-based Processing (PDP) 2009, pages 151-154, Weimar, Germany, February 18-20, 2009. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-3544-9. ISSN 1066-6192.
  16. Christian Engelmann, Hong H. Ong, and Stephen L. Scott. The Case for Modular Redundancy in Large-Scale High Performance Computing Systems. In Proceedings of the 8th IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN) 2009, pages 189-194, Innsbruck, Austria, February 16-18, 2009. ACTA Press, Calgary, AB, Canada. ISBN 978-0-88986-784-0.
  17. 2008

  18. Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Proactive Process-Level Live Migration in HPC Environments. In Proceedings of the 21st IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC) 2008, pages 1-12, Austin, TX, USA, November 15-21, 2008. ACM Press, New York, NY, USA. ISBN 978-1-4244-2835-9.
  19. Geoffroy R. Vallée, Thomas Naughton, Hong H. Ong, Anand Tikotekar, Christian Engelmann, Wesley Bland, Ferrol Aderholt, and Stephen L. Scott. Virtual System Environments. In Communications in Computer and Information Science: Proceedings of the 2nd DMTF Academic Alliance Workshop on Systems and Virtualization Management: Standards and New Technologies (SVM) 2008, pages 72-83, Munich, Germany, October 21-22, 2008. Springer Verlag, Berlin, Germany. ISBN 978-3-540-88707-2. ISSN 1865-0929.
  20. Anand Tikotekar, Geoffroy Vallée, Thomas Naughton, Hong H. Ong, Christian Engelmann, and Stephen L. Scott. An Analysis of HPC Benchmark Applications in Virtual Machine Environments. In Lecture Notes in Computer Science: Proceedings of the 14th European Conference on Parallel and Distributed Computing (Euro-Par) 2008: 3rd Workshop on Virtualization in High-Performance Cluster and Grid Computing (VHPC) 2008, pages 63-71, Las Palmas de Gran Canaria, Spain, August 26-29, 2008. Springer Verlag, Berlin, Germany. ISBN 978-3-642-00954-9.
  21. Christian Engelmann, Stephen L. Scott, Chokchai (Box) Leangsuksun, and Xubin (Ben) He. Symmetric Active/Active High Availability for High-Performance Computing System Services: Accomplishments and Limitations. In Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid) 2008: Workshop on Resiliency in High Performance Computing (Resilience) 2008, pages 813-818, Lyon, France, May 19-22, 2008. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-3156-4.
  22. Xin Chen, Benjamin Eckart, Xubin (Ben) He, Christian Engelmann, and Stephen L. Scott. An Online Controller Towards Self-Adaptive File System Availability and Performance. In Proceedings of the 5th High Availability and Performance Workshop (HAPCW) 2008, in conjunction with the 1st High-Performance Computer Science Week (HPCSW) 2008, Denver, CO, USA, April 3-4, 2008.
  23. Anand Tikotekar, Geoffroy Vallée, Thomas Naughton, Hong H. Ong, Christian Engelmann, Stephen L. Scott, and Anthony M. Filippi. Effects of Virtualization on a Scientific Application - Running a Hyperspectral Radiative Transfer Code on Virtual Machines. In Proceedings of the 2nd Workshop on System-level Virtualization for High Performance Computing (HPCVirt) 2008, in conjunction with the 3rd ACM SIGOPS European Conference on Computer Systems (EuroSys) 2008, pages 16-23, Glasgow, UK, March 31, 2008. ACM Press, New York, NY, USA. ISBN 978-1-60558-120-0.
  24. Christian Engelmann, Stephen L. Scott, Chokchai (Box) Leangsuksun, and Xubin (Ben) He. Symmetric Active/Active Replication for Dependent Services. In Proceedings of the 3rd International Conference on Availability, Reliability and Security (ARES) 2008, pages 260-267, Barcelona, Spain, March 4-7, 2008. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-3102-1.
  25. Geoffroy R. Vallée, Kulathep Charoenpornwattana, Christian Engelmann, Anand Tikotekar, Chokchai (Box) Leangsuksun, Thomas Naughton, and Stephen L. Scott. A Framework For Proactive Fault Tolerance. In Proceedings of the 3rd International Conference on Availability, Reliability and Security (ARES) 2008, pages 659-664, Barcelona, Spain, March 4-7, 2008. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-3102-1.
  26. Björn Könning, Christian Engelmann, Stephen L. Scott, and George A. (Al) Geist. Virtualized Environments for the Harness High Performance Computing Workbench. In Proceedings of the 16th Euromicro International Conference on Parallel, Distributed, and network-based Processing (PDP) 2008, pages 133-140, Toulouse, France, February 13-15, 2008. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-3089-5.
  27. Geoffroy R. Vallée, Thomas Naughton, Christian Engelmann, Hong H. Ong, and Stephen L. Scott. System-level Virtualization for High Performance Computing. In Proceedings of the 16th Euromicro International Conference on Parallel, Distributed, and network-based Processing (PDP) 2008, pages 636-643, Toulouse, France, February 13-15, 2008. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-3089-5.
  28. 2007

  29. Li Ou, Christian Engelmann, Xubin (Ben) He, Xin Chen, and Stephen L. Scott. Symmetric Active/Active Metadata Service for Highly Available Cluster Storage Systems. In Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS) 2007, Cambridge, MA, USA, November 19-21, 2007. ACTA Press, Calgary, AB, Canada. ISBN 978-0-88986-703-1.
  30. Emanuele Di Saverio, Marco Cesati, Christian Di Biagio, Guido Pennella, and Christian Engelmann. Distributed Real-Time Computing with Harness. In Lecture Notes in Computer Science: Proceedings of the 14th European PVM/MPI Users` Group Meeting (EuroPVM/MPI) 2007, pages 281-288, Paris, France, September 30 - October 3, 2007. Springer Verlag, Berlin, Germany. ISBN 978-3-540-75415-2. ISSN 0302-9743.
  31. Li Ou, Xubin (Ben) He, Christian Engelmann, and Stephen L. Scott. A Fast Delivery Protocol for Total Order Broadcasting. In Proceedings of the 16th IEEE International Conference on Computer Communications and Networks (ICCCN) 2007, pages 730-734, Honolulu, HI, USA, August 13-16, 2007. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-1-42441-251-8. ISSN 1095-2055.
  32. Arun B. Nagarajan, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Proactive Fault Tolerance for HPC with Xen Virtualization. In Proceedings of the 21st ACM International Conference on Supercomputing (ICS) 2007, pages 23-32, Seattle, WA, USA, June 16-20, 2007. ACM Press, New York, NY, USA. ISBN 978-1-59593-768-1.
  33. Christian Engelmann, Hong H. Ong, and Stephen L. Scott. Middleware in Modern High Performance Computing System Architectures. In Lecture Notes in Computer Science: Proceedings of the 7th International Conference on Computational Science (ICCS) 2007, Part II: 4th Special Session on Collaborative and Cooperative Environments (CCE) 2007, pages 784-791, Beijing, China, May 27-30, 2007. Springer Verlag, Berlin, Germany. ISBN 3-5407-2585-5. ISSN 0302-9743.
  34. Christian Engelmann, Stephen L. Scott, Chokchai (Box) Leangsuksun, and Xubin (Ben) He. Transparent Symmetric Active/Active Replication for Service-Level High Availability. In Proceedings of the 7th IEEE International Symposium on Cluster Computing and the Grid (CCGrid) 2007: 7th International Workshop on Global and Peer-to-Peer Computing (GP2PC) 2007, pages 755-760, Rio de Janeiro, Brazil, May 14-17, 2007. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 0-7695-2833-3.
  35. Christian Engelmann, Stephen L. Scott, Chokchai (Box) Leangsuksun, and Xubin (Ben) He. On Programming Models for Service-Level High Availability. In Proceedings of the 2nd International Conference on Availability, Reliability and Security (ARES) 2007, pages 999-1006, Vienna, Austria, April 10-13, 2007. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 0-7695-2775-2.
  36. Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. A Job Pause Service under LAM/MPI+BLCR for Transparent Fault Tolerance. In Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2007, pages 1-10, Long Beach, CA, USA, March 26-30, 2007. ACM Press, New York, NY, USA. ISBN 978-1-59593-768-1.
  37. Christian Engelmann, Stephen L. Scott, Hong H. Ong, Geoffroy R. Vallée, and Thomas Naughton. Configurable Virtualized System Environments for High Performance Computing. In Proceedings of the 1st Workshop on System-level Virtualization for High Performance Computing (HPCVirt) 2007, in conjunction with the 2nd ACM SIGOPS European Conference on Computer Systems (EuroSys) 2007, Lisbon, Portugal, March 20, 2007.
  38. 2006

  39. Christian Engelmann, Stephen L. Scott, Chokchai (Box) Leangsuksun, and Xubin (Ben) He. Towards High Availability for High-Performance Computing System Services: Accomplishments and Limitations. In Proceedings of the 4th High Availability and Performance Workshop (HAPCW) 2006, in conjunction with the 7th Los Alamos Computer Science Institute (LACSI) Symposium 2006, Santa Fe, NM, USA, October 17, 2006.
  40. Li Ou, Xin Chen, Xubin (Ben) He, Christian Engelmann, and Stephen L. Scott. Achieving Computational I/O Effciency in a High Performance Cluster Using Multicore Processors. In Proceedings of the 4th High Availability and Performance Workshop (HAPCW) 2006, in conjunction with the 7th Los Alamos Computer Science Institute (LACSI) Symposium 2006, Santa Fe, NM, USA, October 17, 2006.
  41. Kai Uhlemann, Christian Engelmann, and Stephen L. Scott. JOSHUA: Symmetric Active/Active Replication for Highly Available HPC Job and Resource Management. In Proceedings of the 8th IEEE International Conference on Cluster Computing (Cluster) 2006, pages 1-10, Barcelona, Spain, September 25-28, 2006. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 1-4244-0328-6. ISSN 1552-5244.
  42. Ronald Baumann, Christian Engelmann, and George A. (Al) Geist. A Parallel Plug-in Programming Paradigm. In Lecture Notes in Computer Science: Proceedings of the 7th International Conference on High Performance Computing and Communications (HPCC) 2006, pages 823-832, Munich, Germany, September 13-15, 2006. Springer Verlag, Berlin, Germany. ISBN 978-3-540-39368-9. ISSN 0302-9743.
  43. Jyothish Varma, Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Scalable, Fault-Tolerant Membership for MPI Tasks on HPC Systems. In Proceedings of the 20th ACM International Conference on Supercomputing (ICS) 2006, pages 219-228, Cairns, Australia, June 28-30, 2006. ACM Press, New York, NY, USA. ISBN 1-59593-282-8.
  44. Daniel I. Okunbor, Christian Engelmann, and Stephen L. Scott. Exploring Process Groups for Reliability, Availability and Serviceability of Terascale Computing Systems. In Proceedings of the 2nd International Conference on Computer Science and Information Systems 2006, Athens, Greece, June 19-21, 2006.
  45. Christian Engelmann and George A. (Al) Geist. RMIX: A Dynamic, Heterogeneous, Reconfigurable Communication Framework. In Lecture Notes in Computer Science: Proceedings of the 6th International Conference on Computational Science (ICCS) 2006, Part II: 3rd Special Session on Collaborative and Cooperative Environments (CCE) 2006, pages 573-580, Reading, UK, May 28-31, 2006. Springer Verlag, Berlin, Germany. ISBN 3-540-34381-4. ISSN 0302-9743.
  46. Christian Engelmann, Stephen L. Scott, Chokchai (Box) Leangsuksun, and Xubin (Ben) He. Active/Active Replication for Highly Available HPC System Services. In Proceedings of the 1st International Conference on Availability, Reliability and Security (ARES) 2006: 1st International Workshop on Frontiers in Availability, Reliability and Security (FARES) 2006, pages 639-645, Vienna, Austria, April 20-22, 2006. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 0-7695-2567-9.
  47. 2005

  48. Christian Engelmann and Stephen L. Scott. Concepts for High Availability in Scientific High-End Computing. In Proceedings of the 3rd High Availability and Performance Workshop (HAPCW) 2005, in conjunction with the 6th Los Alamos Computer Science Institute (LACSI) Symposium 2005, Santa Fe, NM, USA, October 11, 2005.
  49. Kshitij Limaye, Chokchai (Box) Leangsuksun, Zeno Greenwood, Stephen L. Scott, Christian Engelmann, Richard M. Libby, and Kasidit Chanchio. Job-Site Level Fault Tolerance for Cluster and Grid Environments. In Proceedings of the 7th IEEE International Conference on Cluster Computing (Cluster) 2005, pages 1-9, Boston, MA, USA, September 26-30, 2005. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 0-7803-9486-0. ISSN 1552-5244.
  50. Hertong Song, Chokchai (Box) Leangsuksun, Raja Nassar, Yudan Liu, Christian Engelmann, and Stephen L. Scott. UML-based Beowulf Cluster Availability Modeling. In International Conference on Software Engineering Research and Practice (SERP) 2005, pages 161-167, Las Vegas, NV, USA, June 27-30, 2005. CSREA Press. ISBN 1-932415-49-1.
  51. Christian Engelmann and Stephen L. Scott. High Availability for Ultra-Scale High-End Scientific Computing. In Proceedings of the 2nd International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters (COSET-2) 2005, in conjunction with the 19th ACM International Conference on Supercomputing (ICS) 2005, Cambridge, MA, USA, June 19, 2005.
  52. Chokchai (Box) Leangsuksun, Venkata K. Munganuru, Tong Liu, Stephen L. Scott, and Christian Engelmann. Asymmetric Active-Active High Availability for High-end Computing. In Proceedings of the 2nd International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters (COSET-2) 2005, in conjunction with the 19th ACM International Conference on Supercomputing (ICS) 2005, Cambridge, MA, USA, June 19, 2005.
  53. Christian Engelmann and George A. (Al) Geist. Super-Scalable Algorithms for Computing on 100,000 Processors. In Lecture Notes in Computer Science: Proceedings of the 5th International Conference on Computational Science (ICCS) 2005, Part I, pages 313-320, Atlanta, GA, USA, May 22-25, 2005. Springer Verlag, Berlin, Germany. ISBN 978-3-540-26032-5. ISSN 0302-9743.
  54. Christian Engelmann and George A. (Al) Geist. A Lightweight Kernel for the Harness Metacomputing Framework. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2005: 14th Heterogeneous Computing Workshop (HCW) 2005, Denver, CO, USA, April 4, 2005. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 0-7695-2312-9. ISSN 1530-2075.
  55. 2004

  56. Christian Engelmann, Stephen L. Scott, and George A. (Al) Geist. High Availability through Distributed Control. In Proceedings of the 2nd High Availability and Performance Workshop (HAPCW) 2004, in conjunction with the 5th Los Alamos Computer Science Institute (LACSI) Symposium 2004, Santa Fe, NM, USA, October 12, 2004.
  57. Xubin (Ben) He, Li Ou, Stephen L. Scott, and Christian Engelmann. A Highly Available Cluster Storage System using Scavenging. In Proceedings of the 2nd High Availability and Performance Workshop (HAPCW) 2004, in conjunction with the 5th Los Alamos Computer Science Institute (LACSI) Symposium 2004, Santa Fe, NM, USA, October 12, 2004.
  58. 2003

  59. Christian Engelmann and George A. (Al) Geist. A Diskless Checkpointing Algorithm for Super-scale Architectures Applied to the Fast Fourier Transform. In Proceedings of the Challenges of Large Applications in Distributed Environments Workshop (CLADE) 2003, in conjunction with the 12th IEEE International Symposium on High Performance Distributed Computing (HPDC) 2003, pages 47, Seattle, WA, USA, June 21, 2003. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 0-7695-1984-9.
  60. 2002

  61. Christian Engelmann, Stephen L. Scott, and George A. (Al) Geist. Distributed Peer-to-Peer Control in Harness. In Lecture Notes in Computer Science: Proceedings of the 2nd International Conference on Computational Science (ICCS) 2002, Part II: Workshop on Global and Collaborative Computing, pages 720-727, Amsterdam, The Netherlands, April 21-24, 2002. Springer Verlag, Berlin, Germany. ISBN 3-540-43593-X. ISSN 0302-9743.

Conference Poster Presentations ( Abstract, Poster, Citation)

    2009

  1. Stephen L. Scott, Christian Engelmann, Geoffroy R. Vallée, Thomas Naughton, Anand Tikotekar, George Ostrouchov, Chokchai (Box) Leangsuksun, Nichamon Naksinehaboon, Raja Nassar, Mihaela Paun, Frank Mueller, Chao Wang, Arun B. Nagarajan, and Jyothish Varma. A Tunable Holistic Resiliency Approach for High-Performance Computing Systems. Poster at the National HPC Workshop on Resilience 2009, Arlington, VA, USA, August 12-14, 2009.
  2. Stephen L. Scott, Geoffroy R. Vallée, Thomas Naughton, Anand Tikotekar, Christian Engelmann, and Hong H. Ong. System-level Virtualization for for High-Performance Computing. Poster at the National HPC Workshop on Resilience 2009, Arlington, VA, USA, August 12-14, 2009.
  3. Stephen L. Scott, Christian Engelmann, Geoffroy R. Vallée, Thomas Naughton, Anand Tikotekar, George Ostrouchov, Chokchai (Box) Leangsuksun, Nichamon Naksinehaboon, Raja Nassar, Mihaela Paun, Frank Mueller, Chao Wang, Arun B. Nagarajan, and Jyothish Varma. A Tunable Holistic Resiliency Approach for High-Performance Computing Systems. Poster at the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) 2009, Raleigh, NC, USA, February 14-18, 2009.
  4. 2008

  5. George A. (Al) Geist, Christian Engelmann, Jack J. Dongarra, George Bosilca, Magdalena M. Sławińska, and Jarosław K. Sławiński. The Harness Workbench: Unified and Adaptive Access to Diverse High-Performance Computing Platforms. Poster at the 1st High-Performance Computer Science Week (HPCSW) 2008, Denver, CO, USA, March 30 - April 5, 2008.
  6. Stephen L. Scott, Christian Engelmann, Hong H. Ong, Geoffroy R. Vallée, Thomas Naughton, Anand Tikotekar, George Ostrouchov, Chokchai (Box) Leangsuksun, Nichamon Naksinehaboon, Raja Nassar, Mihaela Paun, Frank Mueller, Chao Wang, Arun B. Nagarajan, Jyothish Varma, Xubin (Ben) He, Li Ou, and Xin Chen. Resiliency for High-Performance Computing Systems. Poster at the 1st High-Performance Computer Science Week (HPCSW) 2008, Denver, CO, USA, March 30 - April 5, 2008.
  7. Stephen L. Scott, Geoffroy R. Vallée, Thomas Naughton, Anand Tikotekar, Christian Engelmann, and Hong H. Ong. System-level Virtualization for for High-Performance Computing. Poster at the 1st High-Performance Computer Science Week (HPCSW) 2008, Denver, CO, USA, March 30 - April 5, 2008.

Technical Reports ( Abstract, Report, Citation)

    2010

  1. Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Hybrid Full/Incremental Checkpoint/Restart for MPI Jobs in HPC Environments. Technical Report , ORNL/TM-2010/162, Oak Ridge National Laboratory, Oak Ridge, TN, USA, August, 2010.
  2. Chao Wang, Frank Mueller, Christian Engelmann, and Stephen L. Scott. Proactive Process-Level Live Migration and Back Migration in HPC Environments. Technical Report , ORNL/TM-2010/161, Oak Ridge National Laboratory, Oak Ridge, TN, USA, August, 2010.

Invited Talks and Lectures ( Abstract, Presentation, Citation)

    2010

  1. Christian Engelmann. Scalable HPC System Monitoring. Invited talk at the 3rd HPC Resiliency Summit: Workshop on Resiliency for Petascale HPC 2010, in conjunction with the 3rd Los Alamos Computer Science Symposium (LACSS) 2010, Santa Fe, NM, USA, October 13, 2010.
  2. Christian Engelmann. Beyond Application-Level Checkpoint/Restart - Advanced Software Approaches for Fault Resilience. Talk at the 39th SPEEDUP Workshop on High Performance Computing, Zurich, Switzerland, September 6, 2010.
  3. Christian Engelmann and Stephen L. Scott. Reliability, Availability, and Serviceability (RAS) for Petascale High-End Computing and Beyond. Talk at the Forum to Address Scalable Technology for Runtime and Operating Systems (FAST-OS) Workshop, in conjunction with the USENIX Federated Conferences Week (USENIX) 2010, Boston MA, USA, June 22, 2010.
  4. Christian Engelmann. Resilience Challenges at the Exascale. Talk at the 14th Workshop on Distributed Supercomputing (SOS) 2010, Savannah, GA, USA, March 8-11, 2010.
  5. Christian Engelmann and Stephen L. Scott. HPC System Software Research at Oak Ridge National Laboratory. Seminar at the Leibniz Rechenzentrum (LRZ), Garching, Germany, February 22, 2010.
  6. 2009

  7. Christian Engelmann. High-Performance Computing Research Internship and Appointment Opportunities at Oak Ridge National Laboratory. Seminar at the Department of Computer Science, University of Reading, Reading, United Kingdom, December 14, 2009.
  8. Christian Engelmann. JCAS - IAA Simulation Efforts at Oak Ridge National Laboratory. Invited talk at the IAA Workshop on HPC Architectural Simulation (HPCAS), Boulder, CO, USA, September 1-2, 2009.
  9. Christian Engelmann. Modeling Techniques Towards Resilience. Invited talk at the National HPC Workshop on Resilience 2009, Arlington, VA, USA, August 12-14, 2009.
  10. Christian Engelmann. System Resilience Research at ORNL in the Context of HPC. Invited talk at the Institut National de Recherche en Informatique et en Automatique (INRIA), Rennes, France, May 15, 2009.
  11. Christian Engelmann. High-Performance Computing Research and MSc Internship Opportunities at Oak Ridge National Laboratory. Seminar at the Department of Computer Science, University of Reading, Reading, United Kingdom, May 11, 2009.
  12. Christian Engelmann. Modular Redundancy for Soft-Error Resilience in Large-Scale HPC Systems. Invited talk at the Dagstuhl Seminar on Fault Tolerance in High-Performance Computing and Grids, Schloss Dagstuhl, Wadern, Germany, May 3-8, 2009.
  13. Christian Engelmann. Proactive Fault Tolerance Using Preemptive Migration. Invited talk at the 3rd Collaborative and Grid Computing Technologies Workshop (CGCTW) 2009, Cancun, Mexico, April 22-24, 2009.
  14. Christian Engelmann. Resiliency. Panel at the 13th Workshop on Distributed Supercomputing (SOS) 2009, Hilton Head, SC, USA, March 9-12, 2009.
  15. 2008

  16. Christian Engelmann. High-Performance Computing Research at Oak Ridge National Laboratory. Invited talk at the Reading Annual Computational Science Workshop, Reading, United Kingdom, December 8, 2008.
  17. Christian Engelmann. Modular Redundancy in HPC Systems: Why, Where, When and How?. Invited talk at the 1st HPC Resiliency Summit: Workshop on Resiliency for Petascale HPC 2008, in conjunction with the 1st Los Alamos Computer Science Symposium (LACSS) 2008, Santa Fe, NM, USA, October 15, 2008.
  18. Christian Engelmann. Resiliency for High-Performance Computing. Invited talk at the 2nd Collaborative and Grid Computing Technologies Workshop (CGCTW) 2008, Cancun, Mexico, April 10-12, 2008.
  19. Christian Engelmann. Advanced Fault Tolerance Solutions for High Performance Computing. Seminar at the Laboratoire d'Analyse et d'Architecture des Systèmes, Centre National de la Recherche Scientifique, Toulouse, France, February 11, 2008.
  20. 2007

  21. Christian Engelmann. Service-Level High Availability in Parallel and Distributed Systems. Seminar at the Department of Computer Science, University of Reading, Reading, United Kingdom, October 10, 2007.
  22. Christian Engelmann. Advanced Fault Tolerance Solutions for High Performance Computing. Invited talk at the Workshop on Trends, Technologies and Collaborative Opportunities in High Performance and Grid Computing (WTTC) 2007, Khon Kean, Thailand, June 8, 2007.
  23. Christian Engelmann. Advanced Fault Tolerance Solutions for High Performance Computing. Invited talk at the Workshop on Trends, Technologies and Collaborative Opportunities in High Performance and Grid Computing (WTTC) 2007, Khon Kean, Thailand, June 4-5, 2007.
  24. Christian Engelmann. Operating System Research at ORNL: System-level Virtualization. Seminar at the Institute of Graphics and Parallel Processing, Johannes Kepler University, Linz, Austria, April 10, 2007.
  25. Christian Engelmann. Towards High Availability for High-Performance Computing System Services: Accomplishments and Limitations. Seminar at the Department of Computer Science, University of Reading, Reading, United Kingdom, March 14, 2007.
  26. 2006

  27. Christian Engelmann. High Availability for Ultra-Scale High-End Scientific Computing. Seminar at the Department of Computer Science, University of Reading, Reading, United Kingdom, June 9, 2006.
  28. Stephen L. Scott and Christian Engelmann. Advancing Reliability, Availability and Serviceability for High-Performance Computing. Seminar at the Institute of Graphics and Parallel Processing, Johannes Kepler University, Linz, Austria, April 19, 2006.
  29. 2005

  30. Christian Engelmann. High Availability for Ultra-Scale High-End Scientific Computing. Seminar at the Department of Computer Science, University of Reading, Reading, United Kingdom, October 18, 2005.
  31. Christian Engelmann. High Availability for Ultra-Scale High-End Scientific Computing. Seminar at the Department of Mathematics and Computer Science, Fayetteville State University, Fayetteville, NC, USA, September 26, 2005.
  32. Christian Engelmann. High Availability for Ultra-Scale High-End Scientific Computing. Seminar at the Department of Computer Science, University of Reading, Reading, United Kingdom, May 13, 2005.
  33. Christian Engelmann. High Availability for Ultra-Scale High-End Scientific Computing. Seminar at the Center for Entrepreneurship and Information Technology, Louisiana Tech University, Ruston, LA, USA, April 15, 2005.
  34. 2004

  35. Christian Engelmann. Diskless Checkpointing on Super-scale Architectures - Applied to the Fast Fourier Transform. Invited talk at the 11th SIAM Conference on Parallel Processing for Scientific Computing (SIAM PP) 2004, San Francisco, CA, USA, February 25, 2004.
  36. Christian Engelmann. Super-scalable Algorithms - Next Generation Supercomputing on 100,000 and more Processors. Seminar at the Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA, January 29, 2004.
  37. Christian Engelmann. Distributed Peer-to-Peer Control for Harness. Seminar at the Department of Computer Science, North Carolina State University, Raleigh, NC, USA, February 11, 2004.

Co-advised Theses ( Abstract, Publication, Presentation, Citation)

    2010

  1. Ian S. Jones. Simulation of Large Scale Architectures on High Performance Computers. Master's thesis, Department of Computer Science, University of Reading, UK, October 22, 2010. Thesis research performed at Oak Ridge National Laboratory. Advisors: Prof. Vassil N. Alexandrov (University of Reading); Christian Engelmann (Oak Ridge National Laboratory); George Bosilca (University of Tennessee, Knoxville).
  2. Swen Böhm. Development of a RAS Framework for HPC Environments: Realtime Data Reduction of Monitoring Data. Master's thesis, Department of Computer Science, University of Reading, UK, March 12, 2010. Thesis research performed at Oak Ridge National Laboratory. Advisors: Prof. Vassil N. Alexandrov (University of Reading); Christian Engelmann (Oak Ridge National Laboratory); George Bosilca (University of Tennessee, Knoxville).
  3. Frank Lauer. Simulation of Advanced Large-Scale HPC Architectures. Master's thesis, Department of Computer Science, University of Reading, UK, March 12, 2010. Thesis research performed at Oak Ridge National Laboratory. Advisors: Prof. Vassil N. Alexandrov (University of Reading); Christian Engelmann (Oak Ridge National Laboratory); George Bosilca (University of Tennessee, Knoxville).
  4. 2009

  5. Antonina Litvinova. RAS Framework Engine Prototype. Master's thesis, Department of Computer Science, University of Reading, UK, September 22, 2009. Thesis research performed at Oak Ridge National Laboratory. Advisors: Prof. Vassil N. Alexandrov (University of Reading); Christian Engelmann (Oak Ridge National Laboratory); George Bosilca (University of Tennessee, Knoxville).
  6. 2007

  7. Björn Könning. Virtualized Environments for the Harness Workbench. Master's thesis, Department of Computer Science, University of Reading, UK, March 14, 2007. Thesis research performed at Oak Ridge National Laboratory. Advisors: Prof. Vassil N. Alexandrov (University of Reading); Christian Engelmann (Oak Ridge National Laboratory).
  8. Matthias Weber. High Availability for the Lustre File System. Master's thesis, Department of Computer Science, University of Reading, UK, March 14, 2007. Thesis research performed at Oak Ridge National Laboratory. Double diploma in conjunction with the Department of Engineering I, Technical College for Engineering and Economics (FHTW) Berlin, Germany. Advisors: Prof. Vassil N. Alexandrov (University of Reading); Christian Engelmann (Oak Ridge National Laboratory).
  9. 2006

  10. Ronald Baumann. Design and Development of Prototype Components for the Harness High-Performance Computing Workbench. Master's thesis, Department of Computer Science, University of Reading, UK, March 6, 2006. Thesis research performed at Oak Ridge National Laboratory. Double diploma in conjunction with the Department of Engineering I, Technical College for Engineering and Economics (FHTW) Berlin, Germany. Advisors: Prof. Vassil N. Alexandrov (University of Reading); George A. (Al) Geist and Christian Engelmann (Oak Ridge National Laboratory).
  11. Kai Uhlemann. High Availability for High-End Scientific Computing. Master's thesis, Department of Computer Science, University of Reading, UK, March 6, 2006. Thesis research performed at Oak Ridge National Laboratory. Double diploma in conjunction with the Department of Engineering I, Technical College for Engineering and Economics (FHTW) Berlin, Germany. Advisors: Prof. Vassil N. Alexandrov (University of Reading); George A. (Al) Geist and Christian Engelmann (Oak Ridge National Laboratory).

Theses ( Abstract, Publication, Presentation, Citation)

    2008

  1. Christian Engelmann. Symmetric Active/Active High Availability for High-Performance Computing System Services. PhD thesis, Department of Computer Science, University of Reading, UK, 2008. Thesis research performed at Oak Ridge National Laboratory. Advisor: Prof. Vassil N. Alexandrov (University of Reading).
  2. 2001

  3. Christian Engelmann. Distributed Peer-to-Peer Control for Harness. Master's thesis, Department of Computer Science, University of Reading, UK, July 7, 2001. Thesis research performed at Oak Ridge National Laboratory. Double diploma in conjunction with the Department of Engineering I, Technical College for Engineering and Economics (FHTW) Berlin, Germany. Advisors: Prof. Vassil N. Alexandrov (University of Reading); George A. (Al) Geist (Oak Ridge National Laboratory).
  4. Christian Engelmann. Distributed Peer-to-Peer Control for Harness. Master's thesis, Department of Engineering I, Technical College for Engineering and Economics (FHTW) Berlin, Germany, February 23, 2001. Thesis research performed at Oak Ridge National Laboratory. Double diploma in conjunction with the Department of Computer Science, University of Reading, UK. Advisors: Prof. Uwe Metzler (Technical College for Engineering and Economics (FHTW) Berlin); George A. (Al) Geist (Oak Ridge National Laboratory).