Oak Ridge National Laboratory performs basic research in core technologies for future generations of high-end computing architectures, including experimental computing systems. Using measurement, modeling, and simulation, we investigate these technologies with the goal of improving the performance, efficiency, reliability, and usability of these architectures for our sponsors. Often, we develop new algorithms and software systems to effectively exploit each technology.
Since 2004 we have worked in a wide range of areas, including:
- Emerging architectures, including GPUs, FPGAs, and nonvolatile memory
- Productive programming environments, including compilers, global address space (GAS) programming models, and scalable runtime systems
- Performance analysis, modeling, simulation, and prediction
- Application-architecture co-design
- Early evaluation and benchmarking of high-performance computing systems
- Visualization of extreme-scale data
- Parallel I/O
ExMatEx (Contact ???)
Abstract Scalable Performance Engineering Notation (ASPEN)
Aspen (Abstract Scalable Performance Engineering Notation) is a domain-specific language for performance modeling that fills an important gap in existing techniques for performance prediction. It is designed to enable rapid exploration of new algorithms and architectures, and it comprises a formal specification of an application's performance behavior, an abstract machine model, and a suite of analysis tools that operate on these models to provide valuable insights for the co-design process. ORNL's Future Technologies Group will present Aspen, and its use in modeling a three-dimensional Fast Fourier Transform, at this year's International Conference for High Performance Computing, Networking, Storage, and Analysis (SC '12). Kyle L. Spafford and Jeffrey S. Vetter. Aspen: A Domain Specific Language for Performance Modeling. To appear in the Proceedings of the ACM/IEEE International Conference on High Performance Computing, Networking, Storage, and Analysis (SC '12).
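To give a flavor of what an analytical performance model can express, the sketch below pairs a simple application model of a 3-D FFT with an abstract machine model and derives a runtime bound. This is an illustrative Python sketch in the spirit of Aspen, not Aspen's actual syntax; the operation-count formula is the standard FFT estimate, and the machine numbers are hypothetical.

```python
import math

# Illustrative analytical model in the spirit of Aspen (not Aspen syntax).
# The machine parameters below are hypothetical placeholders.

def fft3d_flops(n):
    """Standard operation-count estimate for a complex 3-D FFT on an n^3 grid."""
    return 5.0 * n**3 * math.log2(n**3)

def fft3d_bytes(n, word=16):
    """Rough traffic estimate: read and write the complex grid once per
    dimension (three 1-D FFT passes), 16 bytes per complex double."""
    return 3 * 2 * n**3 * word

# Abstract machine model: peak FLOP rate and memory bandwidth (hypothetical).
machine = {"peak_flops": 1.0e12, "bandwidth": 1.0e11}

def predicted_seconds(n, m):
    # Roofline-style bound: runtime is limited by whichever resource
    # (compute or memory traffic) takes longer.
    return max(fft3d_flops(n) / m["peak_flops"],
               fft3d_bytes(n) / m["bandwidth"])
```

A model like this lets a co-design study ask, before any code is written, whether a proposed machine would leave the FFT compute-bound or memory-bound.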
Scalable Heterogeneous Computing Benchmark Suite (SHOC) (Contact Jeff Vetter)
The Scalable Heterogeneous Computing Benchmark Suite (SHOC) is a collection of benchmark programs testing the performance and stability of systems that use computing devices with non-traditional architectures for general-purpose computing, as well as the software used to program them. Its initial focus is on systems containing Graphics Processing Units (GPUs) and multi-core processors, and on the OpenCL programming standard. It can be used on clusters as well as individual hosts.
In addition to OpenCL-based benchmark programs, SHOC also includes Compute Unified Device Architecture (CUDA) versions of many of its benchmarks for comparison with the OpenCL versions.
- Multiple benchmark applications written in both OpenCL and CUDA
- Cluster-level parallelism with MPI
- Node-level parallelism for multiple GPUs per node
- A harness for running the suite and reporting results in CSV spreadsheet format
- Stability tests for large scale cluster resiliency testing
The SHOC benchmark suite is divided into two primary categories: stress tests and performance tests. The stress tests use computationally demanding kernels to identify OpenCL devices with bad memory, insufficient cooling, or other component problems. The performance tests are further subdivided according to their complexity and the nature of the device capability they exercise, a categorization similar in spirit to the levels of the BLAS API. Currently, the categories are:
- Stability Tests
- Performance Tests
  - Level 0: Very low-level device characteristics (so-called "feeds and speeds"), such as bandwidth across the bus connecting the GPU to the host or peak floating-point operations per second
  - Level 1: Device performance for low-level operations such as vector dot products and sorting
  - Level 2: Device performance for real application kernels
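To illustrate what a Level 0 "feeds and speeds" measurement looks like in miniature, the hypothetical host-only sketch below times a large memory copy and reports effective bandwidth, analogous to how SHOC reports host-to-GPU bus bandwidth. It is not part of SHOC itself.

```python
import time

# Hypothetical illustration of a Level 0 feeds-and-speeds measurement:
# time a large buffer copy and report effective bandwidth in GB/s.

def measure_copy_bandwidth(size_bytes=64 * 1024 * 1024, repeats=5):
    src = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)                      # one full copy of the buffer
        best = min(best, time.perf_counter() - start)
        assert len(dst) == size_bytes
    # Report the best (least-perturbed) trial, as bandwidth benchmarks
    # commonly do, rather than the mean.
    return size_bytes / best / 1e9            # GB/s

if __name__ == "__main__":
    print(f"copy bandwidth: {measure_copy_bandwidth():.2f} GB/s")
```

Taking the best of several repeated trials filters out one-off interference (page faults, OS jitter), which matters when a single number is used to flag unhealthy nodes in a cluster.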
Institute for Sustained Performance, Energy, and Resilience (SUPER) (Contact Patrick Worley)
Over the next five years (2012-2016), computational scientists working on behalf of the Department of Energy's Office of Science (DOE SC) will exploit a new generation of petascale computing resources to make previously inaccessible discoveries in a broad range of disciplines including physics, chemistry, and materials science. The computational systems underpinning this work will increase in performance potential from tens to hundreds of PFlop/s, but in the process will evolve significantly from those in use today. Although Moore's law continues unabated, the end of Dennard scaling has necessitated a fundamental shift in computer architecture toward power efficiency. To that end, processors are increasingly varied as they strive to satisfy performance, productivity, reliability, and energy efficiency in the face of divergent computational requirements. Today, we see three major offerings: those built from commodity processors (e.g., Cray XE6); those built from processors specialized for energy-efficient HPC (e.g., IBM Blue Gene/P); and those built from accelerators (e.g., GPUs). The diversity among these machines presents a number of challenges to merely porting today's scientific applications, much less achieving good performance. Extrapolating five years, we anticipate that vastly increased scale (e.g., more chips, 4-8x the cores per chip, wider SIMD) and heterogeneity will exacerbate performance optimization challenges while pushing energy consumption and resilience to the forefront. Just as today's DOE computing centers incentivize performance optimization through finite computing allocations, they may similarly incentivize energy efficiency by reducing the charges (in terms of CPU hours) for reduced-power jobs.
Moreover, as DRAM-replacements (e.g., phase change, resistive, spin-transfer torque) appear in DOE’s leadership-class systems, computational scientists must learn to exploit the resultant asymmetric read/write bandwidths and latencies. Thus, it is imperative that application scientists be provided with solutions to productively maximize performance, conserve energy, and attain resilience.
To ensure that DOE’s computational scientists can successfully exploit the emerging generation of high performance computing (HPC) systems, the University of Southern California (USC) is leading the Institute for Sustained Performance, Energy, and Resilience (SUPER). We have chosen to organize a broadly-based project with expertise in compilers and other system tools, performance engineering, energy management, and resilience. We are following the successful model that we developed in the SciDAC-2 Performance Engineering Research Institute (PERI) of leveraging the research investments DOE and others have made and integrating the results to create new capabilities beyond the reach of any one group.
Keeneland (Contact Jeff Vetter)
The Keeneland Project is a five-year Track 2D cooperative agreement awarded by the National Science Foundation (NSF) in 2009 for the deployment of an innovative high performance computing system in order to bring emerging architectures to the open science community. The Georgia Institute of Technology (Georgia Tech) and its partners - Oak Ridge National Lab, University of Tennessee-Knoxville, and the National Institute for Computational Sciences - manage the facility, perform education and outreach activities for advanced architectures, develop and deploy software tools for this class of architecture to ensure productivity, and team with early adopters to map their applications to Keeneland architectures.
In 2010, the Keeneland project procured and deployed its initial delivery system (KIDS): a 201-teraflop, 120-node HP SL390 system with 240 Intel Xeon CPUs and 360 NVIDIA Fermi graphics processors, with the nodes connected by an InfiniBand QDR network. KIDS is being used to develop programming tools and libraries in order to ensure that the project can productively accelerate important scientific and engineering applications. The system is also available to a select group of users to port and tune their codes to a scalable GPU-accelerated system.
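The quoted 201-teraflop figure can be sanity-checked with back-of-the-envelope arithmetic. The per-component peak rates below are assumptions (Fermi-class GPUs at roughly 515 double-precision GFLOP/s each, hex-core Westmere-era Xeons at roughly 67 GFLOP/s each), not figures published by the Keeneland project.

```python
# Back-of-the-envelope check of the system peak, using assumed
# per-component double-precision peak rates (GFLOP/s).
GPU_PEAK_GF = 515.0   # assumed per NVIDIA Fermi GPU
CPU_PEAK_GF = 67.0    # assumed per Intel Xeon CPU

# 360 GPUs and 240 CPUs, per the system description above.
peak_tf = (360 * GPU_PEAK_GF + 240 * CPU_PEAK_GF) / 1000.0
print(f"estimated system peak: {peak_tf:.0f} TFLOP/s")  # close to the quoted 201
```

Under these assumptions the GPUs supply roughly 185 TFLOP/s and the CPUs about 16 TFLOP/s, consistent with the quoted aggregate.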
In 2012, the Keeneland project will procure and deploy its full scale system.