OpenSHMEM 2016: Third workshop on OpenSHMEM and Related Technologies.

Steve Oberlin

GPUs, NVLink, and the Dawn of a New SHMEM Golden Age

In the early days of distributed HPC, SHMEM was created to give programs efficient, direct access to the custom-architecture shared global memory of Cray Research's first MPP, the Cray T3D. In the decades that followed, the commoditization of HPC clusters around standard servers and networks contributed to the dominance of coarse-grain message-passing programming models suited to their modest communications capabilities. Lacking efficient native global address space (GAS) platforms, SHMEM applications are in the minority today, and most PGAS implementations are built on top of MPI runtimes and primitives.

Recently, a new, inherently parallel processor architecture has emerged that offers, for the first time, the possibility of efficient native GAS implementation in high-volume devices. GPUs have evolved from fixed-function scan-line rendering engines into powerful general-purpose parallel coprocessors with unprecedented memory bandwidth, latency hiding, and fine-grain synchronization capabilities. NVIDIA's latest-generation "Pascal" GPU architecture introduces NVLink, a new interconnect interface that extends the GPU memory model across multiple directly connected processors, enabling native loads, stores, and atomics to another device's memory at very high speed and with low overhead.

The success and cost-effectiveness of GPUs have driven performance and productivity in several areas of science and technology, and GPUs have most recently emerged as the de facto platform for the exploding field of machine learning. Native support for global shared memory across multiple NVLink-connected GPUs could herald a new golden age of SHMEM adoption and application growth.

This talk will briefly review the history of architectural support for GAS and SHMEM in distributed HPC, discuss the elements of hardware support necessary for efficient implementations, compare and contrast CPU and GPU microarchitectures and their ability to provide such support, introduce NVLink and Pascal's first implementation of it, and describe early NVLink-connected multi-GPU systems along with some initial performance results using NVSHMEM, a native SHMEM implementation with GPU-initiated communication.

Steve Oberlin has been innovating in high-performance computing (HPC) since 1980, when he joined Cray Research, bringing up CRAY-1 supercomputers. Career highlights include working for Seymour Cray on the design of the CRAY-2 and CRAY-3 vector supercomputers, and leading the architecture and design of Cray Research's first massively parallel processors, the T3D (the first SHMEM machine) and the T3E.

In the early 21st century, Steve stepped away from HPC to co-found and lead a couple of cloud computing start-ups, but returned to his first love in 2013, joining NVIDIA as the CTO for Accelerated Computing.




James C. Sexton

IBM's Directions for Data Centric Systems

Recent years have seen a significant inflection point in computer systems design, driven by the need to tackle the new complexities that arise in the modeling, simulation, and analysis of complex data. IBM has adopted a data-centric direction for future systems design, one that uses co-design to develop solutions capable of delivering extreme performance for big data analytics. This presentation will describe IBM's data-centric systems approach and discuss the critical challenge that every emerging system design must address: providing a usable, performance-portable programming approach for systems that, because of technology constraints, are complex and heterogeneous in makeup.

Dr. James Sexton is an IBM Fellow and Director of the Data Centric Systems department at the IBM T. J. Watson Research Center in New York. Dr. Sexton received his Ph.D. in Theoretical Physics from Columbia University, NY. His areas of interest lie in High Performance Computing, Computational Science, Applied Mathematics, and Analytics. Prior to joining IBM, Dr. Sexton held appointments as Lecturer and then Professor at Trinity College Dublin, and as a postdoctoral fellow at the IBM T. J. Watson Research Center, the Institute for Advanced Study at Princeton, and Fermi National Accelerator Laboratory. He has held adjunct appointments as Director and Founder of the Trinity Center for High Performance Computing, as a Board Member of Trinity College Dublin, as Senior Research Consultant for Hitachi Dublin Laboratory, and as a Hitachi Research Fellow at Hitachi's Central Research Laboratory in Tokyo. Dr. Sexton has over 70 publications and has been a member of three Gordon Bell Award-winning teams.