OpenSHMEM 2015

AGENDA - Invited Talks

Wednesday, August 5th - Invited Talks

10:30 AM: Invited Talk: Intel's Multifaceted PGAS Activities and Community Engagements
Ulf Hanebutte, Intel

In this talk, we will discuss recent PGAS research work, new approaches to HPC networking, and proposed extensions to the OpenSHMEM specification.

1:00 PM: Invited Talk: OpenSHMEM - The InfiniBand Advantage
Rich Graham, Mellanox

The OpenSHMEM specification forms a good foundation for supporting low-overhead, distributed, and asynchronous computation. Mellanox has been developing hardware and software capabilities that provide the building blocks needed to run OpenSHMEM-based applications effectively. This presentation will describe some of these hardware capabilities, with an emphasis on their instantiation in the EDR ConnectX-4 and Switch-IB hardware. In addition, the talk will describe Mellanox's work to develop a production-grade OpenSHMEM implementation.

3:30 PM: Invited Talk: Improving Application Scaling using OpenSHMEM for GPU-Initiated Communication
Sreeram Potluri, NVIDIA

State-of-the-art scientific applications running on GPU clusters typically offload computation phases onto the GPU using CUDA or a directives-based approach while relying on the CPU to manage cluster communication. This dependence on the CPU for communication has limited their strong scalability, owing to the overhead of repeated kernel launches, CPU-GPU synchronization, underutilization of the GPU during synchronization, and underutilization of the network during compute. Addressing this apparent Amdahl's fraction is imperative for strong scaling of applications on GPU clusters. GPUs are designed for extreme throughput and have enough parallelism and state to hide long latencies to global memory. The CUDA programming model and its best practices guide application developers to take advantage of this throughput-oriented architecture. It is important to exploit these inherent capabilities of the GPU and the CUDA programming model when tackling communication on GPU clusters. NVSHMEM is a prototype implementation of OpenSHMEM that provides a Partitioned Global Address Space (PGAS) spanning memory across multiple GPUs. It provides an API for fine-grained GPU-GPU data movement and synchronization from within a CUDA kernel. This talk outlines the implementation of NVSHMEM on single-node and multi-node GPU architectures. Example applications from multiple domains are used to demonstrate the use of GPU-initiated communication and its impact on performance and scaling.
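To give a flavor of the programming style the talk describes, below is a minimal, hypothetical CUDA sketch of GPU-initiated communication: a kernel writes values directly into a symmetric buffer on a neighboring PE, with no CPU involvement per message. The function names used here (nvshmem_init, nvshmem_malloc, nvshmem_my_pe, nvshmem_n_pes, nvshmem_float_p, nvshmem_barrier_all) follow later public NVSHMEM releases and are assumptions; the 2015 prototype API may have differed.

    /* Hypothetical NVSHMEM-style sketch: each GPU thread performs a
     * fine-grained, GPU-initiated put into a neighboring PE's memory. */
    #include <cuda_runtime.h>
    #include <nvshmem.h>

    __global__ void exchange(float *halo, int n, int peer, int me) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            /* Device-side put: write one element into the symmetric
             * buffer 'halo' on the remote PE, from within the kernel. */
            nvshmem_float_p(&halo[i], (float)me, peer);
        }
    }

    int main(void) {
        nvshmem_init();
        int me   = nvshmem_my_pe();
        int npes = nvshmem_n_pes();
        int peer = (me + 1) % npes;              /* ring neighbor */
        int n    = 1024;

        /* Symmetric allocation: the same buffer exists on every PE. */
        float *halo = (float *)nvshmem_malloc(n * sizeof(float));

        exchange<<<(n + 255) / 256, 256>>>(halo, n, peer, me);
        cudaDeviceSynchronize();
        nvshmem_barrier_all();                   /* all puts complete */

        nvshmem_free(halo);
        nvshmem_finalize();
        return 0;
    }

The point of the sketch is where the communication call sits: the put is issued from inside the CUDA kernel, so data movement can overlap computation and the per-message kernel-launch and CPU-synchronization overheads the abstract describes are avoided.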