OpenSHMEM 2018

KEYNOTE SPEAKERS
Gil Bloch

InfiniBand In-Network Computing Technology and Roadmap

ABSTRACT
The latest revolution in HPC is the move to a co-design architecture, a collaborative effort among industry, academia, and manufacturers to reach Exascale performance by taking a holistic system-level approach to fundamental performance improvements. Co-design architecture exploits system efficiency and optimizes performance by creating synergies between the hardware and the software. Co-design recognizes that the CPU has reached the limits of its scalability, and offers an intelligent network as the new "co-processor" to share the responsibility for handling and accelerating application workloads. By placing data-related algorithms on an intelligent network, we can dramatically improve data center and application performance.

A Data-Centric system architecture, which co-locates computational resources and data throughout the system, enables data to be processed all across the system, and not only by CPUs at the edge. For example, data can be manipulated as it is being transferred within the data center network as part of a collective operation. This type of approach addresses latency and other performance bottlenecks that exist in the traditional CPU-Centric architecture. Mellanox focuses on CPU offload technologies designed to process data as it moves through the network, either by the Host Channel Adapter (HCA) or the switch. This frees up CPU cycles for computation, reduces the amount of data transferred over the network, allows for efficient pipelining of network and computation, and provides for very low communication latencies. To accomplish a marked increase in application performance, there has been an effort to optimize frequently used communication patterns, such as collective operations, in addition to the continuous improvements to basic communication metrics, such as point-to-point bandwidth, latency, and message rate.

InfiniBand technologies are being transformed to support such data-centric system architectures. These include technologies such as SHARP for handling data reduction and aggregation, hardware-based tag matching, hardware-based network data gather/scatter capabilities, and Zero-overhead Persistent Communication Graph Offload. These technologies are used to process data and network errors at the network level, without the need for data to reach a CPU, reducing the overall volume of transferred data and improving system resilience.

BIOGRAPHY
Gil Bloch is an HPC and AI specialist with broad experience in fast interconnect technologies for clusters, datacenters and cloud computing. His current responsibilities include in-network computing for HPC and machine learning. Before working on in-network computing, Gil held multiple engineering and architecture positions, including network adapter and switch ASIC design and architecture, RDMA offload ASICs, and open source networking software for high performance computing. Gil is an author/co-author of multiple patents in the area of computer networks and network adapters. Gil holds a BSc degree in Electrical Engineering from the Technion, Israel Institute of Technology.

Will Deacon

Formalising the Armv8 Memory Consistency Model

ABSTRACT
Armv8 introduced a radical change to the memory consistency model of the architecture by requiring that a store to memory becomes visible to all other threads at the same time. This property, known as other-multicopy atomicity, simplifies the memory model definition and supports straightforward, compositional reasoning about concurrent programs. The memory model is now specified such that the architectural text maps directly to an executable, axiomatic model which can be used to verify properties of both concurrent software and processor designs.

This presentation will provide an introduction to memory consistency models before focussing on the design of the Armv8 model and the tools which can be used to help reason about it.

BIOGRAPHY
Will is a Senior Principal Software Engineer in the Open-Source software group at Arm, where he works primarily on enabling Arm's architecture ports of the Linux kernel and ensuring that the architecture remains a good fit for modern, general-purpose operating systems. He is also heavily involved in the design and formalisation of the Armv8 memory consistency model, which was recently revised. As an active upstream maintainer, he enjoys working close to the metal and has keen interests in computer architecture, concurrency, weak memory ordering and open-source software.