OpenSHMEM 2014
AGENDA
Tuesday, March 4
8:00 AM - Registration Opens
8:30 AM - Morning Tutorials
OpenSHMEM/UCCS Tutorial
Held in Governor Calvert Ballroom East, located in the Governor Calvert House
Tutorial led by Tony Curtis, Swaroop Pophale, Aaron Welch, University of Houston
OpenSHMEM is a one-sided communication library API aimed at standardizing several vendor implementations of SHMEM. In this tutorial, we present an introductory course on the use of OpenSHMEM, its current state, and the community’s future plans. We will show how to use OpenSHMEM to add parallelism to programs via an exploration of its core features, how to port sequential applications to run at scale while improving program performance, and how to migrate existing applications that use message passing techniques to equivalent OpenSHMEM programs that run more efficiently. Tips for porting programs that use other existing flavors of SHMEM to portable OpenSHMEM programs will be given. The second part of the tutorial will focus on the plans for OpenSHMEM development, including a look at new PGAS run-time software called UCCS. UCCS is designed to sit underneath PGAS user-oriented libraries and languages such as OpenSHMEM, UPC, CAF and Chapel.
Accelerator Programming with OpenACC and OpenSHMEM Tutorial
Held in Governor Calvert Ballroom Center, located in the Governor Calvert House
Tutorial led by Jean-Charles Vasnier, Applications Engineer at CAPS Enterprise
This tutorial has been designed for those who are interested in porting their OpenSHMEM applications to a hardware accelerator, such as a GPU, using OpenACC. Through a mixture of lectures and demonstrations, we will explore the basic steps to port an application to the GPU. First, attendees will learn how to port a kernel to the GPU using directives. Then we will see how to improve the overall performance of the application by reducing the data transfers between the host and the accelerators and by tuning the kernel.
1:00 PM - Afternoon Tutorials
OpenSHMEM Tools Tutorial
Held in Governor Calvert Ballroom East, located in the Governor Calvert House
Tutorial led by Nick Forrington, Allinea; Oscar Hernandez, Oak Ridge National Laboratory; Sameer Shende, ParaTools; Frank Winkler, Dresden
This tutorial will focus on the state of the art in tools for OpenSHMEM, covering the program analysis, performance, and debugging tools currently available. We will also discuss the future roadmap to provide an integrated tools environment for OpenSHMEM. The tools that we will cover are the OpenSHMEM Analyzer, the TAU performance analysis tools, the Vampir tracing tools, and the DDT debugger for OpenSHMEM. TAU is a performance tool that provides portable profiling and tracing for OpenSHMEM applications. This tutorial provides hands-on exercises on how this tool integrates with OpenSHMEM. Vampir is a tool set for performance analysis that traces events and identifies problems in HPC applications. It is the most scalable tracing analysis tool and can scale up to several hundred thousand processes. It consists of the run-time measurement system VampirTrace and the visualization tools Vampir and VampirServer. In this tutorial, we will present how to use Vampir to trace OpenSHMEM applications at scale. The DDT portion of the tutorial will cover the fundamentals of debugging multi-process OpenSHMEM programs with the Allinea DDT parallel debugging tool, and will include an introduction to the DDT user interface and how to start programs, as well as how to track down crashes and compare variables across processes. The OpenSHMEM Analyzer is a compiler-based tool that can help users detect errors and provide useful analyses about their OpenSHMEM applications. In this tutorial we will show how the tool can be used to detect incorrect use of variables in OpenSHMEM calls, perform out-of-bounds checks for symmetric data, check for incorrect initialization of pointers to non-symmetric data, and report symmetric data alias information.
VERBS Programming Tutorial
Held in Governor Calvert Ballroom Center, located in the Governor Calvert House
Tutorial led by Dotan Barak, Senior Software Manager, Mellanox Technologies
This tutorial provides a basic overview of InfiniBand technology and explains its advantages as a networking technology. Among other topics, it covers the various InfiniBand hardware and software components, explains how to use InfiniBand for best performance, reviews the verbs API required for programming over InfiniBand, and finally provides several tips and tricks on verbs programming.
Wednesday, March 5
8:00 AM - Registration desk opens
8:30 AM - Welcome and Introductions (Working Breakfast)
Steve Poole, Oak Ridge National Laboratory
8:45 AM - Future Technologies for InfiniBand
Presented by Richard Graham at Mellanox Technologies
The talk will provide a description of Mellanox’s OpenSHMEM architecture, implementation, and benchmark results. It will also
9:35 AM - The Evolution of the NVIDIA Compute Device Memory Model
Presented by Donald Becker and Duncan Poole, NVIDIA
This talk will discuss the evolution of the NVIDIA compute device memory model from isolated address spaces on CPUs and
10:30 AM - OpenSHMEM on Portals
Presented by Keith Underwood, Network Architect, Intel Corporation
SHMEM originated in the context of a very specific hardware platform. Over the years, various SHMEM implementations have
11:50 AM - Keynote: Hybrid Programming Challenges for Extreme Scale
1:00 PM - Cray's OpenSHMEM activities & their proposal for thread-safe SHMEM extensions
Presented by Monika ten Bruggencate, Software Engineer at Cray, Inc.
This talk will give an overview of Cray's ongoing OpenSHMEM activities and their planned support for thread-safety for
1:50 PM - MPI + X (OpenSHMEM?) (Working Lunch)
Presented by Michael Raymond, SGI
As the number of compute elements on a node increases, the HPC world has decided that the dominant programming model should be MPI between nodes and X within a node, where X might be OpenMP, pthreads, UPC, etc. What about OpenSHMEM? This talk will explore the implications of using OpenSHMEM as X, including the benefits and the weaknesses.
2:40 PM - Universal Common Communication Substrate (UCCS)
Presented by Pavel Shamis, Oak Ridge National Laboratory and
Universal Common Communication Substrate (UCCS) is a low-level communication substrate that exposes high-performance
3:35 PM - Future Technologies for AMD
Presented by Vinod Tipparaju, AMD
This talk introduces HSA and discusses how HSA simplifies the use of accelerators by supporting unified programming models. HSA enhances support for symmetric memory in the context of submitting work to the accelerators. This talk will discuss HSA's support for asynchronous functions, function closures, and lambda functions, which enable support for various programming models and languages.
4:30 PM - IBM OpenSHMEM implementation over the Parallel Active
5:30 PM - HIPATIA Birds of a Feather Session
Presented by Josh Lothian, Jonathan Schrock, & Mathew Baker, Oak Ridge National Laboratory
HIPATIA (High Performance Adaptive Integrated Linear Algebra Benchmark) is a next-generation benchmark that is easily extensible while providing access to power metrics and CPU counters. Unlike many of the more popular benchmarks today, HIPATIA's initial focus is on solving sparse matrices within the integer domain using GMP. In addition to sparse, integer matrices, HIPATIA will be configurable for computation on real, complex, or fixed-point values, in dense or sparse matrix formats. We intend HIPATIA to adapt to many different usage scenarios that are not currently well represented in existing benchmarks. We will discuss current progress of HIPATIA development, as well as future development plans.
Thursday, March 6
8:00 AM - OpenSHMEM Implementations and Evaluation Session
11:30 AM - OpenSHMEM Tools Session (Working Lunch)
Profiling Non-Numeric OpenSHMEM Applications with the TAU Performance System
Presented by John Linford and Tyler Simon, ParaTools, Inc.
The recent development of a unified SHMEM framework, OpenSHMEM, has enabled further study in the porting and scaling of applications that can benefit from the SHMEM programming model. This paper focuses on non-numerical graph algorithms, which typically have a low FLOPS/byte ratio. An overview of the space and time complexity of Kruskal's and Prim's algorithms for generating a minimum spanning tree (MST) is presented, along with an implementation of Kruskal's algorithm that uses OpenSHMEM to generate the MST in parallel without intermediate communication. Additionally, a procedure for applying the
12:00 PM - OpenSHMEM Tools Session (continued)
Towards Parallel Performance Analysis Tools for the OpenSHMEM Standard
Presented by Andreas Knüpfer, Technische Universität Dresden
This paper discusses theoretical and practical aspects of extending performance analysis tools to support the OpenSHMEM standard for parallel programming. The theoretical part covers the mapping of OpenSHMEM's communication primitives to a generic event record scheme that is compatible with a range of PGAS libraries. The visualization of the recorded events is included as well. The practical part demonstrates an experimental extension for Cray-SHMEM in VampirTrace and Vampir and the first results with a parallel example application. Since Cray-SHMEM is similar to OpenSHMEM in many respects, this serves as a realistic preview. Finally, an outlook
Extending the OpenSHMEM Analyzer to Perform Synchronization and Multi-Valued Analysis
Presented by Swaroop Pophale, University of Houston
OpenSHMEM Analyzer (OSA) is a compiler-based tool that provides static analysis for OpenSHMEM programs. It was developed with the intention of providing feedback to users about semantic errors due to incorrect use of the OpenSHMEM API in their programs, thus making development of OpenSHMEM applications an easier task for beginners as well as experienced programmers. In this paper we discuss the improvements to the OSA tool to perform parallel analysis to detect the collective synchronization structure of a program. Synchronization is a critical aspect of all programming models. In OpenSHMEM it is the responsibility of the programmer to introduce synchronization calls that ensure the completion of communication among processing elements (PEs), in order to prevent the use of old or incorrect data, avoid deadlocks, and ensure data-race-free execution, while keeping in mind the semantics of the OpenSHMEM library specification.
A Global View Programming Abstraction for Transitioning MPI Codes to PGAS Languages
Presented by Tiffany Mintz, Oak Ridge National Laboratory
The multicore generation of scientific high performance computing has provided a platform for the realization of Exascale computing, and has also underscored the need for new paradigms in coding parallel applications. The current standard for writing parallel applications requires programmers to use languages designed for sequential execution. These languages have abstractions that only allow programmers to operate on the process-centric local view of data. To provide suitable languages for parallel execution, many research efforts have designed languages based on the Partitioned Global Address Space (PGAS) programming model. Chapel is one of the more recent languages to be developed using this model. Chapel supports multithreaded execution with high-level abstractions for parallelism. With Chapel in mind, we have developed a set of directives that serve as intermediate expressions for transitioning scientific applications from languages designed for sequential execution to PGAS languages like Chapel that are being developed with parallelism in mind.
1:30 PM - OpenSHMEM Extensions Session
Parallel I/O for OpenSHMEM
Presented by Edgar Gabriel, University of Houston
This talk discusses the necessity of I/O interfaces in any parallel programming model for the next generation of high end systems. Some suggestions for parallel I/O interfaces for OpenSHMEM will be presented based on the experience of the MPI I/O interfaces and some recent work on parallel I/O for OpenMP.
Reducing Synchronization Overhead Through Bundled Communication
Presented by James Dinan, Intel Corporation
OpenSHMEM provides a one-sided communication interface that allows for asynchronous, one-sided communication operations on data stored in a partitioned global address space. While communication in this model is efficient, synchronization must currently be achieved through collective barriers or one-sided updates of sentinel locations in the global address space. These synchronization mechanisms can over-synchronize or require additional communication operations, respectively, leading to high overheads. We propose a SHMEM extension that utilizes capabilities present in most high performance interconnects (e.g. communication events) to bundle synchronization information together with communication operations. Using this approach, we improve ping-pong latency for small messages by a factor of two, and demonstrate significant improvement to synchronization-heavy communication patterns, including all-to-all and pipelined parallel stencil communication.
Implementing Split-Mode Barriers in OpenSHMEM
Presented by Michael Raymond, SGI Corporation
Barriers synchronize the state of many processing elements working in parallel. No worker may leave a barrier before all the others have arrived. High performance applications hide latency by keeping a large number of operations in progress asynchronously. Since barriers synchronize all these operations, maximum performance requires that barriers have as little overhead as possible. When some workers arrive at a barrier much later than others, the early arrivers must sit idle waiting for them. Split-mode barriers provide barrier semantics while also allowing the early arrivers to make progress on other tasks. In this paper we describe the process and several challenges in developing split-mode barriers in the OpenSHMEM programming environment.
OpenSHMEM Extensions and a Vision for its Future Direction
Presented by Pavel Shamis, Oscar Hernandez, Greg Koenig, Oak Ridge National Laboratory
The Extreme Scale Systems Center (ESSC) at Oak Ridge National Laboratory (ORNL), together with the University of Houston, led the effort to standardize the SHMEM API with input from the vendors and user community. In 2012, OpenSHMEM Specification 1.0 was finalized and released to the OpenSHMEM community for comments. As we move to future HPC systems, there are several shortcomings in the current specification that we need to address to ensure scalability, higher degrees of concurrency, locality, thread safety, fault tolerance, I/O, etc. In this paper we discuss an immediate set of extensions that we propose to the current API and our vision for a future API, OpenSHMEM Next-Generation (NG), that targets future Exascale systems. We also explain our rationale for the proposed extensions and highlight the lessons learned from other PGAS languages and communication libraries.
3:30 PM - Panel Discussion
The Future of OpenSHMEM
Moderator: Steve Poole, Oak Ridge National Laboratory
5:00 PM - The 2013 OpenSHMEM Workshop Closes
Invited Speakers
We will have a series of invited talks at the Workshop, from industry, academia, and U.S. national laboratories, on the latest developments in OpenSHMEM and related technologies. These talks will be combined with the paper presentations.