OpenSHMEM 2015

AGENDA - Tutorials

Full Program (18 MB)

Tuesday August 4th

7:00 AM: Breakfast (Provided)


8:00 AM: Introduction
Pavel Shamis

Latest in OpenSHMEM: Specification, API, and Programming
Pavel Shamis, Manjunath Gorentla Venkata, Swaroop Pophale, Dounia Khaldi

This tutorial will cover both introductory and some intermediate-level concepts. The introductory part will cover the core features of the OpenSHMEM API, with a focus on features that are new or modified in OpenSHMEM 1.2, including the interfaces, semantics, and programming model. The intermediate part will provide hands-on experience with writing, compiling, and running a simple OpenSHMEM program, and will then discuss the OpenSHMEM semantics and what they lack for emerging architectures. Using an open source OpenSHMEM implementation, the last part of the tutorial will delve into the details of an OpenSHMEM implementation, with a brief overview of each layer.

Prerequisites: The tutorial will be structured so as to be attractive to both programmers who are unfamiliar with the details of OpenSHMEM and PGAS environments and more seasoned programmers and scientists who want to acquaint themselves with the state of the art of such a technology. We expect that the attendees have familiarity with C, C++ or Fortran as those are the languages that OpenSHMEM targets, but note that we use C examples in the tutorial.

11:00 AM: UCX - Communication Framework for Next Generation Programming Models
Yossi Itigin, Mellanox

Unified Communication X (UCX) is a set of network APIs and their implementations for high-throughput computing. UCX comes from the combined effort of national laboratories, industry, and academia to design and implement a high-performing and highly scalable network stack for next generation applications and systems. The UCX design makes it possible to tailor its APIs and network functionality to suit a wide variety of application domains. We envision these APIs satisfying the networking needs of many programming models, such as Message Passing Interface (MPI), OpenSHMEM, Partitioned Global Address Space (PGAS) languages, task-based paradigms, and I/O-bound applications. UCX is open source, BSD-licensed software hosted on GitHub.

Prerequisites - none.

1:00 PM: GP-GPU Programming with CUDA
Larry Brown, NVIDIA

This workshop will provide a tour of general purpose GPU (GP-GPU) programming with CUDA. The material will start at an introductory level and progress into intermediate to advanced topics. We will cover the fundamental concepts of heterogeneous computing using GPU accelerators. Topics will include an Introduction to CUDA, NVIDIA Developer Tools and GPU Computing Libraries, Optimization, Streams and Multi-GPU Programming.

Prerequisites - The tutorial will assume familiarity with traditional single-threaded programming (as in standard C/Fortran/Java/Python).


8:00 AM: Introduction
Manjunath Gorentla Venkata

Developing, Debugging and Profiling OpenSHMEM Applications
David Lecomber, Allinea Software

This tutorial session will present an overview of the Allinea tools for debugging and profiling. We begin with an introduction to Allinea DDT for OpenSHMEM debugging, in which we will cover how to use the debugger to efficiently solve OpenSHMEM application bugs, from small to large scale parallelism. Finally, we consider the performance of OpenSHMEM applications and how to use Allinea MAP to profile performance and identify potential optimizations in areas such as vectorization, memory bandwidth, I/O, and synchronization.

Prerequisites - none.

Performance Evaluation of OpenSHMEM Applications Using TAU
Sameer Shende, University of Oregon

The TAU Performance System is a powerful and highly versatile profiling and tracing tool ecosystem for performance analysis of parallel programs at all scales. Developed over the last two decades, TAU has evolved with each new generation of HPC systems and presently scales efficiently to hundreds of thousands of cores on the largest machines in the world. This tutorial will focus on performance data collection, analysis, and performance optimization of OpenSHMEM applications. The tutorial will introduce profiling and debugging support in TAU, and cover memory usage, POSIX I/O, and support for various runtime systems (including OpenSHMEM, MPI, pthread, CUDA, OpenACC, and OpenCL). TAU's support for hardware performance counters and recent support for power and energy profiling will be demonstrated. TAU can generate traces in the OTF2 format for Vampir using the Score-P measurement library. The tutorial will also demonstrate TAU's 3D profile browser, ParaProf, and TAUdb, a data management framework used by TAU's PerfExplorer tool for cross-experiment analysis such as scalability studies. Participants are encouraged to download the OVA lite image from the HPCLinux distribution onto their laptops prior to the tutorial.

Prerequisites - Familiarity with building and running parallel programs on Linux.

3:30 PM: Parallel Performance Analysis for OpenSHMEM with Score-P and Vampir
Joseph Schuchart, TU-Dresden

The tutorial will present an introduction to the performance analysis tools Score-P and Vampir. Score-P instruments the parallel application code and records event traces during a measurement run of the target application on the target HPC machine. Vampir provides visual and interactive analysis of the generated event traces after the measurement run. The tutorial shows the first steps with the tools, the most important parameters and switches, and some hands-on experiments showing how to identify typical performance flaws.

Prerequisites - A basic understanding of parallel processing and OpenSHMEM programming is required to follow the presentations. Attendees should have a laptop and access to Titan to follow the step-by-step examples.