CUMULVS: Collaborative User Migration, User Library for Visualization and Steering

DOE 2000 - ACTS Toolkit Research

FY 1999 Highlights

Objective:

CUMULVS is an ongoing project at Oak Ridge National Laboratory (ORNL), carried out as part of the DOE 2000 Advanced Computational Testing and Simulation (ACTS) Toolkit. The work focuses on integrating collaborative computational steering tools, interactive visualization tools, and checkpointing / fault tolerance tools into runtime systems, frameworks, and applications.

The ACTS Toolkit

The goal of the ACTS Toolkit is to provide an integrated set of software tools, algorithms, and environments that accelerate the adoption and use of advanced computing by DOE programs. The long-term goal of this work is to bring together the major developers and users of high-performance software tools and numerical software to develop and implement a common tool integration interface. The project consists of three distinct and complementary parts: software development, numerical kernels, and runtime support. This work is a collaborative effort involving Oak Ridge National Laboratory (ORNL), Los Alamos National Laboratory (LANL), Lawrence Livermore National Laboratory (LLNL), Sandia National Laboratories (SNL), Lawrence Berkeley National Laboratory (LBL), Argonne National Laboratory (ANL), and several university partners.

Several ACTS tools from the DOE laboratories have already been integrated and made interoperable so that they share portions of their functionality. A more general integration solution is being designed through the Common Component Architecture (CCA) Forum, which is specifying reusable high-performance software components. The CCA provides a framework and services for instantiating and combining scientific software components to form efficient, scalable simulation programs (see below).

Outline:

This report includes:
CUMULVS Overview
FY 1999 Accomplishments
Future Plans
Downloading and Using CUMULVS
Development Team Participants


CUMULVS Overview

CUMULVS is a software infrastructure for the development of collaborative scientific simulation software. Developers of high-performance scientific software can utilize CUMULVS to dynamically attach front-end viewer programs to a running simulation and interact with it on-the-fly. CUMULVS supports interactive run-time visualization and remote computational steering of distributed applications by multiple collaborators, and provides a mechanism for constructing fault-tolerant, migrating applications in heterogeneous distributed computing environments. CUMULVS is also being applied to the coupling of models for hybrid or cooperative simulations, and can assist in the sharing and interpolation of data fields among disparate models. For more details on the CUMULVS system, see the CUMULVS Home Page.


FY 1999 Accomplishments

- SC99: CCA / CUMULVS Demonstration

- CUMULVS and CCA Collective Technology

- CUMULVS Virtual Reality Viewers

- CUMULVS and Message Handlers / Nexus

- CUMULVS and Global Arrays

At the SC99 conference in Portland, OR, November 1999, Oak Ridge National Laboratory (ORNL) and Sandia National Lab (SNL) presented the first public demonstration of a Common Component Architecture (CCA) framework for high-performance scientific computing. This component-based framework provides on-the-fly construction of scientific simulations from reusable software modules, and supports high-performance component interconnects for efficient and scalable execution.

For the SC99 demonstration, the CUMULVS system, under development at ORNL for interactive visualization, computational steering, and application fault tolerance, was combined with ESI (Equation Solver Interface), a numerical solver system being developed at SNL. Functionality from these two systems was encapsulated into CCA components, which were then interactively combined using a graphical CCA framework interface. The resulting physics simulation modeled the temperature distribution across a metal plate. The output of the ESI solver was dynamically hooked together with several CUMULVS components for ongoing visualization of the simulation's progress. An intermediate prototype "collective" component formed the "glue" between the parallel ESI model and the CUMULVS viewer, and provided extraction of the distributed data as needed for visualization.

In this joint project between ORNL and SNL, the components were written in C++ and C, and were interactively connected using a Java framework, to attach the ESI modules to various CUMULVS viewers. These viewers used AVS and Tcl/Tk to display ongoing animations of the physics computations. This demonstration was given several times over the course of SC99, along with another CCA Forum demonstration, and helped to showcase state-of-the-art DOE research.

[Screen captures: the CUMULVS slicer interface displaying the Heat Flux and the Temperature of the ESI physics simulation.]

As part of the Common Component Architecture (CCA) Forum, Oak Ridge National Laboratory (ORNL) has taken the lead in creating a "Collective" specification for defining, extracting, and sharing parallel or distributed datasets. CCA component frameworks are used to compose collections of reusable high-performance software modules into scientific simulation programs. An inherent need in such a framework is to communicate and share data among these various parallel components. The most challenging scenario arises when two parallel components decompose their data using different schemes or different numbers of processors; this case is known as the "MxN" problem.

ORNL has proposed a base definition for a CCA "Collective Component" that will allow flexible access to distributed data in a parallel component. This initial specification describes distributed data decompositions for rectangular grids and specifies how these descriptions can be used to automatically extract data subsets from parallel components; it generalizes the mechanisms present in the CUMULVS system at ORNL. Extensions to the specification will incorporate several synchronization types and hooks for data interpolation, both spatial and temporal. An MxN interface will be defined to support both automatic and direct collective data operations. When completed, the Collective interface will provide a mechanism for simple yet arbitrary connections among components, allowing data to be translated and shared among any CCA-compliant serial and/or parallel components.
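
At its core, the MxN problem reduces to index arithmetic: for each pair of source and destination tasks, determine which global indices both own and transfer only that intersection. The self-contained sketch below is an illustration of the idea only, not the proposed CCA interface; it computes the overlapping global index ranges between a 1-D block decomposition over M tasks and a different block decomposition over N tasks.

    #include <stdio.h>

    /* Global extent of the (1-D) data field used in this example. */
    #define GLOBAL_LEN 100

    /* Global index range [lo, hi) owned by task 'rank' in a block
     * decomposition over 'ntasks' tasks.                             */
    static void block_range(int rank, int ntasks, int *lo, int *hi)
    {
        int base = GLOBAL_LEN / ntasks;
        int rem  = GLOBAL_LEN % ntasks;
        *lo = rank * base + (rank < rem ? rank : rem);
        *hi = *lo + base + (rank < rem ? 1 : 0);
    }

    int main(void)
    {
        int M = 4, N = 3;     /* source uses M tasks, destination uses N */
        int src, dst;

        /* For every source/destination task pair, report the global
         * index range that must be transferred (empty ranges skipped). */
        for (src = 0; src < M; src++) {
            int slo, shi;
            block_range(src, M, &slo, &shi);
            for (dst = 0; dst < N; dst++) {
                int dlo, dhi, lo, hi;
                block_range(dst, N, &dlo, &dhi);
                lo = (slo > dlo) ? slo : dlo;
                hi = (shi < dhi) ? shi : dhi;
                if (lo < hi)
                    printf("src task %d -> dst task %d : global [%d, %d)\n",
                           src, dst, lo, hi);
            }
        }
        return 0;
    }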

A new front-end Virtual Reality (VR) viewer was added to CUMULVS as part of a joint project with the University of Liverpool. This viewer takes data collected from a running simulation (via CUMULVS) and uses VTK to construct visual geometries for rendering, either on a workstation, on an ImmersaDesk, or in a CAVE. Ongoing research will explore various techniques for controlling the virtual view using the VR input devices, as well as for invoking computational steering operations from within the virtual world. A new CUMULVS viewer for AVS Express, a commercial object-oriented visualization system, is also being designed and implemented.

Several experiments were performed to explore the feasibility of porting CUMULVS to the Globus/Nexus environment developed at Argonne National Laboratory (ANL), as an alternative to the PVM message-passing substrate. A simple prototype was constructed to mimic the CUMULVS communication patterns and functionality using the Nexus communication library instead of PVM. Given the success of this prototype, CUMULVS is being extended to use Nexus as well as PVM, to provide better support for MPI applications that use MPICH-G on Globus. In preparation for this transition, the internal message processing in CUMULVS has been overhauled. Because all communication in Nexus is based on message handlers rather than direct point-to-point messages, all such direct messaging in CUMULVS has been converted to use message handlers exclusively. This has the added benefit of more timely responses for CUMULVS in the PVM-based protocols, as viewer requests can be processed transparently amidst regular user messaging using PVM's message handler interface.
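
For reference, the handler-based pattern described above can be sketched with PVM 3.4's message-handler interface. The tag value and handler body below are made up for illustration (the real CUMULVS protocol tags are internal to the library), and the exact pvm_addmhf() arguments and handler calling convention should be checked against the PVM 3.4 man pages.

    #include <stdio.h>
    #include <pvm3.h>

    /* Tag on which a front-end viewer might send requests; this value
     * is invented for the illustration.                               */
    #define VIEWER_REQ_TAG 777

    /* Handler invoked by PVM when a matching message arrives, even
     * while the application is busy with its own sends and receives.
     * How the message identified by 'mid' is unpacked should follow
     * the pvm_addmhf(3) man page.                                     */
    static int viewer_request_handler(int mid)
    {
        printf("viewer request (message id %d) handled transparently\n", mid);
        return 0;
    }

    int main(void)
    {
        int mytid = pvm_mytid();         /* enroll in PVM              */
        int ctx   = pvm_getcontext();    /* current messaging context  */

        /* Register the handler: any source, our tag, current context. */
        pvm_addmhf(-1, VIEWER_REQ_TAG, ctx, viewer_request_handler);
        printf("task t%x registered viewer request handler\n", mytid);

        /* ... regular user messaging goes here; viewer requests are
         *     now processed inside PVM's receive routines as they
         *     arrive ...                                              */

        pvm_exit();
        return 0;
    }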

As part of the ACTS Toolkit research, CUMULVS has been integrated with several external systems and tools. This past year Pacific Northwest National Laboratory (PNNL) created hooks that interface to CUMULVS in their Global Arrays system. Users of Global Arrays can now automatically utilize the visualization and computational steering capabilities of CUMULVS. For more information, see the Global Arrays Related Software web page.


Future Plans

Our ongoing work on CUMULVS and the ACTS Toolkit continues to emphasize the integration of run-time visualization, coordinated computational steering, and application fault tolerance capabilities into the ACTS runtime, numerics, and framework components. Our research will continue to expand the CUMULVS runtime environment to increase its applicability to a wider base of applications and tools.

- CUMULVS & CCA

- CUMULVS and Unstructured / Adaptive Meshes

- Model Coupling Using CUMULVS

- Integration of CUMULVS with Harness

- CUMULVS Checkpointing Optimization

- CUMULVS Instrumentation

Efforts will be made to further enhance the definitions for "Collective Components" in the CCA. In developing data sharing interfaces for rectilinear meshes, it has become evident that a more general definition for distributed Data Decompositions is needed. Fundamentally, this specification will need to describe the detailed nature of parallel data - its local storage allocation, its context in the global decomposition, and how each element can be extracted or accessed. This specification will better support the existing Collective Component specification as well as other necessary extensions.
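
One way to picture the three aspects just listed is as a per-task descriptor. The struct below is purely illustrative, restricted to rectilinear data, with assumed field names; it is not the proposed CCA specification.

    /* Illustrative per-task data decomposition descriptor for a
     * rectilinear field; field names are hypothetical.               */
    #define MAX_DIM 4

    typedef struct {
        /* Local storage allocation */
        void *base;                  /* locally allocated data block   */
        int   local_size[MAX_DIM];   /* allocated extent per axis      */
        int   ghost[MAX_DIM];        /* ghost/halo padding, if any     */

        /* Context in the global decomposition */
        int   ndim;                  /* number of axes                 */
        int   global_size[MAX_DIM];  /* global extent per axis         */
        int   global_lo[MAX_DIM];    /* first global index owned here  */

        /* Element access */
        int   elem_size;             /* bytes per element              */
        int   stride[MAX_DIM];       /* element strides of local block */
    } DecompDescriptor;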

Beyond the theoretical work in the CCA, several concrete experiments will be performed to verify and validate the proposed specifications. A simple but generic Collective Component will be implemented to automatically translate data fields among parallel components that use standard rectangular mesh decompositions. This reference implementation will utilize existing technology from the CUMULVS system at ORNL to construct a more general purpose reusable component.

An increasing number of applications utilize unstructured mesh decompositions, either triangular or particle-based. More complex still is the use of adaptive meshes that change from iteration to iteration. As part of the upcoming CCA Forum work, a general specification for Unstructured and Adaptive Meshes will be created. This specification will be part of the larger general Data Decomposition specification, which is a joint effort among several of the national laboratories and universities in the CCA Forum. Given the level of complexity involved, a wide range of possibilities will be explored and encapsulated in this specification.
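
For context, an unstructured (e.g., triangular) mesh is commonly described by a vertex coordinate array plus an element connectivity list, and an adaptive mesh additionally allows both to change between iterations. The sketch below illustrates this kind of description in C; it is not the planned specification, and all names are hypothetical.

    /* Illustrative description of a local piece of an unstructured
     * triangular mesh; names are hypothetical.                        */
    typedef struct {
        int     nvert;        /* number of vertices owned locally      */
        double *coords;       /* vertex coordinates, 2 per vertex (x,y)*/
        int    *global_id;    /* global vertex number per local vertex */

        int     nelem;        /* number of triangles owned locally     */
        int    *connect;      /* 3 local vertex indices per triangle   */

        int     revision;     /* bumped whenever the adaptive mesh is
                                 refined or coarsened, so attached
                                 tools know the description changed    */
    } UnstructuredMesh;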

Using the CCA specification, support for Unstructured and Adaptive Meshes will be added to CUMULVS. Currently, CUMULVS supports all standard rectilinear distributed data decompositions, such as Block and Cyclic, as well as other Explicit decompositions. There is also support for abstract Particle decompositions, but using this interface for truly unstructured data is cumbersome. The Particle interface will either be reworked or an additional Unstructured decomposition type will be added to ease the instrumentation burden. Similarly, adaptive mesh refinement (AMR) adds a dynamic element to the unstructured case, so the base decomposition interface must be enhanced to assist with the integration of CUMULVS into AMR applications.
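
For reference, the Block and Cyclic mappings mentioned above reduce to simple index arithmetic along each axis. The self-contained sketch below shows the global-to-(owner, local) mapping for a single axis; it is illustrative only and does not use the CUMULVS interface.

    #include <stdio.h>

    /* Map a global index g to (owner task, local index) for a 1-D Block
     * decomposition of n elements over p tasks.                        */
    static void block_map(int g, int n, int p, int *owner, int *local)
    {
        int base = n / p, rem = n % p;
        int cut  = rem * (base + 1);   /* first index past the "big" blocks */
        if (g < cut) {
            *owner = g / (base + 1);
            *local = g % (base + 1);
        } else {
            *owner = rem + (g - cut) / base;
            *local = (g - cut) % base;
        }
    }

    /* Same mapping for a 1-D Cyclic decomposition. */
    static void cyclic_map(int g, int p, int *owner, int *local)
    {
        *owner = g % p;
        *local = g / p;
    }

    int main(void)
    {
        int owner, local, g;
        for (g = 0; g < 10; g++) {
            block_map(g, 10, 3, &owner, &local);
            printf("block : global %d -> task %d, local %d\n", g, owner, local);
            cyclic_map(g, 3, &owner, &local);
            printf("cyclic: global %d -> task %d, local %d\n", g, owner, local);
        }
        return 0;
    }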

A special library interface will be added to CUMULVS to assist in Model Coupling via Data Field Sharing. CUMULVS already encapsulates sufficient functionality to support data sharing between two or more applications for purposes other than visualization. Each task in a parallel application can effectively act as a "viewer" onto another parallel application, periodically extracting subsets of data for other uses such as model coupling. However, this approach is not especially intuitive for scientists intending to correlate various scientific models. Therefore a special interface will be added to CUMULVS to directly support model coupling functionality. Aside from clarifying the use of CUMULVS in this respect, this interface will also provide hooks for related features such as temporal and spatial interpolation. This technology is essential for real production model coupling, and the variety and complexity of interpolation techniques represent an immense area of research in their own right.
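
As a small, concrete example of the temporal interpolation hook mentioned above, the self-contained sketch below linearly interpolates a shared field between the two most recent snapshots extracted from a source model. This is just one simple scheme for illustration, not the planned CUMULVS interface.

    #include <stdio.h>

    #define NPTS 5

    /* Linearly interpolate a field between two snapshots taken at times
     * t0 and t1, producing values at the destination model's time t
     * (assumes t1 > t0 and t0 <= t <= t1).                             */
    static void interp_field(const double *f0, const double *f1, int n,
                             double t0, double t1, double t, double *out)
    {
        double w = (t - t0) / (t1 - t0);
        int i;
        for (i = 0; i < n; i++)
            out[i] = (1.0 - w) * f0[i] + w * f1[i];
    }

    int main(void)
    {
        /* Two snapshots of a shared field from the source model. */
        double f0[NPTS] = { 1, 2, 3, 4, 5 };    /* at time t = 10.0 */
        double f1[NPTS] = { 2, 4, 6, 8, 10 };   /* at time t = 12.0 */
        double out[NPTS];
        int i;

        interp_field(f0, f1, NPTS, 10.0, 12.0, 11.5, out);
        for (i = 0; i < NPTS; i++)
            printf("out[%d] = %g\n", i, out[i]);
        return 0;
    }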

The Harness project at ORNL is developing a pluggable heterogeneous distributed computing environment as a follow-on to PVM. The Harness architecture naturally integrates well with the CCA specification for high-performance scientific components. The CUMULVS technology will be integrated with the Harness system in the form of two distinct dynamic "plug-ins". Any application wishing to invoke CUMULVS functionality can simply load the given plug-ins and then invoke the desired library calls. One plug-in will provide the full interface for Visualization and Computational Steering, and the other will provide the Checkpointing and Fault Recovery subsystem. In effect, aside from wrapping the given CUMULVS libraries as proper Harness plug-ins, this work will constitute porting CUMULVS to the Harness communication substrate and environment. This will add Harness as another supported user environment, so that all Harness applications, such as those written using PVM or Fault-Tolerant MPI (FT-MPI), will be able to use the CUMULVS features.

To make CUMULVS more practical for application fault tolerance, the CUMULVS checkpointing facility will be optimized. Checkpointing overhead will be reduced by integrating parallel I/O technology to write checkpoint data to disk more efficiently. The use of a parallel file system will be explored, along with other portable approaches for coordinating the simultaneous writing of each task's checkpoint data. The run-time fault recovery system will also be generalized to provide a spectrum of adjustable redundancy levels. This will control the checkpointing overhead by allowing an application to customize its fault recovery requirements.
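
One portable way to coordinate simultaneous checkpoint writes is to have each task write its state at a precomputed offset in a single checkpoint file. The self-contained sketch below shows only that offset bookkeeping (an exclusive prefix sum over per-task sizes); it is illustrative and independent of any particular parallel file system or of the CUMULVS checkpointing interface.

    #include <stdio.h>

    #define NTASKS 4

    int main(void)
    {
        /* Bytes of checkpoint state held by each task (example values). */
        long size[NTASKS] = { 4096, 10240, 4096, 8192 };
        long offset[NTASKS];
        long total = 0;
        int  t;

        /* Exclusive prefix sum: task t writes its block at offset[t],
         * so all tasks can write into one checkpoint file concurrently. */
        for (t = 0; t < NTASKS; t++) {
            offset[t] = total;
            total += size[t];
        }

        for (t = 0; t < NTASKS; t++)
            printf("task %d writes %ld bytes at offset %ld\n",
                   t, size[t], offset[t]);
        printf("checkpoint file length: %ld bytes\n", total);
        return 0;
    }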

The only challenge in applying CUMULVS to a new or existing simulation program is the need to instrument the relevant data fields and steering parameters. A Problem Solving Environment (PSE) will be designed to assist in instrumenting applications for CUMULVS (or other CCA-compliant Collective tools). This PSE will likely consist of a graphical user interface (GUI) and a source code pre-processor. The PSE will assist the application programmer in identifying and describing any relevant data fields or steerable parameters in a simulation. The PSE will also expedite the instrumentation of applications for fault tolerance by providing a means for selecting the minimal program state to checkpoint and by helping to insert checkpointing operations and requests.


Downloading and Using CUMULVS

The latest version of the CUMULVS software can be downloaded from the CUMULVS Home Page. This software works on most Unix platforms, and there is also a preliminary distribution for Windows platforms.

Currently, CUMULVS is built on top of PVM, so you must download and install PVM to use it. CUMULVS works with either the PVM 3.3.11 or PVM 3.4.* releases.

The CUMULVS software comes with several pre-defined viewer programs, including an AVS viewer, a Tcl/Tk slicer, and a simple text-based viewer. To use the AVS viewer, you must first purchase AVS for your system. To use the Tcl/Tk viewer, you must download and install the Tcl/Tk system; CUMULVS works with Tcl 7.4 / Tk 4.0 or later. Additional information on Tcl/Tk can be found at http://www.tcltk.com/.


Development Team Participants

Jim Kohl (ORNL)
Al Geist (ORNL)
Phil Papadopoulos (UCSD)
Ben Allan (SNL)
Rob Armstrong (SNL)

For more details on CUMULVS or questions,
email: cumulvs@msr.csm.ornl.gov

Distributed Computing Group
Computer Science and Mathematics Division
Oak Ridge National Laboratory
P.O. Box 2008, Bldg 6012, MS 6367
Oak Ridge, TN 37831-6367

Research supported by the Mathematics, Information and Computational Sciences Office, Office of Advanced Scientific Computing Research, Office of Science, U. S. Department of Energy, under contract No. DE-AC05-00OR22725 with UT-Battelle, LLC.

http://www.csm.ornl.gov/cs/cumulvs.html
Last modified: June 13, 2000 by Kohl.