
Computational Steering, Interactive Visualization
and Fault Tolerance in Distributed Applications


Summary

CUMULVS (formerly called StovePipe) is a valuable new tool for use in many large scientific applications because it allows scientists at different locations to visually monitor and remotely steer a large distributed application.

Given the growing emphasis on computer simulation and prototyping, CUMULVS could benefit much of the scientific community, in particular by making collaboration easier among geographically distributed laboratories and organizations.

CUMULVS provides several important features for the computational scientist. It handles the details of collecting distributed data fields and sending them to multiple dynamically attached viewers, and of receiving steering parameters from those viewers. The viewers can be commercial packages such as AVS or customized viewers for a specific application.

CUMULVS ensures that each viewer has a time-coherent view of the parallel data, and it ensures steering parameter coherency across multiple viewers. It manages all aspects of the dynamic attachment and detachment of multiple viewers to a running simulation. To use this new technology, existing programs require only slight modifications to describe how particular data fields have been decomposed and which parameters are modifiable by a viewer.


More Details...

Developed by ORNL researchers Philip Papadopoulos and James Kohl, CUMULVS contains a library for scientific applications that provides computational steering control, as well as visual feedback that a scientist can intuitively analyze. The library consists of approximately 20,000 lines of C code and can be integrated into applications written in either C or Fortran. CUMULVS requires minimal modification of the user application to specify the nature and decomposition of the data fields to be visualized and to define steering parameters. This typically amounts to no more than 20-30 additional lines of code, even for complicated data decompositions.

CUMULVS applications need not always be connected to a viewer, and multiple viewers can be attached and detached interactively as needed. This proves especially useful for long-running applications that may not require constant monitoring. Though the primary purpose of CUMULVS is collecting and manipulating data from distributed or parallel applications, it is also useful with serial applications, where it can transfer data from the computation engine over a network to a visualization front end.

CUMULVS can be used on top of any complete message-passing communication system, and with any front-end visualization system. Current applications use PVM as the message-passing substrate, and a variety of visualization systems are supported, including AVS and Tcl/Tk. Porting CUMULVS to a new message-passing system requires only the creation of a single declaration file that defines the proper calling sequences for CUMULVS communication. A substantial viewer library is also provided to enable efficient development of new front-end viewer programs.

CUMULVS automatically handles the collection of sub-region data from the application. For a distributed or parallel application, each concurrent task identifies its position in a specific data field decomposition, and CUMULVS then applies this information to determine precisely which data elements are present in each task. In addition to the sub-region boundaries, a sub-region request also includes a "cell size" for each axis of the data. The cell size determines the stride of elements to be collected along that axis, e.g. a cell size of 2 will obtain every other data element. This feature allows more efficient high-level overviews of larger regions by using only a sampling of the data points, while still providing the necessary details for smaller regions where every data point is required.
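To make the cell-size idea concrete, here is a minimal sketch of how one task might pick out the elements it owns for a strided sub-region request. It is an illustration only, not the CUMULVS implementation: the function names, the inclusive bounds, and the two-dimensional block layout are all assumptions made for this example.

```python
def sample_indices(req_lo, req_hi, cell, own_lo, own_hi):
    """Global indices on the request's sampling grid that this task owns.

    The grid starts at req_lo and advances by `cell`; all bounds are inclusive.
    (Hypothetical helper, not part of the CUMULVS API.)
    """
    lo = max(req_lo, own_lo)
    hi = min(req_hi, own_hi)
    if lo > hi:
        return []  # the request does not touch this task's range
    offset = (lo - req_lo) % cell
    first = lo if offset == 0 else lo + (cell - offset)
    return list(range(first, hi + 1, cell))


def collect_local(local, own_lo, own_hi, req_lo, req_hi, cell):
    """Return ((row, col), value) pairs, in global coordinates, for a 2-D block.

    `local` holds the task's block as a list of rows; own_lo/own_hi give the
    global coordinates of its corners (an assumed layout for this sketch).
    """
    rows = sample_indices(req_lo[0], req_hi[0], cell[0], own_lo[0], own_hi[0])
    cols = sample_indices(req_lo[1], req_hi[1], cell[1], own_lo[1], own_hi[1])
    return [((r, c), local[r - own_lo[0]][c - own_lo[1]])
            for r in rows for c in cols]


# A task owning rows 2..3 and columns 0..5 of a 4 x 6 global array.
block = [[20, 21, 22, 23, 24, 25],
         [30, 31, 32, 33, 34, 35]]
piece = collect_local(block, (2, 0), (3, 5), (0, 0), (3, 5), (2, 3))
```

With a cell size of 2 along the rows and 3 along the columns, this task contributes only the elements at global positions (2, 0) and (2, 3); a cell size of 1 on both axes would return every element it owns inside the requested region.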

Once CUMULVS has collected the local task's data for a given sub-region, the data is sent to the viewer task where it is automatically assembled into a coherent "data frame" for animation. This data frame represents a uniform region of the global data array, using global coordinates, even if the actual array is decomposed across many parallel tasks. The frequency of data frames can be set interactively from the viewer, so the user can adjust how often frames are sent from the application, thereby reducing overhead effects.
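The assembly step on the viewer side can be sketched as follows. This is a simplified illustration under stated assumptions, not the CUMULVS viewer library: each task is assumed to send a list of ((row, col), value) pairs in global coordinates, and the names `assemble_frame` and `frame_complete` are hypothetical.

```python
def assemble_frame(pieces, req_lo, req_hi, cell, fill=None):
    """Merge per-task pieces into one uniform 2-D frame.

    A global coordinate g along an axis lands in frame slot
    (g - req_lo) // cell, so the frame is dense even though the
    underlying array is spread across tasks.
    """
    shape = [(hi - lo) // c + 1 for lo, hi, c in zip(req_lo, req_hi, cell)]
    frame = [[fill] * shape[1] for _ in range(shape[0])]
    for piece in pieces:
        for (r, c), value in piece:
            frame[(r - req_lo[0]) // cell[0]][(c - req_lo[1]) // cell[1]] = value
    return frame


def frame_complete(frame, fill=None):
    """True once every slot has been filled by some task's piece."""
    return all(value is not fill for row in frame for value in row)


# Two tasks each contribute the sampled elements they own.
from_task0 = [((0, 0), 0), ((0, 3), 3)]
from_task1 = [((2, 0), 20), ((2, 3), 23)]
frame = assemble_frame([from_task0, from_task1], (0, 0), (3, 5), (2, 3))
```

Here the viewer sees a single 2 x 2 frame, `[[0, 3], [20, 23]]`, without needing to know which task held which elements.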

CUMULVS supports coordinated computational steering of applications by multiple collaborators. A locking scheme prevents conflicting adjustments to the same steering parameter by different users. Consistency protocols are used to verify that all tasks in a distributed application apply the steering changes in unison. Scientists, even if geographically separated, can work together to direct the progress of a computation without concern for the consistency of steering parameters among distributed tasks.
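One way to picture the combination of locking and consistency is the toy model below. It is a sketch under assumptions rather than the protocol CUMULVS actually uses: a viewer must hold a parameter's lock before changing it, and each accepted change carries an iteration number so that every task, running the same logic, applies it at the same step. The class and method names are invented for the example.

```python
class SteeringParams:
    """Toy model of lock-protected steering parameters (hypothetical)."""

    def __init__(self, initial):
        self.values = dict(initial)
        self.owner = {}    # parameter name -> viewer currently holding the lock
        self.pending = {}  # parameter name -> (new value, step to apply at)

    def acquire(self, name, viewer):
        """Grant the lock if it is free or already held by this viewer."""
        return self.owner.setdefault(name, viewer) == viewer

    def release(self, name, viewer):
        if self.owner.get(name) == viewer:
            del self.owner[name]

    def request_update(self, name, viewer, value, apply_at_step):
        """Queue a change; rejected unless the viewer holds the lock."""
        if self.owner.get(name) != viewer:
            return False
        self.pending[name] = (value, apply_at_step)
        return True

    def advance(self, step):
        """Called by each task at the top of iteration `step`."""
        for name, (value, when) in list(self.pending.items()):
            if step >= when:
                self.values[name] = value
                del self.pending[name]


params = SteeringParams({"dt": 0.1})
params.acquire("dt", "viewer_A")
params.request_update("dt", "viewer_A", 0.05, apply_at_step=3)
```

Because the change is tagged with a step number instead of being applied on arrival, tasks that receive the message at different wall-clock times still switch to the new value on the same iteration.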

The current CUMULVS system evolved from an earlier prototype system called PVMAVS (created by the authors) that linked a PVM application to AVS for floating point data visualization and simple steering operations. (This original work was presented at the High Performance Computing Symposium '95, in Montreal, Canada, July 10-12.) CUMULVS completely generalizes this early system, and now supports all standard data types (with built-in type conversion as desired), plus a significant variety of contiguous data decompositions, as well as particle-based data structures. CUMULVS also provides a fault-tolerant communication protocol so that failures in either an application or a viewer can be gracefully handled.

While on the surface the concept of collecting data from an application, or of passing steering parameters to an application, may seem rather straightforward, there are many underlying issues that make such a system difficult to construct. Creating CUMULVS in its current form required the development of a variety of synchronization protocols to maintain consistency among the many distributed application tasks without introducing any deadlock conditions. These protocols had to be dynamic to allow viewers to attach at will, and yet had to be tolerant of faults and failures. Efficient, general algorithms had to be formulated for the packing and unpacking of data in different data decompositions. Obtaining every "Nth" element within a sub-region becomes significantly more complicated when working with arbitrary mixtures of block and cyclic decompositions. The viewer/application interfaces also had to be generalized to support a variety of viewers with different data and synchronization requirements. The end result is a general system that automatically and efficiently handles all of these challenging details, with a minimal amount of user specification or effort.
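To see why strided collection gets awkward under block and cyclic decompositions, consider the standard one-dimensional block-cyclic mapping, where a block size of 1 gives a pure cyclic layout and a block size of ceil(N / ntasks) gives a pure block layout. The sketch below uses that textbook mapping; it is not taken from the CUMULVS source, the function names are hypothetical, and it enumerates indices one by one for clarity where an efficient packer would compute runs.

```python
def owner_and_local(g, block, ntasks):
    """Map global index g to (owning task, index within that task's storage)
    for a 1-D block-cyclic layout with the given block size."""
    owner = (g // block) % ntasks
    local = (g // (block * ntasks)) * block + g % block
    return owner, local


def strided_plan(lo, hi, stride, block, ntasks):
    """For every `stride`-th global index in [lo, hi], record which task
    holds it and where, grouped by task."""
    plan = {task: [] for task in range(ntasks)}
    for g in range(lo, hi + 1, stride):
        owner, local = owner_and_local(g, block, ntasks)
        plan[owner].append((g, local))
    return plan


# Every 2nd element of a 12-element array spread over 3 tasks.
cyclic = strided_plan(0, 11, 2, block=1, ntasks=3)
blocked = strided_plan(0, 11, 2, block=4, ntasks=3)
```

Under the block layout each task reads local offsets 0 and 2, but under the cyclic layout the same request touches local offsets 0 and 2 on tasks 0 and 2 and offsets 1 and 3 on task 1. The local stride and starting offset depend on how the request's stride interacts with the block size and task count, and mixing layouts across axes multiplies the cases.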

How does it work? Here are slides from a talk by Jim Kohl at the 1996 PVM Users Group Meeting describing implementation details.



For more details on CUMULVS or questions,
email: cumulvs@msr.csm.ornl.gov

Distributed Computing Group
Computer Science and Mathematics Division
Oak Ridge National Laboratory
P.O. Box 2008, Bldg 6012, MS 6367
Oak Ridge, TN 37831-6367

Research supported by the Mathematics, Information and Computational Sciences Office, Office of Advanced Scientific Computing Research, Office of Energy Research, U. S. Department of Energy, under contract No. DE-AC05-96OR22464 with Lockheed Martin Energy Research Corporation.

http://www.csm.ornl.gov/cs/cumulvs2.html
Last modified: May 10, 1999 by Kohl.