Several ACTS tools from the DOE laboratories have already been integrated and made interoperable to share portions of their functionality. A more general integration solution is being designed via the Common Component Architecture (CCA) for specifying how high performance software components interface and attach to each other across DOE (see below).
CUMULVS Approach
FY 1998 Accomplishments
Future Plans
Downloading and Using CUMULVS
Development Team Participants
The architecture of CUMULVS is described in the figure below:
CUMULVS is the infrastructure "glue" that allows independent
front-end viewer programs to attach to the tasks that make up
a distributed application.
The application tasks can be running on any combination of
Unix and Windows machines,
and the viewer programs can run locally with the application
tasks or remotely over a network.
There can be any number of different viewers attached at any one time,
each potentially viewing its own sub-region of a different data array.
The viewer programs can use any graphical system for rendering
data field views, including AVS, Tcl/Tk, a virtual reality interface,
or some application-specific custom GUI constructed by the user.
The CUMULVS system consists of three distinct pieces:
an application-side library, a viewer-side library,
and a separate fault recovery daemon (not shown).
Once the appropriate CUMULVS libraries are compiled
into the application tasks and viewer programs,
CUMULVS can transparently attach and detach viewers
for run-time visualization and computational steering.
Once the application program has been instrumented
to identify and describe any data fields or parameters of interest,
the CUMULVS library can automatically extract
any desired data as needed.
The application-side and the viewer-side libraries
communicate by invoking the necessary protocols
to pass information back and forth for viewing or steering,
under the control of the simple, high-level library interface
at the viewer.
Another way to consider the CUMULVS viewer interaction is in terms
of a distributed data array, as shown in the above example.
A very large array can be decomposed
among some number of application tasks.
CUMULVS understands these decompositions,
and so each viewer program can extract its desired array elements
from the tasks where those elements are stored.
The viewers provide access to the data array in global coordinates,
as if it were a single, contiguous array.
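The global-coordinate view described above can be illustrated with a minimal sketch. It assumes a one-dimensional block decomposition; the function names block_bounds and locate_region are hypothetical and are not part of the CUMULVS API. Given a viewer's requested sub-region in global indices, the sketch works out which tasks hold the elements and where they sit in each task's local storage.

```python
from typing import Dict, Tuple


def block_bounds(n: int, ntasks: int, rank: int) -> Tuple[int, int]:
    """Half-open global range [lo, hi) owned by `rank` in a 1-D block decomposition.

    Assumption: any remainder elements are spread over the lowest-numbered tasks.
    """
    base, extra = divmod(n, ntasks)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def locate_region(n: int, ntasks: int, start: int, stop: int) -> Dict[int, Tuple[int, int]]:
    """Map the global sub-region [start, stop) to {rank: (local_lo, local_hi)}."""
    pieces = {}
    for rank in range(ntasks):
        lo, hi = block_bounds(n, ntasks, rank)
        a, b = max(lo, start), min(hi, stop)
        if a < b:
            # Convert the overlapping global indices to this task's local indices.
            pieces[rank] = (a - lo, b - lo)
    return pieces


# A viewer asks for global elements 2..7 of a 10-element array held by 3 tasks.
region = locate_region(10, 3, 2, 8)
```

Here the viewer only names global indices; the per-task lookups stay hidden behind the helper, which is the behaviour the text attributes to the viewer library.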
Multiple remote collaborators can all attach to the same application and coordinate its control cooperatively. Each scientist can simultaneously "steer" a different parameter. These parameters can be scientific in nature, controlling physical aspects of the experiment, or can be algorithmic parameters that control the numerics of the computation itself. Such interactive control over an application can be used to explore "What If?" analyses, even for non-physical effects that could not be duplicated in a physical experiment. In addition, steering can be used to keep a simulation on track by making minor adjustments while it is running. This type of control also makes the experimentation cycle more efficient, because experiments that have gone awry can be cut short. This is especially useful for long-running software experiments.
As a means for application-directed fault tolerance, CUMULVS provides a heterogeneous checkpointing facility. Using the same infrastructure that allows an application to declare data arrays for visualization and parameters for steering, the application can mark the variables that contribute to the minimal program state. CUMULVS can then automatically extract these variables and save them in a checkpoint file for restart or migration. CUMULVS handles the collection and organization of the checkpoint data, and provides an automatic run-time fault recovery system.
Because the application describes the decomposition semantics of its data arrays to CUMULVS (in the data array instrumentation), CUMULVS can provide an efficient and flexible checkpointing mechanism. Rather than saving a full core image of the application, CUMULVS can save just the necessary variables, resulting in significantly smaller checkpoint files. The semantic information also allows CUMULVS to manipulate the checkpoint data for heterogeneous restart and migration, and even to reconfigure a checkpoint for restart using a different machine topology or decomposition type.
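The idea of saving only the marked variables, rather than a full core image, can be sketched as follows. This is an illustration under stated assumptions, not the CUMULVS checkpoint format or API: the functions save_checkpoint and restore_checkpoint are hypothetical, and JSON is used here only as a stand-in for an architecture-neutral encoding.

```python
import json


def save_checkpoint(state: dict, marked: list) -> str:
    """Serialize only the variables the application marked as minimal program state."""
    return json.dumps({name: state[name] for name in marked}, sort_keys=True)


def restore_checkpoint(state: dict, blob: str) -> None:
    """Overwrite the marked variables in `state` with the checkpointed values."""
    state.update(json.loads(blob))


# The large scratch buffer is not marked, so it never reaches the checkpoint.
state = {"step": 120, "temperature": [300.0, 301.5], "scratch": [0.0] * 1000}
blob = save_checkpoint(state, ["step", "temperature"])

# A restarted task begins with empty state and recovers the marked variables.
restarted = {"step": 0, "temperature": [], "scratch": []}
restore_checkpoint(restarted, blob)
```

Because the checkpoint is a text encoding of named values rather than a memory image, the same data could in principle be read back on a machine with a different architecture.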
The CUMULVS run-time fault recovery system consists of one checkpointing daemon (CPD) per host, for collecting and providing checkpoints, and monitoring the application and system for failures. The CPDs also act as a "console" for manually controlling any restarts or migrations as desired. The CPDs themselves comprise a fault-tolerant application that can handle the failure of any subset of CPDs or hosts. The CPDs coordinate the redundancy of the checkpoint data among the different hosts, using a "ring" topology.
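The text does not specify how the CPDs place redundant checkpoint copies around the ring, so the following is only one plausible scheme, given as a sketch. It assumes that each host's checkpoint is also stored on the next `copies` hosts in ring order; the names ring_holders and lost_checkpoints are hypothetical.

```python
from typing import Iterable, List


def ring_holders(hosts: List[str], owner: str, copies: int = 1) -> List[str]:
    """Hosts holding redundant copies of `owner`'s checkpoint (assumed: ring successors)."""
    i = hosts.index(owner)
    n = len(hosts)
    # Never place more copies than there are other hosts in the ring.
    return [hosts[(i + k) % n] for k in range(1, min(copies, n - 1) + 1)]


def lost_checkpoints(hosts: List[str], failed: Iterable[str], copies: int = 1) -> List[str]:
    """Failed hosts whose checkpoint cannot be recovered from any surviving holder."""
    down = set(failed)
    return [
        h for h in hosts
        if h in down and all(r in down for r in ring_holders(hosts, h, copies))
    ]


hosts = ["a", "b", "c", "d"]
single_failure = lost_checkpoints(hosts, ["b"], copies=1)
adjacent_failures = lost_checkpoints(hosts, ["b", "c"], copies=1)
more_redundancy = lost_checkpoints(hosts, ["b", "c"], copies=2)
```

Under this assumed scheme, one copy per checkpoint survives any single host failure, while two adjacent failures lose data unless the redundancy level is raised.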
- CUMULVS integrated with MPI to provide visualization, computational steering and fault tolerance for MPI applications.
- CUMULVS Checkpointing system prototype completed and released to the public as of July 1998.
- CUMULVS Particle-Based Viewer Programs developed as an extension to the contiguous data array viewers, to provide visualization support for particle-based applications such as Smoothed Particle Hydrodynamics (SPH).
- Integrated CUMULVS with NERSC's Cray T3E to allow remote monitoring and visualization of T3E applications over a standard network.
- Collaborated with NCSA on the construction of collaborative, 3-D and Virtual Reality viewers for CUMULVS.
- Common Component Architecture (CCA) Forum was formed, with ORNL as a founding member.




To support visualization of particle-based applications,
several CUMULVS viewers have been extended to handle particle data
through the generalized particle decomposition interface
co-developed by ORNL and Sandia.
The AVS viewer module and the Tcl/Tk slicer viewer
now extract and display particle data,
even in conjunction with other contiguous data fields.



- CUMULVS & AMR:
Extend the CUMULVS Application Programming Interface (API)
to include support for dynamic data redistribution
as found in Adaptive Mesh Refinement (AMR) algorithms.
- CUMULVS Messaging Substrates:
Extend CUMULVS to communicate over alternate
internal message-passing substrates
(as used for attaching to application tasks),
including Nexus messaging and possibly MPI.
- CUMULVS Checkpointing Optimization:
Extend and optimize the CUMULVS checkpointing system
to support the parallel writing of checkpoint data
and to provide adjustable levels of redundancy / fault tolerance.
- CUMULVS & CCA:
Bring CUMULVS into compliance
with the Common Component Architecture (CCA) specification,
allowing CUMULVS to interoperate with other tools,
such as Harness, PAWS, and InDEPS.
- CUMULVS Instrumentation:
Create a Problem Solving Environment (PSE) and pre-processor
to assist in instrumenting applications
for computational steering, interactive visualization
and fault tolerance.
- Model Coupling Using CUMULVS:
The CUMULVS system is being applied to experiment with the
coupling of data fields across disparate models,
as used in hybrid or cooperative simulations.




Next, a more fundamental step towards interoperability will be
taken by bringing CUMULVS into compliance
with the new Common Component Architecture (CCA) specification.
The CCA will provide an interface
for dynamically attaching to an application to share data and cooperate.
Tools such as CUMULVS will use the CCA as a foundation for attaching to
generic CCA-compliant applications for visualization and steering. The
data decomposition interface that CUMULVS uses to extract data fields
will be extended to accept CCA-compatible information from applications,
to identify what data is available and how it is distributed among
tasks. This will effectively make CUMULVS interoperable with any other
tools that become CCA-compliant, such as PAWS and InDEPS (POET).
The CCA also provides a more general approach to plugging together
components in the Harness environment, a project underway at ORNL,
the University of Tennessee, and Emory University.
The CUMULVS capabilities will integrate more easily into Harness
using the standardized specification layer provided by the CCA.


As part of a national climate modeling effort,
CUMULVS is being applied to experiment with the coupling
of models.
Two different models, e.g. an Ocean model and an Atmospheric model,
might be coupled together to share and correlate a common data field,
such as Temperature.
This typically requires that CUMULVS extract the data
from one model's decomposition and redistribute it
according to the other model's task topology.
As seen in the figure below,
this is a natural extension to the fundamental CUMULVS viewer protocols,
if the one-to-many attachment is enhanced to many-to-many.
However, many problems remain to be solved,
such as generic approaches to interpolating data in time and space.
Such interpolation is needed when two models do not share the same spatial grid,
or when they compute on dramatically different simulation time scales.
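The redistribution step in model coupling can be sketched for the simplest case. The sketch assumes both models use a one-dimensional block decomposition over the same grid, so no interpolation is needed; the names block_bounds and transfer_plan are hypothetical and do not come from CUMULVS. It lists which index ranges each source task would have to send to each destination task, which is the many-to-many pattern mentioned above.

```python
from typing import List, Tuple


def block_bounds(n: int, ntasks: int, rank: int) -> Tuple[int, int]:
    """Half-open global range [lo, hi) owned by `rank` in a 1-D block decomposition."""
    base, extra = divmod(n, ntasks)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def transfer_plan(n: int, src_tasks: int, dst_tasks: int) -> List[Tuple[int, int, int, int]]:
    """List (src_rank, dst_rank, global_lo, global_hi) for every non-empty overlap."""
    plan = []
    for s in range(src_tasks):
        s_lo, s_hi = block_bounds(n, src_tasks, s)
        for d in range(dst_tasks):
            d_lo, d_hi = block_bounds(n, dst_tasks, d)
            lo, hi = max(s_lo, d_lo), min(s_hi, d_hi)
            if lo < hi:
                plan.append((s, d, lo, hi))
    return plan


# A 12-element field moves from a 3-task model to a 2-task model.
plan = transfer_plan(12, 3, 2)
```

In this example the middle source task must split its block between both destination tasks, which is why a one-to-many viewer attachment is not sufficient for coupling.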
The latest version of the CUMULVS software can be downloaded from the CUMULVS Home Page. This software works on most Unix platforms, and there is also a preliminary distribution for Windows platforms.
Currently, CUMULVS is built on top of PVM, so you must download and install PVM to use it. CUMULVS works with either the PVM 3.3.11 or PVM 3.4.* releases.
The CUMULVS software comes with several pre-defined viewer programs, including an AVS viewer, a Tcl/Tk slicer, and a simple text-based viewer. To use the AVS viewer, you must first purchase AVS for your system. To use the Tcl/Tk viewer, you must download and install the Tcl/Tk system. CUMULVS will work with versions Tcl 7.4 / Tk 4.0 or later. Additional information on Tcl/Tk can be found at http://www.tcltk.com/.
Jim Kohl (ORNL)
Al Geist (ORNL)
Conrad Albrecht-Buehler (ORNL)
Dave Semeraro (NCSA)
Phil Papadopoulos (UCSD)
Rob Armstrong (Sandia)