Global Systems Simulation Software Requirements
(Authors: John Drake (editor), Ian Foster, Bob Malone, Dean Williams, Dave Bader)
The purpose of this document is to outline the general nature of the software requirements of the Global Systems Simulation Program, a major component of ACPI. Because much of the software lies in the realm of computer science, the Computer Science and Enabling Technologies (CSET) Program could make valuable contributions. By coordinating activities within ACPI and CSET, it will be possible to leverage talent and resources to better accomplish the goals of both programs. Implementation principles for the ACPI have been described in the ACPI Implementation Plan, and we assume that both programs will involve interagency agreements and coordination, with proposals competed openly. The immediate objective of this document is to provide a framework for at least one application area, climate modeling, so that CSET proposals that appear to have greater impact on applications will receive greater weighting.
In order to outline the CSET - ACPI relationship, we will assume that several CSET Centers will be established, each focused on a particular set of computer science research activities. These centers will be charged with developing and deploying the software technologies necessary to enable and promote use of the high-end computational and data resources of the SSI and IT2 programs. The ACPI will leverage the CSET Centers' products and expertise in two ways. First, performance metrics and goals will be defined jointly by CSET and ACPI teams to guide CSET software development projects. Second, a "directed" program will be established within each center to provide expertise to ACPI project teams.
This directed component of each center will be administered at the center's discretion but coordinated through the SSI management. Some projects may need to be jointly funded by CSET and ACPI to ensure close coordination. Where development and refinement of software tools is specific to global systems simulation needs, this will be the concern of the ACPI Model Consortia and the Regional Climate Prediction and Assessment Centers.
A final set of assumptions is that the CSET Centers will be competed independently of the ACPI Model Consortia and Regional Centers. As such, the CSET Centers will not have specific application emphases but be available as a cross-cutting resource to all SSI applications.
From the point of view articulated in the ACPI Conceptual Document and the ACPI Implementation Plan, the success of the ACPI will be measured by demonstrated improvements in our ability to make regionally useful forecasts of long-term climate change. Success of the CSET-ACPI interaction will be measured by the degree to which CSET technologies contribute to these improvements in forecasting capability.
The ACPI is also seeking support from SSI multi-teraflop sites to provide cost-effective, high-performance computing systems that will fully support climate simulations, which may require thousands of processor hours for their completion, producing many terabytes of model output that need extensive analysis and inter-comparison with other such simulations. The CSET-ACPI relationship should enhance and facilitate the development of new models and analysis tools utilizing SSI hardware to advance the ACPI goal of accelerating the development of climate simulation models over the next decade.
The definition of user requirements for Global Systems Simulation is predicated on the view of a close relationship between scientific progress and model development, as described in the ACPI Hardware Requirements Document. The best use of computational resources is to accelerate and advance the development of climate models over the next 2-3 years.
The first priority of the computing environment is the application of the coupled models for ensemble calculations of several climate change scenarios. To support this use, production-quality services and reliability must be available. It is suggested that CSET establish a close working relationship with the major computer vendors and stay abreast of the vendors' plans for software development in the areas of parallel operating systems, parallel debuggers, language support and optimization, and database management with high-performance storage and visualization. ACPI productivity would be seriously impacted if quality software in any of these areas were to lack support in the computing environment. It is important for ACPI that functionality not be lost as we build toward terascale computing environments.
ACPI's requirements pose challenges that can best be met through close collaboration between CSET and ACPI. We pose the following as challenges for CSET research directions:
ACPI activities can be grouped under four categories: model development, model application, regional climate prediction, and assessment studies; analysis is an integral part of all categories. The CSET activities can be grouped under six categories: math libraries, problem solving environments (PSEs), distributed computing, data management, collaborative tools, and visualization. We anticipate that CSET activities in all areas will contribute to most ACPI areas. The following table summarizes the ACPI-CSET interactions, which will be discussed in more detail below.
Table 1. Cross-reference of ACPI-CSET Requirements
| | Math libraries, Problem Solving Environments, Distributed computing | Data management, Collaborative tools, Visualization |
| --- | --- | --- |
| Model development and application | Focus on high-end computing resources and methods | Focus on storage and movement of extremely large datasets and their analysis |
| Regional prediction and assessment | Focus on distributed computing and regional down-scaling tools | Focus on data management and ease of collaborative analysis and assessment |
The precise definition of the CSET categories is not needed for purposes of this document, but comments on our understanding of these terms may help to clarify how the requirements are organized. A problem-solving environment is taken to mean the entire collection of software used by an ACPI researcher at an SSI computing site. This includes the operating system, the batch job scheduling system, debuggers, compilers, performance measurement and optimization tools, and even editors, as well as the more traditional customizable GUI interfaces to site functionality. Inter-site software falls under the category of distributed computing. Collaborative tools are software tools through which people communicate and simultaneously visualize data.
To be systematic in our presentation of requirements, we discuss the elements of Table 1 in turn. Rather than attempt to discuss a matrix of 4 ACPI categories by 6 CSET categories, with 24 "interaction" bins, Table 1 has been simplified to a 2x2 matrix that emphasizes the close relationship of fine-grained categories by lumping them into coarser groups. Thus math libraries, PSEs, and distributed computing all address different aspects of computational resources and methods, while data management, collaborative tools, and visualization address different aspects of making simulation results readily available and amenable to analysis by (possibly geographically separated) collaborators. Likewise, model development and application are closely intertwined, with similar needs for high-end computing resources, while regional prediction and assessment emphasize distributed computing and availability of model results and observational data for analysis. The result is that the 2x2 matrix is "diagonally dominant," with the computationally oriented model development and application components of ACPI having the strongest interaction with the math libraries, PSE, and distributed computing components of CSET. The regional modeling, analysis, and assessment components of ACPI are expected to be more reliant on the data management, collaborative tools, and visualization components of CSET. The off-diagonal interactions are smaller but still significant.
The context for model development and application software requirements is given in the Hardware Requirements document. There, a scientific road map for climate model development is given, with emphasis on utilizing terascale computing and providing high-end climate modeling results to the ongoing National Assessment and the IPCC Assessment of 2005. To achieve the goals of the ACPI, computational performance and efficiency must be aggressively improved. The throughput goals described in the Hardware Requirements document cannot be achieved solely on the basis of hardware improvements. Better optimization, highly efficient, scalable math libraries, and more refined parallel programming environments are required. A coordinated program of ACPI code improvements and CSET enabling technologies is needed to realize 30-50% of peak performance on SSI machines. Performance at this level is sufficient to meet the ACPI throughput goals.
The context for analysis of model results and regional climate prediction and assessment was not described in the hardware requirements document, since these activities are not the primary drivers of the hardware requirements. The Regional Centers will be data intensive, with an emphasis on network access to national archives of observational data and climate model results. The purpose of the Regional Centers is to provide an effective interface between the climate model development and application efforts and the assessment communities. This task will be accomplished primarily through the development and delivery of regionally down-scaled model output and the value-added products that are needed for climate impact and assessment studies. Because the Regional Centers will be part of a larger national network, the emphasis on collaborative tools and distributed computing creates significant opportunities to leverage CSET activities. Working with CSET, the ACPI seeks to establish a national network of global and regional climate archives with collaborative tools to facilitate analysis and use of the data. We anticipate strong collaborations in distributed computing, data systems, visualization, problem solving environments, and collaborative tools for the analysis and assessment activities of the Regional Centers, whose function and goals are described in detail in the ACPI Whitepaper on RCPACs.
The approach we would encourage in all these areas is for CSET to focus on a common component architecture from which tools can be built, consistently maintained and improved over a long period of time. This is depicted in Figure 1 using, as an example, some of the tools already available to the climate community through the PCMDI at LLNL and at NCAR. Many other groups should be able to take advantage of the (open) distributed architecture emerging from the Next Generation Internet (NGI) developments and other innovative efforts dealing with Computational Grids.
Figure 1. Layers of a Distributed Computational Grid Architecture supporting Climate Analysis Tools
In the long term, significant advances in climate modeling can result from research interactions between ACPI and CSET. The modeling and simulation science expertise associated with the IT2 should be encouraged to tackle the "hard problems" in climate modeling. In the past, algorithmic improvements have delivered computational enhancements on a par with machine advances. A deeper understanding of geophysical flows and climate processes can lead to reformulation of climate models and open the way for application of different mathematical and numerical techniques. Often such improvements are motivated by interactions between physical scientists pursuing fundamental research on general circulation theory and applied mathematicians investigating new numerical methods. This is the work that logically precedes future software developments.
The following discussion of specific software requirements is based on Table 1.
ACPI Areas: Model Development and Application
CSET Areas: Math Libraries, Problem Solving Environments, and Distributed Computing
Math Libraries: Optimized real FFTs and a full complement of numerical analysis routines have proved important for rapid development, efficient execution, and production use of climate models. Numerical libraries for linear algebra, special functions, interpolation on the sphere, and a variety of transforms will aid in modular code development. Threaded parallel versions of some numerical libraries are available commercially, while distributed-memory versions of numerical libraries are almost non-existent. Notable exceptions are the linear algebra routines in ScaLAPACK, the iterative-methods library AZTEC, and the domain-decomposition PDE solver PETSc, developed under the DOE2000 program. All of this existing software, together with much broader coverage of numerical routines suited to a distributed, shared-memory computer, should be developed by CSET in support of climate model development efforts. These numerical procedures are the building blocks of future climate models.
Several dynamical kernels in climate models can be expressed as stand-alone mathematical problems. In this class are Helmholtz problems, shallow water equations on the sphere, and the "dry" baroclinic dynamical cores. Test cases and benchmarking procedures for these kernels have been established in the modeling literature. Math libraries that include these high-level numerical procedures can be envisioned and should be encouraged as a means of bringing mathematical talent into the sphere of climate modeling concerns. We will not specify particular discretization techniques for the development of such kernels, but recent examples of interest include spectral elements, meshless methods, high-order compact schemes and wavelet-Galerkin methods.
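To make the notion of a stand-alone mathematical kernel concrete, the following is a deliberately tiny, pure-Python sketch: Jacobi iteration for a 1-D Helmholtz problem u - c*u'' = f on [0,1] with homogeneous boundary conditions, discretized with second-order finite differences. The function name and parameters are our own illustrative choices, not part of any ACPI or CSET library; production kernels would of course be spherical, parallel, and far more sophisticated.

```python
# Illustrative sketch (not ACPI code): Jacobi iteration for the 1-D
# Helmholtz problem u - c*u'' = f on [0,1], u(0) = u(1) = 0.

def solve_helmholtz(f, c=0.1, n=32, iters=3000):
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)            # interior points plus the two boundary points
    diag = 1.0 + 2.0 * c / h**2    # diagonal of the discrete operator
    for _ in range(iters):
        new = u[:]
        for i in range(1, n + 1):
            # Jacobi update from u_i*diag - c*(u_{i-1}+u_{i+1})/h^2 = f_i
            new[i] = (f(i * h) + c * (u[i - 1] + u[i + 1]) / h**2) / diag
        u = new
    return u, h

# Manufactured solution u(x) = x(1-x): u'' = -2, so f = x(1-x) + 2c,
# and the second-order difference of a quadratic is exact.
c = 0.1
u, h = solve_helmholtz(lambda x: x * (1.0 - x) + 2.0 * c, c=c)
max_err = max(abs(u[i] - i * h * (1.0 - i * h)) for i in range(len(u)))
```

The manufactured-solution test here is exactly the style of benchmarking procedure the modeling literature has established for such kernels.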
Statistical methods: It is recognized that ensembles of simulations are needed to (1) assess the ability of the model to reproduce the natural variability of the climate system and (2) analyze the sensitivity of the model results to uncertainties in model parameters. Methods are needed that provide guidance on how to maximize the benefit with the smallest number of simulations.
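The role of an ensemble can be sketched with a toy stand-in for a climate run (the "model" below is a hypothetical trend-plus-noise generator, purely for illustration): the ensemble mean isolates the forced signal, while the across-member variance estimates the natural variability against which that signal must be judged.

```python
# Toy illustration of ensemble statistics; run_member is a hypothetical
# stand-in for a climate simulation, not an ACPI model.
import random

def run_member(seed, n_years=50):
    rng = random.Random(seed)
    # toy annual-mean anomaly: weak forced trend + interannual noise
    return [0.01 * yr + rng.gauss(0.0, 0.2) for yr in range(n_years)]

ensemble = [run_member(seed) for seed in range(20)]
n_years = len(ensemble[0])

# ensemble mean (forced signal) and across-member variance (variability)
mean = [sum(m[yr] for m in ensemble) / len(ensemble) for yr in range(n_years)]
var = [sum((m[yr] - mean[yr]) ** 2 for m in ensemble) / (len(ensemble) - 1)
       for yr in range(n_years)]
```

Guidance methods of the kind requested above would tell us how many members (here 20) are enough for the variance estimate to stabilize.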
Programming Models: The programming model will be distributed, shared memory (DSM), to take advantage of shared memory features within a cluster and direct control of message passing between clusters. The standard OpenMP and MPI software, with appropriate interfaces to compilers and execution environments, is required. Other message passing libraries, which allow optimized performance on specific hardware, are also encouraged. Optimized collective operations in MPI and distributed-memory data transposition routines should also be considered part of the basic support of the programming environment.
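The data transposition mentioned above is the communication pattern behind spectral transforms: a 2-D field decomposed by row blocks must be redistributed into column blocks. The sketch below is a single-process, pure-Python stand-in for what an MPI_Alltoall-based routine would do; the helper name is our own and no MPI is involved.

```python
# Toy stand-in for a distributed-memory transposition routine:
# redistribute a 2-D field from a row-block to a column-block
# decomposition (the pattern an MPI all-to-all would implement).

def to_column_blocks(row_blocks):
    """row_blocks[p] = the rows owned by "process" p (equal block sizes)."""
    nproc = len(row_blocks)
    rows = [row for block in row_blocks for row in block]  # global field
    ncols = len(rows[0])
    width = ncols // nproc
    # col_blocks[q] collects every row's q-th column slice -- exactly
    # what process q would receive from the all-to-all exchange
    return [[row[q * width:(q + 1) * width] for row in rows]
            for q in range(nproc)]

field = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 grid
row_blocks = [field[0:2], field[2:4]]                      # 2 "processes"
col_blocks = to_column_blocks(row_blocks)
```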
Since the programming paradigm must be flexible enough to express many levels of parallelism and memory hierarchies, we also support the use of source-translation and preprocessing tools that allow code transformations targeting specific machine or compiler optimizations from a single source version. These tools have proven extremely useful for code maintenance when CASE tools or compiler technologies lag behind the hardware.
Model Coupling Frameworks: It is the nature of our future model building activities that several component models will be assembled to function simultaneously. The functions of synchronization and standard data flow interfaces can be supported in libraries external to the scientific model development. An example of such software is the NCAR Model Coupling Library (MCL), which abstracts communication between atmospheric, ocean, land surface, and ice models in a parallel computing environment. This type of functionality, if abstracted properly, could increase the productivity of ACPI model developers. An API for checkpoint/restart with supporting system software would be useful. A high-performance parallel I/O system with an "open standard" interface is needed so that the I/O functions common to all components of the climate model can be expressed at a higher level.
Parallel Debuggers: Parallel debuggers for large distributed, shared-memory codes under development are important to the productivity of ACPI model development teams. A preference for the tool TotalView has been expressed in the chemistry community, but other vendor-supplied tools with equal or better functionality may be substituted. Performance monitors and profiling tools should also be part of the debugging and optimization tool set. Load-balance metrics should likewise be integrated with the performance analysis tools, with the ability to help the programmer discern which processors are idle and why.
Other Tools: Real-time system monitors are important to check the progress of long running jobs and to diagnose performance bottlenecks during execution. These tools are currently not readily available.
Operating System Support: Robust and stable system software is required for long-running jobs that create voluminous output. High-performance, optimizing compilers for Fortran, in its latest incarnations F90 and F95, are also required to achieve efficient execution. The usual suite of network interface capabilities, as well as high-speed links to data storage archives, is a requirement that must be worked out judiciously by the SSI computing centers.
Batch System: This is an important feature of the computing environment that will allow assignment of priority to the production runs and manage resources to support long running jobs. Specific features, required by the SSI Computing Centers user model and policy considerations, should be made part of the machine procurement. A network accessible batch system that provides easy submission of jobs from remote sites with a uniform interface across the SSI hardware is desirable.
Parallel I/O: The performance of the operating system and associated hardware interfaces for parallel I/O is often overlooked in the specification of requirements. This is an acknowledged problem with ASCI-level computations. Parallel file systems without artificial restrictions on file sizes or glaring inefficiencies in file management are urgently needed. Operating system support for I/O can be a limiting factor in throughput. The tests described in the hardware requirements include the regular I/O associated with production runs and will exercise the I/O capabilities of the OS. Deployment of MPI-IO, parallel netCDF, and vendor-supplied I/O systems should all be pursued.
Other Libraries: The netCDF I/O library is increasingly necessary for the exchange of climate and weather data. Several models use this self-describing format for input of initial conditions, and analysis tools also assume this or other standard formats. Provision for high-performance I/O using the netCDF format will be a requirement of either the vendor or the computing center. While we have picked netCDF as the format of choice for ACPI, it is clear that other formats, notably GRIB, will continue to be significant. The development of a common meta-data format is needed to allow multiple formats while still providing a uniform interface for applications.
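What "self-describing" buys the analyst can be illustrated with a toy in-memory stand-in for a netCDF-style dataset (the structure and field names below are our own illustration, not the netCDF file format itself): the dimensions, units, and long names travel with the data, so a uniform accessor can interpret any variable without external documentation.

```python
# Toy stand-in for a self-describing dataset; illustrative only,
# not an actual netCDF file or API.

dataset = {
    "dimensions": {"time": 2, "lat": 3, "lon": 4},
    "variables": {
        "tas": {
            "dims": ("time", "lat", "lon"),
            "attrs": {"units": "K", "long_name": "surface air temperature"},
            "data": [[[288.0] * 4 for _ in range(3)] for _ in range(2)],
        }
    },
    "global_attrs": {"model": "toy", "convention": "hypothetical"},
}

def describe(ds, var):
    """Uniform interface: recover units and shape from metadata alone."""
    v = ds["variables"][var]
    shape = tuple(ds["dimensions"][d] for d in v["dims"])
    return v["attrs"]["units"], shape
```

A common meta-data convention amounts to agreeing on the vocabulary used in the "attrs" entries, so that the same `describe`-style interface works whether the bytes underneath are netCDF, GRIB, or something else.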
CSET Areas: Data Management, Collaborative Tools and Visualization
As we enter the era of teraflop computing systems, our ability to generate model output is in danger of outpacing our ability to archive it and to transport it from site to site. As an example, running a high-resolution ocean model on present-day computers with peak speeds in the 100-gigaflop range can generate a dozen multi-gigabyte files in a few hours, at an average rate of about 2 MB/second. Computing a century of simulated time would take more than a month to complete and would produce about 10 TB of output to archive. Application of climate models requires ensembles of many realizations to bound the uncertainty associated with natural climate variability. Archival systems capable of storing hundreds of terabytes are required to support calculations of this scale. On a one-teraflop system, each of the figures above could be increased ten-fold, making petabyte archives essential. Beyond that, the ACPI is targeting a 5-teraflop machine for FY00 and a 40-teraflop machine in FY03. There is, therefore, some urgency in being able to store and access petabytes of data.
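The arithmetic behind these figures is worth making explicit; the few lines below check the order-of-magnitude consistency of the numbers quoted above (rounded values only, not new measurements).

```python
# Order-of-magnitude check of the storage figures quoted above.
MB = 1e6
TB = 1e12

rate = 2 * MB                 # quoted average output rate, bytes/second
century_output = 10 * TB      # quoted archive volume for 100 simulated years

seconds = century_output / rate
days = seconds / 86400.0      # wall-clock time at that sustained rate
tenfold = century_output * 10 # 1-teraflop system: roughly 100 TB per century
```

At 2 MB/s, 10 TB takes close to two months of sustained output, consistent with "more than a month" of run time; one ten-fold step puts a single century at ~100 TB, and ensembles of such runs are what push the requirement to petabytes.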
Data Movement: Given this magnitude of model output, CSET and ACPI must address how climate results can be made available to the research and climate-impacts communities. Four modes of distribution can be envisioned that place very different demands on resources at the server site and on the network:
Archive Access: High speed access to the archive will be required for the model developers and model application teams. Since archives of model data will exist at NSF centers as well as DOE, NASA and NOAA centers, the ability to interchange and migrate data among the archives should be developed. This will require that a national infrastructure be developed, with an accepted meta-data convention that allows data to be identified independently of its particular format, file names and physical location. Authentication services for distributed archives must therefore be established for ACPI and CSET archives.
Data Translation: Tools to translate binary model output to netCDF, GRIB, LATS, HDF, and other common formats should be available on all SSI platforms. Efficient access to data is often limited by the translation step. Due to the size of the datasets anticipated by the ACPI, multi-resolution storage and indexing may be required. Operations and data manipulations within these formats must also be supported. The NCO (netCDF Operators) library is an example of a useful development in this direction.
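The kind of "operation within a format" meant here can be sketched with a hypothetical in-memory stand-in for an NCO-style record average (the real `ncra` operator averages a netCDF variable over its record dimension; the function below only mimics that idea on plain lists).

```python
# Hypothetical in-memory analogue of an NCO-style record average
# (ncra averages over the record/time dimension of netCDF files).

def record_average(records):
    """records: list of equal-length field value lists, one per time step."""
    n = len(records)
    # average each grid point across all records
    return [sum(vals) / n for vals in zip(*records)]

# two time steps over a two-point "grid"
monthly = [[1.0, 2.0], [3.0, 4.0]]
climatology = record_average(monthly)
```

The value of operator libraries like NCO is precisely that such reductions run inside the format, without a costly translate-analyze-retranslate cycle.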
Data Analysis: The analysis requirements for Model Development and Model Applications are primarily for high-end computing environments. After model output is archived, it exists as thousands of files recording monthly averaged values of several hundred atmosphere, ocean, land, and ice fields. In some cases daily minimum and maximum values for selected meteorological variables will also be archived. Most of these fields are uninteresting to anyone but the model developers. They will, however, need to post-process all of the files to analyze and inter-compare climate diagnostics. The procedures carried out by PCMDI are instructive: a first-look "sanity check" is followed by standard model diagnostics and then by studies of particular modeling issues. To inter-compare models, or different versions of the same model, standard formats and meta-data conventions must be utilized. The Atmospheric Model Inter-comparison Project (AMIP) and the Coupled Model Inter-comparison Project (CMIP) have defined some of the high-level abstractions and tools. CSET projects should support and extend the hard-earned results of the climate community in establishing suitable conventions and procedures.
Collaborative Tools: Tools that integrate data access with geographical display and "field calculators" are needed for viewing and discussing model results. Several researchers at remote sites will need to interact through a common interface that supports joint display and manipulation of data. To support the long-term development of such powerful interactive tools, a common component architecture of building blocks and software layers must be established. This appears to be the only way to continue functionality past the time horizons of specific projects and vendor support.
Visualization: Since many application-specific display tools are already in use, and because climate diagnostics emphasize the standard 2-D contour plot with continents, the model development and application teams do not require research in this CSET area. Innovative analysis techniques for projecting multivariable time series are, however, of interest. Of the existing tools for analysis of climate model output and processing for graphical presentation, the PCMDI VCS tool and the NCAR CSM postprocessors must be available to model development and application teams. Commercial packages currently used for visualization and graphics include AVS, IDL, and NCAR Graphics.
ACPI Areas: Regional Centers and Assessment
In contrast to the small number of users involved in model development and model application, the Regional Centers will be the focal point for a large, diverse group of geographically distributed users. The computational tasks will be more data intensive and less compute intensive than the model development projects. Emphasis is therefore placed on the second diagonal block of Table 1. Strong CSET/ACPI interactions are needed, emphasizing easy access to distributed archives of regionally downscaled climate projections and assessment information. The development and deployment of tools supporting collaborative work is essential to the success of the ACPI Regional Centers.
The research staff at the Regional Centers will perform two basic functions: infrastructure support for the user community and regional climate research. The infrastructure support includes first-look analysis of regional data, quality control, evaluation against observations, downscaling of GCM ensemble runs for climate change scenarios, and tool development. The second function will be to participate in research that advances assessment science, regional modeling capabilities, or the understanding of processes that affect regional climate. Investigations into the regional hydrologic cycle, biogeochemical cycles, and carbon budgets, as well as other energy-related and environmental concerns, are examples of the research that must be fostered by the Regional Centers.
CSET Areas: Math Libraries, Problem Solving Environments and Distributed Computing
Statistical methods: Statistical analysis software is needed by regional climate modelers and impact modelers; statistical analysis in this area is often referred to as geostatistics. Libraries that offer empirical orthogonal function (EOF) analysis, nonlinear normal mode analysis, and multi-resolution spatial and time series analysis must be provided with effective interfaces to large datasets. In many cases this will imply that the statistical analysis procedures be parallelized. Currently, desktop statistical analysis solutions like S-Plus are not adapted to the very high-end data needs we anticipate for the ACPI.
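To ground the EOF requirement, the following pure-Python sketch extracts the leading EOF of a small anomaly dataset by power iteration on the grid-point covariance matrix. It is a toy stand-in for a geostatistics library routine (the function name and the two-point "grid" are our own), but it is the same computation that must scale to large, parallel datasets.

```python
# Toy sketch of leading-EOF extraction via power iteration on the
# covariance matrix; a stand-in for a geostatistics library routine.

def leading_eof(anomalies, iters=200):
    """anomalies: list of time samples, each a list of grid-point values."""
    npts = len(anomalies[0])
    nt = len(anomalies)
    # covariance matrix over grid points
    C = [[sum(x[i] * x[j] for x in anomalies) / nt for j in range(npts)]
         for i in range(npts)]
    v = [1.0] * npts
    for _ in range(iters):
        # power iteration: repeatedly apply C and renormalize
        w = [sum(C[i][j] * v[j] for j in range(npts)) for i in range(npts)]
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    return v

# two grid points varying exactly in phase => leading EOF ~ (1,1)/sqrt(2)
data = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]]
eof1 = leading_eof(data)
```

For realistic grids the covariance matrix cannot be formed densely in memory, which is exactly why these procedures need parallel implementations with effective interfaces to large datasets.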
Math Libraries: The optimized math libraries required by model developers are also needed by regional climate modelers. Impact analysis will require state of the art optimization methods to link assessment models with mitigation and adaptation strategies. Adjoint methods and inverse methods will also find application for extracting hard to measure fields in conjunction with physical modeling results and observational data.
Adaptive Mesh Methods: The current mesoscale approaches used for modeling extreme events and regional climate are based on nested meshes of increasing resolution over the area of interest. Simulation of a non-hydrostatic atmosphere using more general meshes is of interest for future modeling approaches. Linear algebra for the iterative solution of elliptic equations arising from these more general models will also be needed.
Distributed Computational Resources: The challenges posed by the distributed nature of the Regional Climate Centers can be minimized by a coordinated approach to problem solving environments and distributed computing. Software projects such as GLOBUS, which provide a linked grid of computational and data-server resources, are of interest to the Regional Centers. High-level interfaces friendly to assessment users and staff can be built on top of the distributed computational infrastructure, taking advantage of modern approaches to web-based tools. Exporting analysis capabilities will also be important so that large quantities of data do not have to be moved; export could be provided in a client-server mode for remote researchers and users of the Regional Centers. An example of this kind of service is the NetSOLVE tool, which interfaces numerical libraries with networked computing systems. CSET, in conjunction with the ACPI, should take the leadership role in deploying proven distributed computing technologies in the service of the Regional Centers.
PSE: A problem solving environment that offers easy access to regionally downscaled climate data, and that has the ability to produce downscaled fields from GCM output using tools developed by the Regional Centers, will be a goal of the ACPI Regional Centers program. The CSET common component architecture will provide the building blocks and basic infrastructure on which problem solving environments can be built. But the Regional Centers' PSE must have the ability to see and access downscaled data from a number of centers. Thus it will be a distributed tool with an interface to a geographically distributed database of products. An ACPI/CSET interaction team needs to be defined to coordinate the development of PSEs.
CSET Areas: Data Management, Collaborative Tools and Visualization
Since data management and collaborative analysis are central to the Regional Centers' mission, the CSET/ACPI interaction should be focused on this area. Data management at the Regional Centers will entail the assimilation of portions of the primary global ACPI simulations. Data will be stored in a standard self-describing format allowing access to specific temporal or spatial slabs. A meta-data framework needs to be established to allow higher-level, user-friendly access to the range of products produced by the Regional Centers and their collaborators.
Meta-Data Framework: Locating and tracking regional and global climate results will be a task of the Regional Centers. A meta-data framework to aid in this task is a clearly identified need. PCMDI, along with many other climatic data institutions, is engaged in the ongoing discussion of meta-data standards, and the CSET teams in Data Management should join discussions of the evolving standards. Tools to support data retrieval and storage using meta-data descriptions across a nationally distributed archive are urgently needed. Data translation tools with some form of semantic interface would enhance the sharing of data produced by the Regional Climate Centers and their customers. Support for all the standard data formats (netCDF, GRIB, HDF, etc.), and translation between them, should be provided. The CSET activities in collaborative tools and distributed computing could play an important role in providing a solution to the meta-data problem.
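The payoff of such a framework is location-independent lookup: users find datasets by scientific attributes rather than by file names or hosts. The sketch below illustrates the idea with a tiny in-memory catalog; every entry, field name, and path in it is hypothetical.

```python
# Toy meta-data catalog; all entries, attribute names, and archive
# paths here are hypothetical, for illustration only.

catalog = [
    {"variable": "pr", "model": "GCM-A", "region": "global",
     "format": "netCDF", "archive": "site1:/arch/a.nc"},
    {"variable": "pr", "model": "GCM-A", "region": "pacific-nw",
     "format": "GRIB", "archive": "site2:/arch/b.grb"},
    {"variable": "tas", "model": "GCM-B", "region": "pacific-nw",
     "format": "netCDF", "archive": "site3:/arch/c.nc"},
]

def find(catalog, **criteria):
    """Select catalog entries matching every given attribute."""
    return [entry for entry in catalog
            if all(entry.get(k) == v for k, v in criteria.items())]
```

Because the query never mentions a file name, format, or site, the underlying data can migrate between archives or be translated between formats without breaking any user's workflow, which is the point of the meta-data convention.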
Collaborative Analysis: For the inter-comparison of regional climate models and regionally downscaled results, collaborative viewing tools with automatic data translation and interfaces to Geographic Information Systems (GIS) are required. GIS has become a standard for viewing and calculating with geographic fields, so collaborative tools must take the GIS interface seriously. Viewing of contoured fields with regional topography, together with global display of major synoptic-scale motions, will be needed. This suggests that collaborative tools be designed with an emphasis on interoperability.
Collaborative tools will be a hallmark of the Regional Centers user interface. As focal points for the analysis and scientific understanding of climate change impacts on a region, the user interface must support and encourage the exchange of ideas between a diverse set of users. We need novel ways to present information, to index information and to build a knowledge base accessible to scientists, modelers, impact specialists and policy makers.
Visualization: A variety of visualization tools are needed by the Regional Centers. The standard packages like IDL and AVS are heavily used in the climate community. But custom systems like the NCAR Graphics System and VCS and VisAD will also be needed. Some high-end graphics using immersive environments like CAVE5D will be needed. The ability to display a variety of meteorological and climate data on realistic terrain will be required.
Software Deployment Priorities
There is a question of emphasis when it comes to which software projects should be deployed. Once research and proof-of-principle demonstrations are completed, there is significant effort involved in rewriting and hardening the software for production use. ACPI users require well-tested, robust systems and software to carry out their jobs. Tests at multiple sites are usually required, and user community feedback is important before any software is released. Creating good documentation and training material is also part of the deployment stage. It is our view that of the many software projects that will be undertaken by CSET, only a few will be deployed. The hardening and deployment activities of CSET will require a significant portion of SSI resources. Table 2 below indicates where the ACPI recommends focusing deployment activities.
Some areas are marked U for urgent. In these areas there is a significant lack of software that meets our needs. In some cases this is the result of turmoil in the high-performance computing industry, where vendors have dropped support for critical software components. We can no longer assume that there is a core set of software products that vendors will supply to the scientific community. Therefore, CSET deployment activities should be coordinated with the vendors, utilizing their expertise, or that of commercial software houses, to produce the required systems.
Table 2. ACPI software deployment priorities