Global Systems Simulation Hardware Requirements

(Contributors: John Drake (editor), Bill Dannevik, Ian Foster, Jim Hack, Steve Hammond, Bob Malone, Doug Rotman, Dave Bader. Please send comments to drakejb@ornl.gov)

Understanding and predicting climate, particularly its natural variability and possible human-induced climate change on the decadal to century time scale, presents one of the most difficult and urgent challenges in science. This is because changes in climate, whether anthropogenic or natural, involve a complex interplay of physical, chemical, and biological processes of the atmosphere, oceans, and land surface. As research seeks to explain the behavior of the climate system, focus necessarily turns to behavior introduced by these processes and their interactions among climate subsystems. The simulation of global systems is a challenging scientific and technical problem with broad implications for energy and environmental policy.

Policy makers, economists, agricultural experts, and other planners need reliable, quantitative information with regional detail about projected changes in climate. Growing national and international concern about changes attributable to human activities has increased the urgency of this need. The best tools we have for integrating our knowledge about the climate system and for predicting future changes are global climate models that are based on interacting sub-models of the atmosphere, ocean, sea ice, and land surface. The enhancement and application of these coupled climate models to accurately project climate change scenarios is the primary goal of the Accelerated Climate Prediction Initiative (ACPI).


However, a major limiting factor in advancing the state of the art in climate change research has been the lack of dedicated computing cycles on high performance machines to run tightly coupled atmospheric and ocean models simultaneously. Only by substantially augmenting the speed with which these models execute on a given computing system can we hope to enhance our understanding of the components that make up our complex global climate system.

Climate modeling is unique among the Grand Challenge problems in terms of the length of simulations and the volume of model output produced. The reason for this is physical in origin: the global climate-change "signal" or "signature" must be discernible above the "noise" of the natural variability of the climate system. Variability in the atmosphere alone has fairly short periods: weeks to months. However, the ocean is quite different. Paleoclimate data, collected from Greenland and Antarctic ice cores, tell us that fluctuations in climate have occurred on time scales of tens to hundreds of years. These are generally attributed to shifts in the ocean circulation, especially the Gulf Stream. The ocean also has modes of variability with periods extending to hundreds of years or longer, so long, in fact, that they have only been explored with rather crude models. As a result, very long-duration runs accompanied by frequent output of large files are needed to analyze the simulated climate variability on all relevant time scales.

This document describes the plans of the DOE/ACPI program for the near term development of coupled global circulation models and their use in simulating climate change scenarios. The expected scientific development is used to underpin the specification of computational resources and hardware requirements for a 5TF machine to be procured by the SSI. The goals of the ACPI program, to be supported by the computing platform, include support for extensive model development and the provision of dedicated computing resources for production runs of various climate change scenarios.

What approach to model development will lead most quickly to attainment of the ambitious goals outlined by the ACPI Program? An iterative approach is called for, in which resolution, completeness of the model, and complexity of process parameterizations are all increased progressively in proportion to the expansion of computer power and resources. This provides the opportunity to examine the rate of convergence of the solution with increasing resolution and the cost/benefit ratio of increased resolution.

Model development should proceed continuously. We envision several tasks in the production of new coupled climate models. At the "top" level is the task of applying and analyzing the current generation of the coupled climate models and using the results for assessment of various scenarios. We will call the version of the code being used in this task version 0. A second activity is engaged in building the next-generation coupled model from the most recent components received from component development teams. The integration task improves the coupling scheme, couples the components, tests and validates the new coupled model, termed version 1.

Component development teams are developing, testing and optimizing the next generation of component models, which we will refer to collectively as the version 2 model. These models have higher resolution, better parameterizations appropriate to that resolution, improved numerical algorithms, and more efficient parallelization than either the version 1 or version 0 models. Model validation through comparison with observations is essential for the version 2 component models and for the version 1 coupled model before it is promoted to become the next production model.

We expect these tasks to be accomplished in a framework of multi-institutional cooperation, especially at the component model development and validation level, where there can be many teams. Networking for fast, reliable access to computing resources is thus very important to the operation of the SSI Computing Centers. Small teams can work concurrently on parameterizations, numerical methods, and code optimization. Multi-institutional participation will be aided by Internet communications and by technological developments for collaborative analysis that will be fostered by this program and ASCI. Enhanced mechanisms for communication and sharing of results will be equally important in facilitating construction and testing of component models.

By maintaining concurrent efforts at all phases of model development, validation and application, it should be possible to shorten significantly the "cycle time" between generations of models. Another key requirement is the availability of adequate computer resources to complete model testing and validation runs of the version 1 and version 2 models with a turnaround time much less than the generation cycle time. If we target a cycle time of 2-3 years, the following goals are set for computational resources.

Details are provided in section 1.1.3.

In this section we discuss the scientific roadmap expected to be followed by the model development and integration teams that will be using the SSI multi-teraflop systems. This determines the computational demand and balance of codes in the computing environment. We will also use this discussion as background and motivation for the specific GSS requirements proposed in section 1.1.4, Computing System User Requirements.

 

Dynamical Cores

The numerical methods for the dynamics of atmospheres, oceans and ice are a continuing research area. More accurate and efficient baroclinic models, in particular, can be developed independent of the full radiation and moist physics of a climate or weather model. It may be possible to interchange these dynamical cores between models developed in ACPI, much as physical parameterizations are currently interchangeable. Dynamical cores currently considered as possible candidates in a next generation atmospheric climate model include:

The horizontal resolution of these new schemes will vary between the current T42 (2.8 degree) and T426, which reaches the ACPI 30 km grid resolution goal. A major increase in computational power is essential for the development of high resolution models. The vertical resolution will certainly increase to at least 30 levels in the next models. The extra resolution, concentrated near the tropopause and in the boundary layer, will prove beneficial for a variety of reasons.

The table below shows the computational requirement (as a function of resolution) for an Eulerian spectral model of the atmosphere, assuming no change in physical parameterizations and process models. The columns give the spectral truncation (Tm), the number of grid points in the longitude (I), latitude (J) and vertical (K) directions, and the time step size (DT) in seconds. Resolution is in kilometers. The next column gives the operation count for a simulated day. From this operation count we compute the required sustained computational rate, R; the performance goal is 15 simulated years in 8 hours of wall clock time. The final column is the expected storage for monthly averages of a 15 year run. Other choices for the dynamical core may reduce the computational burden somewhat.

  Tm      I     J    K   DT(s)  Res(km)   Gflops/day   R(GF/sec)   Storage(GB)
  42    128    64   18   1200      313            77          15            22
  63    190    95   18   1200      211           169          32            48
  85    256   128   25    600      157           743         141           120
 106    320   160   25    600      125         1,162         221           188
 170    512   256   32    300       78         7,019       1,334           613
 213    640   320   48    300       63        17,312       3,291         1,430
 341   1024   512   64    150       39       115,260      21,911         4,870
 426   1280   640   64    120       31       224,702      42,717         7,609
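The required rate R follows directly from the operation count and this throughput goal. A minimal sketch of that arithmetic in Python, using values taken from the table:

    def required_sustained_gflops(gflops_per_day, sim_years=15, wall_hours=8):
        """Sustained Gflop/s needed to complete sim_years in wall_hours."""
        total_gflops = gflops_per_day * sim_years * 365   # total work for the run
        wall_seconds = wall_hours * 3600                  # available wall clock time
        return total_gflops / wall_seconds

    # A few rows from the table: truncation and Gflops per simulated day
    for tm, gflops_day in [(42, 77), (170, 7019), (426, 224702)]:
        print(f"T{tm}: R ~ {required_sustained_gflops(gflops_day):,.0f} GF/s")
    # prints roughly 15, 1,334 and 42,717 GF/s, matching the R(GF/sec) column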

The next step for ocean models will be the development of a hybrid vertical coordinate system that provides fixed Eulerian levels in the upper ocean, transitioning smoothly to isopycnic coordinates in the density-stratified deep ocean. Such a scheme will provide good resolution near the surface where mixing is strong, and good conservation of water-mass properties where diapycnal mixing is weak. Isopycnic coordinates also provide a better basis for implementing a bottom boundary-layer scheme to represent flow over sills and down sloping topography. Because this scheme will provide the most appropriate representation in both the upper and deep ocean, it will also be "efficient" in the sense that only modest increases in the number of vertical levels will be needed as the horizontal resolution is refined. In particular, the efficiency of isopycnic coordinates in representing the deep ocean may permit shifting more levels into the upper ocean for better resolution of biological processes.

Energy cycle

The improvement of the component models will not be limited to increases in resolution. Since the fluid dynamics is only one of several non-linear processes in each component model, we anticipate vigorous development of parameterizations and methods which correctly account for the energy, hydrological, salinity and chemical cycles within each component model.

The atmospheric model will very likely contain a new longwave radiative transfer scheme, which provides longwave scattering capability and has attractive computational characteristics. Some significant enhancements to the gravity wave drag scheme can also be anticipated. Minimally, a new filter will be used to produce the surface geopotential with the objective of flattening the oceans without shearing off the tops of the mountains.

Hydrologic/Salinity cycle

The atmospheric model will very likely contain a prognostic cloud water scheme, a new generalized cloud overlap scheme, and either a modified closure of the current convection scheme or an entirely new convective scheme. We also anticipate a modified boundary layer scheme which treats phase change within the boundary layer. Many of these improvements will result in an increased operation count for the atmospheric component.

Chemical cycles

Chemistry development will focus on the implementation of global chemical and physical processes required to represent distributions of non-CO2 chemical species relevant to radiative forcing and climate prediction. Most important of these species are ozone (both tropospheric and stratospheric), methane, CFCs, and aerosols; the important radicals and precursors to these species must also be included. Initial studies will focus on approximately 40 chemical species representing the global free troposphere and stratosphere that are involved in the important production and loss cycles for ozone, the methane-smog reactions and aerosol heterogeneous reactions.

Atmospheric models must include the radiative and dynamical processes of the stratosphere to properly represent the formation and loss of stratospheric ozone. Increased vertical resolution, especially in the tropopause region, will be needed to improve our modeling of ozone transport from the stratosphere into the troposphere as well as the representation of species having sharp vertical gradients, e.g., water vapor and nitric acid. Improved horizontal resolution allows detailed regional characteristics of species emissions (e.g., energy-use emissions of sulfur, NOx, and CO) and their role in in-situ ozone production to be represented.

The addition of chemical processes will allow the assessment of influences and feedbacks from geographic and temporal changes in chemical species distributions on the radiative balance of the atmosphere and the resulting climate. Aerosols (sulfate, sea-salt, carbonaceous, etc.) play a key role in this radiative balance because of their ability to reflect incoming radiation (i.e., direct effects) and to alter cloud characteristics and formation processes (i.e., indirect aerosol forcings).

Ocean and terrestrial biogeochemistry models are required in climate models to identify important sources and sinks of radiatively active gases (e.g., CO2, CH4, etc.) in order to investigate the relevant exchange processes between atmospheric, land and ocean ecosystems. Ocean biogeochemistry models typically predict the air-sea fluxes of chemical species, marine biological productivity, export of carbon from the surface to deep ocean, remineralization, dissolution of biogenic compounds in the deep ocean, and the production and dissolution of sediments using approximately 10 species important to the carbon cycle.

Increased vertical resolution in the upper ocean will be needed to adequately represent biological processes there. The absorption of solar radiation by the upper ocean occurs on a scale of meters, where improved resolution has consequences for photosynthetic organisms and species transport processes. Increased horizontal resolution in the ocean will be needed to adequately represent eddy-induced vertical transports of nutrients, such as the influx of nutrients from the deep that occurs in the interior of cold-core eddies.

Terrestrial ecosystem models can be closely integrated with land surface models (such as SiB) to investigate impacts of climate and land-use changes on the fluxes of radiatively important gases and the health of the terrestrial biosphere. Such models can benefit greatly from increased model resolution, especially in areas with heterogeneous climate. Higher resolution would permit these climates to be independently represented, so that the specified vegetation could experience a more appropriate simulated climate.

Overall, as a result of improved representation of energy, hydrological and chemical cycles, we expect a large increase in the operation count. The addition of a 30 species tropospheric chemical model to a version-1 coupled model will increase the computation of the atmospheric component by a factor of 2.5. But as a result of better representation of the cycles, budgets important for climate prediction will become available to the research and assessment communities. Many researchers, excited about these possibilities, will be inclined to incorporate additional physical and chemical improvements rather than increase the horizontal resolution of the models. The development targets of ACPI will require research with the higher resolution models to demonstrate superior performance.

 

Model Development and Application Roadmap

 

One of the primary ways in which the ACPI expects to "accelerate" climate prediction is by shortening significantly the "cycle time" between generations of models. This will be accomplished by maintaining concurrent efforts at all phases of model development, validation and application, as outlined in section 1.1.1. A key requirement for the success of the ACPI is the availability of adequate computer resources to support all three developmental phases simultaneously: coupled model application (version 0) with multiple ensembles and scenarios, together with complete model testing and validation runs of the version 1 and version 2 models, all with a turnaround time much less than the generation cycle time.

In the following table, specific examples are given of what versions 0, 1, and 2 might be, based on estimates of the improvements expected to go into future models and the increase in computational performance ('PF' denotes this performance factor) that each might entail relative to the previous version.

Version 0 coupled models already exist and are running in production on parallel computers such as the SGI Origin 2000 and the Cray T3E. The Parallel Climate Model (PCM) and the Climate System Model (CSM) are both based on the NCAR CCM3 atmospheric model running at T42 (2.8º, 300 km) resolution with 18 vertical levels. The ocean component of CSM is the NCOM ocean model running at ~2º resolution with 45 vertical levels, while PCM uses the POP ocean model running at an average resolution of 2/3º with 32 vertical levels. Such coarse resolution is necessary for multiple century-long simulations to be feasible with present-day computing resources. For this discussion, the Version 0 coupled model is taken to be the Parallel Climate Model (PCM) developed by a multi-institutional team led by Warren Washington (NCAR) with support from the DOE Climate Change Prediction Program. PCM simulates 1 year in 5 hours on 64 195-MHz SGI Origin 2000 processors, corresponding to a peak rate of 25 GF. To integrate 150 years requires 750 hours or 1 month. PCM is currently limited to 64 processors, because it has only a one-dimensional domain decomposition in latitude. However, a two-dimensional decomposition has been completed that will allow PCM to run on a larger number of processors. Performance measurements indicate that the real sustained performance of PCM on 64 processors is about 10% of peak, or 2.5 GF. This value, 10%, is typical of the ratio of sustained-to-peak performance obtained with climate codes on other parallel machines with cache-based processors, so we assume that it applies to the other versions as well.

A very high priority is given to running ensembles of coupled model simulations to quantify the uncertainty arising from natural variability. Such ensembles can be run in parallel, increasing the required performance (and memory) of the computer system by a factor equal to the number of instances in the ensemble. A reasonable estimate of the ensemble size required for a statistically valid sample is of order 10 (Zwiers, F. W., 1996: Interannual Variability and Predictability in an Ensemble of AMIP Climate Simulations Conducted with the CCC GCM2. Climate Dynamics, 12, 825-847). The number of scenarios expected is also of order ten. In reality, only the ensembles are likely to be run in parallel. The various scenarios are more likely to be run sequentially so that the researchers can learn from each scenario before proceeding with the next. Dealing with output from a single simulation is a significant undertaking, so it is unlikely that more than 10 simulations would be undertaken in parallel. In the table, it is assumed that 10 instances of a single scenario will be run in parallel with version 0. This yields a total performance requirement of 1500 yr/week, which will require an aggregate peak performance of 1 TF.
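A minimal sketch of how these Version 0 figures scale up from the measured PCM performance; the rounding to roughly 10 GF per instance, 100 GF sustained, and 1 TF peak matches the summary table that follows:

    # Sketch: scale measured PCM performance (2.5 GF/s sustained, 1 simulated
    # year per 5 hours on 64 processors) to the Version 0 ensemble requirement
    # of 150 years/week per instance, 10 instances, 10% sustained-to-peak ratio.
    pcm_sustained_gf = 2.5
    hours_per_sim_year = 5.0

    target_years_per_week = 150
    hours_available = 7 * 24

    speedup_needed = target_years_per_week * hours_per_sim_year / hours_available
    sustained_per_instance = pcm_sustained_gf * speedup_needed   # ~11 GF/s, quoted as ~10

    instances = 10
    total_sustained_gf = sustained_per_instance * instances      # ~100 GF/s sustained
    peak_required_tf = total_sustained_gf / 0.10 / 1000          # ~1 TF peak at 10% of peak

    print(round(sustained_per_instance, 1), round(total_sustained_gf), round(peak_required_tf, 1))
    # -> 11.2 112 1.1, i.e. roughly 10 GF/instance, 100 GF sustained, 1 TF peak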

The Version 1 model is the "next generation" coupled model being tested and validated to eventually become the next production version. The table reflects the fact that the prevalent view is that adding more processes (e.g., chemistry) and improved parameterizations of processes is more urgent than moving to higher resolution in the atmosphere. The two big items for the Version 1 atmospheric model are tropospheric chemistry (PF = 2.5 relative to Version 0) and sulfate aerosol chemistry (PF = 2). Adding a carbon cycle in the ocean, which includes nitrate and phosphate chemistry, is estimated to have a PF = 2. This leaves room for an increase in oceanic horizontal resolution of a factor of 2 (PF = 2^3 = 8). The number of vertical levels stays the same, but the coordinates become hybrid (PF = 1.2). The performance goal is the same as for Version 0, 150 years/week, but for only a single instance. This rapid turnaround will be important when dealing with the increased complexity of this coupled model.

Version 2 refers to the follow-on generation of component models. Here most of the attention goes to increased spatial resolution, a factor of 2 in the horizontal and a factor of 1.5 in the vertical in both atmosphere and ocean. The factor of 2 becomes a factor of 8 because, in addition to quadrupling the number of horizontal grid points, the timestep is cut in half to maintain numerical stability. With some additional chemistry in both components, the net performance factor is 24 for each. If each component can be run 15 years in a week, then running both (uncoupled) requires them to share the resources (as they would if coupled), for a net rate of 7 years/week and a peak rate again of 2.5 TF.
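The net performance factors quoted here and in the accompanying table are simply products of the individual factors for resolution, vertical levels, and added processes; a small sketch using the factors quoted in the text:

    # Net performance factors as products of the individual factors quoted in
    # the text (each relative to the previous version). A horizontal refinement
    # by 2 counts as 2*2*2 = 8: more points in both directions plus a halved timestep.
    from math import prod

    version1_atm = [1, 2.5, 1.2, 1.3, 2, 2.5]   # resolution, levels, new LWR, prog. cloud water, sulfate aerosols, trop. chem
    version1_ocn = [8, 1.2, 2]                  # 2x horizontal, hybrid levels, carbon cycle
    version2_atm = [8, 1, 3]                    # 2x horizontal, levels unchanged, strat. chem
    version2_ocn = [8, 1.5, 2]                  # 2x horizontal, 1.5x levels, sulfur chemistry

    for name, factors in [("V1 atm", version1_atm), ("V1 ocn", version1_ocn),
                          ("V2 atm", version2_atm), ("V2 ocn", version2_ocn)]:
        print(name, round(prod(factors)))
    # -> roughly 20, 19, 24 and 24, the net factors in the table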

The Exploratory Versions of the component models are intended to "explore" even higher resolution. They will be run intermittently for the purpose of examining scaling of process parameterizations with resolution and performance scaling.

The objective of ACPI is to "accelerate progress" in climate modeling by having sufficient computing resources to support Version 0, 1 and 2 model application and development concurrently. This means that the requirements for the three columns must be added together: approximately 6 TF. This is the requirement needed to pursue a single family of climate models, for example, the PCM or the Climate System Model (CSM) also based at NCAR. However, the ACPI draft implementation plan (Feb 1999) proposes to support more than one model consortium, so comparable resources would be needed for each consortium.

Clearly, there is an important trade-off between increases in model comprehensiveness and resolution. This is a subject of intense debate that can best be resolved by examining both approaches in parallel and seeing whether more is gained from higher resolution or enhanced treatment of model processes. Historically, improvements in solution algorithms, better formulations of model equations, and other manifestations of human innovation have helped moderate the demand for bigger computers, but the reduction factor is difficult to predict. Advances in computer architectures, networks, languages and compiler technology may also help increase the disappointing ratio of sustained-to-peak performance (~10%) typically attained on cache-based distributed shared-memory computers.

An argument can be made that a 5 TF system dedicated to ACPI climate research is required for significant progress in the next 2-3 years. There would also be significant advantages in providing development and analysis capabilities under local control of model developers. Several smaller systems augmenting a 5 TF system could be used to advantage by the ACPI. Such systems would also leverage the archiving capacity of host institutions. Delivery of model results from the production 5 TF platform to remote sites will require high bandwidth network connections between key sites.

                          Version 0          Version 1               Version 2                Exploratory
                          (Coupled Model)    (Coupled Model)    PF   (Component Models)  PF   (Component Models)  PF

Atmosphere
  Horizontal resolution   T42, 300 km        T42, 300 km        1    T85, 150 km         8    T170, 75 km         8
  Vertical levels         18                 45                 2.5  45                  1    60                  1.3
  Added processes                            New LWR            1.2  Strat Chem          3    TBD                 1
                                             Prog CldW          1.3
                                             Sulfate Aeros      2
                                             Troposph Chem      2.5
  Net performance factor                                        20                       24                       10

Ocean
  Horizontal resolution   2/3º, 70 km        1/3º, 35 km        8    1/6º, 18 km         8    1/10º, 10 km        4.6
  Vertical levels         32                 32 (hybrid)        1.2  48                  1.5  48                  1
  Added processes                            Carbon cycle       2    Sulfur chemistry    2    TBD                 1
  Net performance factor                                        19                       24                       5

Ensemble size             10 in parallel
Scenarios                 10 sequentially
Performance goal          150 yr/week/inst   150 yr/week             15 yr/week/comp
  (expressed inversely)   1 hr/yr/inst       1 hr/yr                 12 hrs/yr/comp
Instances per scenario    10
Scenarios (concurrent)    1
Total performance         1500 yr/week       150 yr/week             7 yr/wk (combined)
Sustained performance     10 GF/instance     250 GF                  250 GF
                          (100 GF total)
Peak performance [1]      1 TF               2.5 TF                  2.5 TF (combined)

[1] Assumes sustained/peak ratio of 10%

Computing System User Requirements

The ACPI is seeking support from SSI Multi-teraflop sites in providing cost-effective, high-performance computing systems that will fully support climate simulations that may require thousands of processor hours for their completion, producing many terabytes of model output that need extensive analysis and inter-comparison with other such simulations. Further, the SSI computing environment should enhance and facilitate the development of new models and analysis tools that will advance the ACPI goal of accelerating the development of climate simulation models over the next decade.

The definition of user requirements for the Global Systems Simulation is predicated on the view of a close relationship between scientific progress and model development articulated above and the best use of computational resources to accelerate and advance the development of climate models in the next 2-3 years. The SSI computing environment must support the ensemble calculations using the version 0 coupled model, as well as provide turnaround for the development of the planned version 1 and 2 coupled models and their components.

The first priority of the computing environment is the application of the coupled models for ensemble calculations of the several climate change scenarios. To support this use, the site(s) must provide production-quality services and reliability. In the accompanying GSS Requirements Table, the coupled model column describes requirements for ensemble calculations. The other columns specify requirements for component model and process model development simulations. These simulations must be performed to advance the state of the art of new, coupled models.

Since version 1 and version 2 are not yet available for testing on a new system, currently available models will be used to judge the performance (and to define acceptance tests) for the 5TF system. To represent the desire to develop higher resolution component models on the procured 5TF computer, we have included an ocean and an atmospheric model run at moderately high resolution as part of the requirements. The ocean model development test uses the LANL POP model run at 1/10 degree with 40 levels. The atmospheric model development test uses the NCAR CCM3 run at T170 and 32 levels. These are considered representative of the class of atmospheric and ocean models that are of interest to ACPI model developers. A 40 species chemical model at low resolution represents the development requirement for chemistry. What follows is a discussion of the specific line items in the GSS Requirements Table and how the specific quantitative measure is derived.

Hardware Requirements

Throughput

For production use of the coupled model, an ensemble of order 10 instances, each simulating 150 years, should turn around in approximately 1 week. To evaluate Version 0, we use the PCM code with T42 in the atmosphere and 2/3 degree in the ocean. The most effective use of the machine for ensemble calculations is to run several instances simultaneously, using several small machine partitions. The requirement is that the total of 1500 years of simulation be performed in less than 1 week.

The development of the high-resolution ocean component requires a 15 year simulation in 24 hours of wall clock time. The development of the atmospheric component requires overnight turnaround on a 15 year simulation; our goal is a 15 year simulation in 8 hours of wall clock time.

A balanced, coupled model will use roughly equal computational resources in the ocean and the atmosphere. A simulation of 150 years of the coupled model is required to evaluate the component interactions, and a reasonable amount of time for this simulation is 1 week of wall clock time. If the atmosphere and ocean are run in a sequentially coupled mode, then this is consistent with the goal above: 8 hr x 10 = 80 hrs, or roughly 3 days, to simulate 150 years of the atmosphere. If the ocean takes a comparable amount of time, the sum is about 6 days in coupled mode. Thus the development requirement of the coupled model is satisfied if both the ocean and atmospheric component model requirements are met.
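A quick arithmetic check of that turnaround argument (a sketch; the equal ocean cost is the assumption stated above):

    # If the atmosphere meets its 15-years-in-8-hours goal and the ocean costs
    # about the same, a sequentially coupled 150-year run fits within a week.
    atm_hours_per_15_years = 8
    years_needed = 150

    atm_hours = atm_hours_per_15_years * (years_needed / 15)   # 8 x 10 = 80 hours, ~3.3 days
    ocn_hours = atm_hours                                       # assumed comparable ocean cost
    total_days = (atm_hours + ocn_hours) / 24                   # ~6.7 days
    print(f"{total_days:.1f} days for 150 coupled years")       # within the 1 week goal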

The development of the atmospheric chemistry model to be incorporated in future coupled models will require high vertical resolution. At T42 with 44 vertical levels, a 15 year simulation completing in 8 wall clock hours requires 1 Tflop/s sustained computational rate. This is based on the IMPACT model with a Gear integrator.

Mesoscale models will be used primarily for process modeling studies and for the development of physical parameterizations. For example, to study cloud microphysics and the parameterization of cloud processes, MM5 can be run in forecast mode and compared with ARM data. The proposed benchmark run involves a 33 vertical level, 4-nested grid of 30-10-3-1 kilometer resolution, with a 100x100 grid for each nest. Cloud microphysics modeling is done with the Reisner package as in MM5V2.12. A four day forecast is generated on the fine 1 km grid in less than 1 hour of wall clock time.

Sustained Gflop/s

Because of the difference in resolution between the component models under development and the coupled model used for ensemble calculations, we propose evaluation based on two resolutions. In the atmosphere, the low-resolution model is T42 spectral while the high resolution is T170 spectral. The table included in the dynamical cores section shows the required sustained computational rate to achieve this throughput for a variety of CCM3 model resolutions.

For the ocean model, the low resolution is 2/3 degree with 32 vertical levels and the high resolution is 1/10 degree with 48 levels. The sustained Gflop/s rating is calculated based on these resolutions.

Chemistry CPU times are scaled using known runs of the Gear-solver-based IMPACT model. For this model at 4 by 5 degree resolution, 44 levels and 46 species, C90 CPU requirements are 425 CPU hours per simulated year. Scaling these values to the Version 2 throughput, chemistry and resolution requires the model to run at 1 Tflop/s.
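As a rough check on that figure, the sketch below back-derives an operation count from the quoted 425 C90 CPU hours per simulated year; the assumed C90 sustained rate (~0.5 Gflop/s per CPU) and the grid column counts used for the resolution scaling are illustrative assumptions, not values from this document:

    # A rough, hedged check of the 1 Tflop/s figure. The C90 sustained rate and
    # the column counts used to scale from 4x5 degrees to T42 are assumptions.
    c90_sustained_gf = 0.5                  # assumed sustained Gflop/s per C90 CPU
    cpu_hours_per_year = 425                # IMPACT, 4x5 degrees, 44 levels, 46 species

    gflop_per_year_4x5 = cpu_hours_per_year * 3600 * c90_sustained_gf

    columns_4x5 = 72 * 46                   # approximate 4x5 degree grid columns (assumed)
    columns_t42 = 128 * 64                  # T42 Gaussian grid columns
    gflop_per_year_t42 = gflop_per_year_4x5 * columns_t42 / columns_4x5

    required_gf = gflop_per_year_t42 * 15 / (8 * 3600)   # 15 years in 8 wall clock hours
    print(f"~{required_gf:,.0f} Gflop/s sustained")       # on the order of 1 Tflop/s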

The four day forecast from the mesoscale model must be calculated in less than 1 hour. This will require a sustained computational rate of 100 Gflop/s.

Throughput, as a measure of performance, is more important than sustained Gflop/s. The codes suggested for benchmarking, evaluating, and acceptance testing the 5 TF machine should be used to evaluate the sustained computational rate.

Local Memory

The climate codes are generally able to partition their data structures in their parallel implementations, relieving any large local memory requirement. The 2 Flop/s to 1 byte memory ratio is a rule of thumb that has proved adequate in previous development work. For this specification we use peak Flop/s (somewhat inconsistently) to define the ratio. Thus a vendor-rated 1 Gflop/s processor should be equipped with 512 MB of local memory.

Aggregate Memory

The aggregate memory also follows the 2 to 1 rule. Thus a 5 TFlop/s system requires 2.5 TBytes of memory.
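A minimal sketch of the rule of thumb applied at both the processor and system level (peak rates, as noted above):

    # The 2-flop-per-byte rule of thumb applied to local and aggregate memory,
    # using peak rates as stated in the text.
    def memory_bytes(peak_flops, flops_per_byte=2):
        return peak_flops / flops_per_byte

    print(memory_bytes(1e9) / 1e9, "GB per vendor-rated 1 Gflop/s processor (~512 MB)")  # 0.5 GB
    print(memory_bytes(5e12) / 1e12, "TB aggregate for a 5 Tflop/s peak system")         # 2.5 TB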

Bandwidth and Latency

A model for the bandwidth requirements, derived from empirical measurements of the CCM and POP codes, is used to develop bandwidth and latency requirements. For the 2-D decomposition in CCM, the ratio of floating point operations to message (or bus) volume is approximately 8 flops/byte. A bandwidth-limited computation results unless the cross-sectional bandwidth of the machine is greater than one eighth of the desired sustained flop rate. For a one-dimensional decomposition of CCM, the ratio is higher and thus the bandwidth requirements may be lower.

The bandwidth requirement for the T170 atmospheric component is the sustained Gflop/s requirement divided by eight. The node to node bandwidth requirement for T170 is also derived from empirical studies and is related to the latency requirement.
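Applied to the T170 target, that works out as follows (a minimal sketch; the sustained rate is taken from the requirements table):

    # Cross-section bandwidth from the ~8 flops/byte communication ratio: at
    # least one byte must move for every 8 sustained flops, or the computation
    # becomes bandwidth limited.
    def cross_section_bw_gbytes(sustained_gflops, flops_per_byte=8):
        return sustained_gflops / flops_per_byte

    print(cross_section_bw_gbytes(1334))   # ~166 GB/s for the T170 atmosphere target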

I/O Bandwidth

The I/O bandwidth requirement for running jobs is not extremely high. This requirement is instead based on the more stringent system function of checkpoint and restart. This facility is very important for long running climate simulations. The aggregate memory of the machine should be retrievable from disk in 5 minutes. A smoothly running center supporting long running jobs must minimize the system administration impacts. Analysis, inter-comparison and validation tasks will have high I/O bandwidth needs, but still less than this system requirement.
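A minimal sketch of how that checkpoint requirement translates into a bandwidth figure:

    # I/O bandwidth sized by checkpoint/restart: the full aggregate memory
    # image should move to or from disk in about 5 minutes.
    aggregate_memory_tb = 2.5
    checkpoint_seconds = 5 * 60
    io_bandwidth_gbs = aggregate_memory_tb * 1e12 / checkpoint_seconds / 1e9
    print(f"~{io_bandwidth_gbs:.1f} GB/s aggregate I/O bandwidth")   # ~8.3 GB/s, spec'd as 8 GB/s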

Disk Capacity

The models can produce voluminous amounts of data. A conservative estimate of the coupled model requirement is based on saving only monthly average data for each component. At T42 the coupled atmosphere requires 218 GB for each 150 year run, and the ocean will save 300 GB per run. Running twenty instances will thus require a disk capacity of 10 TB. The development requirement for the atmospheric component is also based on saving monthly averages, though it will sometimes be run saving history twice per day.
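A sketch of that disk estimate, using the per-run volumes quoted above:

    # Coupled-model disk estimate from monthly-average history files.
    atm_gb_per_run = 218      # T42 atmosphere, 150 year run
    ocn_gb_per_run = 300      # ocean, 150 year run
    instances = 20
    disk_tb = (atm_gb_per_run + ocn_gb_per_run) * instances / 1000
    print(f"~{disk_tb:.0f} TB of disk")   # ~10 TB, as specified above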

Bandwidth to Archive

The system image from a checkpoint should be retrievable from (or write-able to) the archive in 55 minutes. This system requirement will also enable diagnostic analysis of archived climate runs and support access to ensemble coupled results for regional downscaling.
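The corresponding back-of-the-envelope bandwidth (a sketch, using the aggregate memory figure from above):

    # Archive bandwidth sized like disk I/O, but with a 55-minute window for
    # moving a full checkpoint image to or from the archive.
    aggregate_memory_tb = 2.5
    window_seconds = 55 * 60
    archive_bw_mbs = aggregate_memory_tb * 1e12 / window_seconds / 1e6
    print(f"~{archive_bw_mbs:.0f} MB/s to the archive")   # ~758 MB/s, spec'd as 800 MB/s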

Software Requirements

Operating System Support

Robust system software is required for long running jobs that create voluminous output. High performance, optimizing compilers for Fortran, in its latest incarnations F90 and F95, are also required to achieve efficient execution. The usual suite of network interface capabilities, as well as high speed links to data storage archives, is a requirement that must be worked out judiciously by the SSI computing centers.

The parallel I/O performance of the operating system and associated hardware interfaces is often overlooked in specification of user requirements. This is an acknowledged problem with ASCI level computations. Parallel file systems without superficial restrictions on file sizes or glaring inefficiencies in file management should be required and measured in the machine procurement. The operating system support for I/O can be a limiting factor in throughput. The tests described in our requirements include the regular I/O associated with production runs and will thus test the I/O capabilities of the OS.

Programming Model

The programming model will be distributed, shared memory (DSM), to take advantage of shared memory features on a cluster and direct control of message passing between clusters. The standard OpenMP and MPI software with appropriate interfaces to compilers and execution environments are required. Other message passing libraries, which allow optimized performance on specific hardware, are also encouraged.

Math Libraries

Optimized FFT’s and a full complement of numerical routines have proved important for efficient execution and production use of climate models. The suggested codes for acceptance testing and benchmarking of computer performance contain several math kernels.

Other Libraries

The NetCDF I/O library is increasingly necessary for exchange of climate and weather data. Several models use this self-describing format for input of initial conditions. Analysis tools also assume this or other standard formats. Provision for high performance I/O using the NetCDF format will be a requirement of either the vendor or the computing center.

Parallel Debuggers

Parallel debuggers for large distributed, shared memory codes under development are essential to productivity of ACPI model development teams. A preference for the tool TotalView is expressed in the Chemistry column, but other vendor supplied tools with equal or better functionality may be substituted.

Batch System

This is an important feature of the computing environment that will allow assignment of priority to the production runs and manage resources to support long running jobs. Specific features, required by the SSI Computing Centers user model and policy considerations, should be made part of the machine procurement.

Other Tools

Real-time system monitors are important to check the progress of long running jobs and to diagnose performance bottlenecks during the execution of the run.

Diagnostic tools

For analysis of climate model output and processing for graphical presentation, the PCMDI VCS tool and model postprocessors must be available to users. In particular, the NCAR CSM postprocessor should be made available.

Visualization

Basic graphics functionality is required. Most high-end visualization will take place remotely, but access to standard visual and analysis packages, such as AVS and IDL, will be needed.


GSS Requirements Table

Column configurations:
  Coupled Model     - T42 (atm), 2/3 degree x 32 levels (ocn)
  Global Ocean      - 1/10 degree x 40 levels
  Global Atmosphere - T170 x 32 levels
  Mesoscale         - 4-nest x 33 levels
  Chemical          - T42 x 44 levels

Hardware                               | Coupled Model                  | Global Ocean                | Global Atmosphere           | Mesoscale                   | Chemical
Throughput                             | 1500 years/week                | 15 years/1 week             | 15 years/8 hrs              | 4 days/1 hr                 | 15 years/8 hrs
Sustained GFLOP/s                      | 10 GFLOP/s per realization     | 2500 GFLOP/s                | 1334 GFLOP/s                | 100 GFLOP/s                 | 1000 GFLOP/s
Local memory per proc (peak FLOP:byte) | 2 to 1                         | 2 to 1                      | 2 to 1                      | 2 to 1                      | 2 to 1
Aggregate memory                       | 2.5 TB                         | 2.5 TB                      | 2.5 TB                      | 2.5 TB                      | 2.5 TB
L2 cache                               | 8 MB                           | 8 MB                        | 8 MB                        | 8 MB                        | 8 MB
Inter-node bi-directional bandwidth    | 1 GB/s                         | 5 GB/s                      | 10 GB/s                     | 1 GB/s                      | 5 GB/s
Inter-node latency                     | 5 usec                         | 5 usec                      | 5 usec                      | 5 usec                      | 5 usec
Aggregate cross-section bandwidth      | 5 GB/s                         | 64 GB/s                     | 166 GB/s                    | 5 GB/s                      | 64 GB/s
Aggregate I/O bandwidth                | 8 GB/s                         | 8 GB/s                      | 8 GB/s                      | 8 GB/s                      | 8 GB/s
Aggregate disk                         | 50 TB                          | 10 TB                       | 6 TB                        | 1 TB                        | 10 TB
Archive capacity                       | 500 TB                         |                             |                             |                             |
Bandwidth to archive                   | 800 MB/s                       |                             |                             |                             |
Performance codes for demonstration    | PCM x 10                       | POP                         | PCCM3.2                     | MM5V2.12                    | IMPACT

Software                               | Coupled Model                  | Global Ocean                | Global Atmosphere           | Mesoscale                   | Chemical
OS support for I/O & user services     | UNIX, MPI I/O, PFS, HIPPI, ATM |                             |                             |                             | MPI I/O
OS for compute nodes (if applicable)   | UNIX, MPI, OpenMP, PVM         | OpenMP, MPI                 | OpenMP, MPI                 | OpenMP, MPI                 | UNIX, MPI, OpenMP
Programming model                      | DSM                            | DSM                         | DSM                         | DSM                         | DSM
Communication libraries                | MPI, OpenMP, HIPPI             | MPI, OpenMP                 | MPI, OpenMP                 |                             |
Math libraries                         | BLAS, FFTs, LAPACK             |                             | SPHEREPACK                  |                             |
Other libraries                        | NetCDF                         |                             |                             |                             | NetCDF
Language support                       | F95, C                         | F95, C                      | F95, C                      | F95, C                      | F95, C
Parallel debugger                      | for DSM code on full system    | for DSM code on full system | for DSM code on full system | for DSM code on full system | TotalView
Batch queuing system                   | long run capability            | large job                   | large job                   | large job                   | long running job
Other system tools                     | realtime system monitor        |                             |                             |                             | realtime system monitor
Climate diagnostic tools               | VCS, postprocessors            |                             |                             |                             |
Visualization tools                    | NCAR Graphics, AVS, IDL        |                             |                             | Vis5D                       | NCAR Graphics, IDL
Performance tools                      | profiler, DSM stats            | profiler, DSM stats         | profiler, DSM stats         | profiler, DSM stats         | profiler, DSM stats