Contents
1.0 Functionality Requirements
2.0 Software Engineering Design
Issues and Discussion
Appendix A: Pseudo-Code Templates
Appendix B: Class structure
This document describes the software requirements and design of the high performance atmospheric component (CCM4) for the Community Climate System Model (CCSM).
The major goal of the software design of the CCSM is to build a performance-portable model which can execute effectively on a range of platforms, in a variety of configurations (see Foster's DOE proposal). The immediate target platforms for CCSM are the IBM SP, the SGI Origin, and networked clusters, but the design requirements must take into account the overall goal to maintain performance portability across scalable parallel supercomputers, commodity clusters, shared memory multiprocessors, and vector supercomputers.
To address the overall goals of the CCSM for the atmospheric component,
the requirements have been separated into two broad categories: functionality
requirements and software design. Generally speaking, the functionality
describes what the atmospheric component must do; these are the
design goals. The scientific (and numerical) aspects of the atmospheric
model will be described in the usual documentation of the CCM4.
The software design addresses how the design goals are to be met
and the functionality requirements are to be provided. The conventions
on coding style and the specification of interfaces, user defined data
types, constructs for performance on vector, shared memory and destributed
memory computers are described.
The testing procedures required for maintence of code correctness in
the repository, along with a description of the build scripts used for
testing and running the code on the target platforms, are also described.
1 Functionality Requirements
1.1 Scalable Parallel Design and Performance Portability
A scalable parallel design using a mix of distributed- and shared-memory
parallelism is important to maximize throughput and scientific productivity
using the target platforms. The design must allow for adequate granularity
in the parallel decomposition to effectively utilize the scalable, high
performance computers. Of course, the design must pay special attention
to effectiveness on machines emphasizing low-resolution ensemble runs for
DOE applications. Decomposing the spatial domains in two or more
directions can yield the required granularity. The current assumption
of this programming model is that the "inner" directions will be handled
in shared memory using the standard OpenMP multitasking directives and
the "outer" directions of the decomposition will be implemented for distributed-memory
message passing. Thus a two-dimensional decomposition for the atmospheric
code could be achieved by decomposing the latitudes in distributed memory
and within each vertical level using shared memory. Other decompositions
may yield better performance, especially where load balancing is of concern.
The parallel decomposition can be different for different dynamical cores.
A different decomposition for the calculation of the physical parameterizations
(column physics) may also be advantageous. Certainly, to achieve
performance portability across the target architectures, a degree of flexibility
is a necessary design goal.
Requirement: The code will
be configurable for high throughput on a variety of high performance parallel
platforms.
Specific:
1.3.1 Eulerian-Spectral Dynamical Core
The Eulerian-spectral dynamical core will be restructured to eliminate
the strict dependence on a latitudinal parallel decomposition. The
parallel algorithm for the spectral transformations will exploit both a
distributed-memory (MPI) parallelism as well as a shared-memory (OpenMP)
parallelism. First, the data transposition software will be customized
and optimized to map distributed dynamics data to distributed physics data
structures (see below for segmented physics data layout). The transpose
software will be adapted from the methods used in the 2D parallel decomposition
of the PCCM-3 developed by Drake et. al. (1995), to a generalized decomposition.
These data transpositions will take account of the physical space data
as well as Fourier and spectral coefficient data. The dynamics data decomposition
will focus on a segmented implementation (same as physics) and a level-fields
decomposition that maintains contiguous memory for the transforms. The
segmented implementation will transpose and decompose data to calculate
the Fourier and Legendre transforms. That way the FFTs can effectively
use library software on a computational node and take advantage of the
peak shared-memory performance of the platform. The segment size
will be adjustable to hit maximum cache performance as well as to expose
parallelism on a reduced grid for multitasking and message passing.
The standard 1D parallel decomposition of CCM-3 will thus be a special
case and vector lengths that support good vectorization will be a runtime
option. The minimum design point for the CCM-4 is 26 levels and several
more fields. Shared-memory nodes with 4-16 processors are the order
of the day, so we expect a 2-D lat-lon decomposition to be efficient for
the parallel spectral transform. Both the physics (segmented) and the dynamics
data decompositions work well with a "reduced grid" for the spectral dynamics.
In fact, the generalization from a simple 1-D or 2D parallel decomposition
is necessary for proper load balancing of the computation on a reduced
grid. In both decompositions, the number of gridpoints calculated
by a compute node can be evenly distributed. The segmented decomposition
can use segment size to accommodate fewer grid points per latitude as the
poles are approached. With a segmented decomposition the number of
points can be approximately balanced. After the transpose and new parallel
spectral algorithms are in place in the dynamical core, the semi-Lagrangian
transport of the tracer fields will be addressed. This involves the same
considerations and strategies as for the semi-Lagrangian-Spectral dynamical
core. Exploiting this overlap, we hope to implement two dynamical
cores for the effort of something less than two.
Requirement: An Eulerian-Spectral
dynamics package will implement a scalable parallel algorithm in accordance
with the dynamical core interface rules.
Specific:
1.3.3 Lin-Rood Dynamical Core
The CCM-4 model now includes the recently developed Lin-Rood dynamical
core (Lin and Rood, 1996; Lin and Rood, 1997; Lin, 1997; Lin and Rood,
1998; and Lin and Rood, 1999). The basic algorithms are derived and evolved
from the modern high-resolution finite volume algorithms pioneered by van
Leer (1977) and Colella and Woodward (1984), which are one-dimensional
algorithms designed primarily for astrophysical and aerospace engineering
applications requiring the resolution of sharp gradients (e.g., shocks).
Details of the Lin-Rood dynamical core can be found in a recently produced
algorithm theoretical basis document (DAO 2000). In summary, its
unique attributes are the following:
1.4 Parallel Physics: Data Decomposition and Code Rework
The parallel decomposition of the parameterization package, the column
physics, is a key to high performance of the atmospheric model. Since
each vertical column of the atmosphere can be computed independently, there
is abundant parallelism for a scalable algorithm. As chemistry and
more comprehensive physical parameterizations are included in the model,
this parallelism only becomes easier to exploit. Two considerations are
very important, however, and will be addressed as part of this proposal.
First is the issue of node performance on the physics computation.
At an appropriate time in the development of CCM-4, we will rework the
parameterization package to support a general segmented data decomposition.
The required rewrite will change the parameterization package so that it
works on a collection of columns, with inner loop indexing running from
an ibegin to iend. Currently, the inner loops assume that an entire
latitude slice is being processed with the ending index variable to accommodate
a reduced grid. We will add the beginning index construction. The current
choice is excellent for node performance on vector machines but requires
an extremely large cache for the types of computer architectures prevalent
within the DOE community. The segmented data decomposition is thus a generalization
of the 1D and 2D parallel decompositions, containing both as a special
case. In addition, it is a simple decomposition in that all the details
of the column location can be hidden. It will thus simplify the coding
style and readability of the code. Cache utilization and local shared-memory
multitasking will be supported with this decomposition of the physical
space so that the scientists developing parameterizations will not need
to multitask their parts of the code. These basic code modifications
will be done as soon as reasonably possible given the evolving parameterization
code. Since this portion of the code consumes a large fraction of
the total compute resources it will be best to address it early in the
planning and development process. The second consideration is load balancing.
Especially for reduced grids a general data decomposition is required.
Since the local node will call the parameterization package with an arbitrary
collection of columns, the column assignment can be made to effectively
balance the load among the computational nodes. The short wave radiation
calculation is an example of the type of load imbalance that occurs.
By assigning an equal number of day columns to the nodes, the computational
load can be equalized. This assignment will be accomplished in the
context of the data transpose steps. The inclusion of load-balancing
transpose routines will be accomplished in the third step of our phased
development. The design of the column physics could require that location
(lon,lat) is passed to the physics rather than assumed based on index values.
This may be necessary to support the general load balancing and segmentations
we are proposing.
Requirement: The physical
paramerization packages will conform with column physics interface rules
and will effectively utilize target machine memory heirarchies. Load
balanced decompositions will be supported.
Specific:
1.5 Atmosphere - Land Surface Interface
The proposal for the redesign of the flux coupler will have an impact
on the design of the atmospheric model. In particular, the option
for including the land surface model with the atmospheric model, using
the same grid (as in PCM), will have implications for the structure of
the atmospheric model. Implementing efficient parallel algorithms for the
land surface model is tied strongly to the parallel data decomposition
of the atmospheric model. The code structure is problematic: it allows
a separate executable for the LSM or an interface as part of the CCM-4
executable; it is not clear how to construct this interface except with
a large block of ifdefs. The design will be an early priority.
Requirement: The land surface
package will execute in parallel with the atmospheric model or with the
CSM coupler.
1.6 Validation and Testing
Finally, the design must include model validation and code testing
procedures. By model validation, we mean validation between different
platforms where it is expected not to produce exactly the same numerical
results. One example of such a validation procedure compares the
rate of separation between results started from the same initial conditions
(see Rosinski and Williamson 1997). Code testing procedures must
include checks to see that the code is working as expected. These
may include performance expectations, numerical results, reproducibility
and reconfigurability. The inclusion of such procedures in a design
document is important so that everyone understands what the rules are and
how much leeway programmers have in making code modifications.
Requirement: Code entered
in the CCM repository will be tested and validated according to the configuration
management procedures.
Specific tests: All platforms
on the target platform list must be checked with tests appropriate for
the modification made. The documentation for the change and the test
results will be entered in a repository.
1.7 Design, Evaluation and Review
The design document for the atmospheric component of CCSM should be
thought of as a dynamic work in progress. A process will be
set in place, in conjunction with the CCSM Software Engineering Working
Group and NCAR management, that will allow for design reviews both internally
and externally. An evaluation of how well the software meets
the stated design goals and requirements will be conducted. With DOE, NSF
and NASA participation in this project, an "open" design process will result
with the design document as the public deliverable. As the design
evolves, we expect to identify further encapsulation in the software design,
which will require interfaces to be documented and implemented.
Requirement: A CCM design
document will be kept up to date and made available to collaborators for
comment and revision in accordance with CCSM policy.
2. Software Engineering Design
2.1 Language and Coding Conventions
The approach taken will be oriented toward a high performance implemention
but with a software organization that is object oriented. The object
orientated design drives many of the decisions on data structures, control
structures and procedure encapsulation. The Fortran90 standard will
be followed subject to style and conventions guidelines described in http://www.cgd.ucar.edu/cms/ccm4/codingstandard.shtml.
Use of other languages (eg. C and C++) is limited to the utility and library
layers and must be Fortran90 callable. Note in particular, the rules
for the development of stand-alone packages included in the CCM.
Packages should be documented and tested in accordance with configuration
and repository management procedures.
To facilitate portability, the machine dependencies will be minimized and model configurations will be specified with runtime parameters. Special installation features and non-uniformities will be accomodated in the build mechanism using "cpp" with "ifdefs" and "defines" in the source code. The definitions will be integrated into the build mechanism.
The following conventions for "cpp" defines have been adopted.
2.2 Packages and Interfaces
The definition of a package is that it can be built and tested
independently. Interfaces will be testable for performance and correctness
using "unit testers". The packages and the interface routines constitute
the model layer
of a three layered software design. The model layer
calls routines from a utility layer which may call routines from
a library layer. The design isolates the interface to installation
specific math libraries or other optimized routines in the utility layer.
The distinction between the utility layer and the library layer
will be flexible as we expect to define this interface to fit the typical
installation of the code and to clarify the interface with special libraries.
The utility layer will be distributed with the code, while it is expected
that the library layer will only contain default or filler routines.
The atmospheric model package will consist of five sub-packages
In keeping with the object orientation for the specification of
interfaces, packages will communicate with each other through state objects,
consisting of state data and methods for moving the data between package
data structures. The required methods may include changing the data
decomposition, regridding, calculation of derived variables and conversion
between states, tendencies or fluxes. Interface routines will use
inquiry functions to obtain data from the state structure and will pass
arrays to the working code.
2.2.1 Atmospheric Driver
The atmospheric driver will implement the control function for the
dynamics and physical parameterization packages.
The "dynamical core" actually performs 2 nearly separate functions
The atmospheric driver is described in psuedo-code in Appendix A. The basic flow control is shown here.
BASIC ATM DRIVER FLOW CONTROL
2.2.2 Dynamical Core Interface
Dynamical cores will provide a public type called state which holds (or returns upon query) the current values of the prognostic and diagnositc fields. An interface routine will convert between the dynamical state and the physical parameterization state. The state will hold information on distributed data decomposition which can be used for transpose and redistribution functions.
The dynamic variables needed for the physical parameterizations are
u, v, T, ps, phi, omega, q. These constitute the phys%state.
The dyn%state consists of the following variables for each of the
dynamical cores.
| Variables required | Eulerian spectral | Semi-Lagrangian spectral | Lin-Rood |
| state variables | zeta, div (u,v,), T, lnps,
q, (phi_s) |
u, v, (div), T (delT), lnps, q,
(phi_s, del phi_s) |
u, v, theta_v, delta p, q, (phi_s) |
| interface transformations | ps, phi, omega | ps, phi, omega | ps, phi, omega, T |
(notation: del = horizontal gradient, delta = vertical difference)
The dynamical core takes as input the current state (which must be consistent with the state history of the dynamics or be an initial or restart condition). On output, the dynamical core returns the updated state (and has updated the internal state history). Thus the time stepping method, leap frog explicit, semi-implicit, etc, are considered part of the dynamical core.
In addition, to initialization routines, the dynamical core must also interface with restart logic. Depending on the time stepping method, this may require saving state history and time integration information to the restart file.
2.2.3 Physical Parameterization Interface
The physical parameterization package takes as input, the physics state defined on the physics grid. The package produces the tendencies for u, v, T, ps, phi, omega, q. The physics driver and the individual parameterization packages are given in pseudo-code in Appendix A and outlined in the following.
PHYSICS DRIVER
Input T,u,v,ps,q+tracers on parameterization grid/distribution, location (lat,lon), basic time information, flag if restart data set should be produced this time step.
(NOTE: dynamics package may provide the specification of the physics grid. There is no compeling need for different dynamics and physics grids and efficiency and stability may be significantly enhanced if they are identical.)
Loops over parameterization sub-steps (if desired)
Calculates (via call to library routine) required time. For example, if parameterization time step equals overall time step, time n for 3-time-level scheme, n+1/2 for 2-time-level scheme if parameterization is sub-stepped, time appropriate for that sub-step.
Calculates and passes tendencies of individual parameterizations to history buffer.
Physics conservation initialization (column-wise since parameterizations are currently all column only) - column-wise provides more information than global, but requires more space. If initial values are saved for total tendency calculation they can be used for energy conservation calculation.
Physics conservation check, fixers applied as required.
Output net T,u,v,ps,q+tracers tendencies for total time step from all parameterizations on parameterization grid/distribution
Each parameterization is described by the following.
INDIVIDUAL PARAMETERIZATIONS
input T,u,v,ps,q. If parameterization works on some other variable like theta, parameterization does conversion. This could be changed to driver does conversion before and after. Which makes more sense depends on whether tendencies or updated variables are returned.
output(to physics driver) = updated values (this could be changed to tendencies since someone must compute them for the history buffer, but which ever is chosen, all parameterizations must do the same.
output(to restart buffer on command) time evolving data
needed for restart
NOTE: parameterization initialization
component will need a restart component. But then if it needs restart data,
it will need to initialize it somehow. So initialization routine will need
a restart flag.
output directly to history buffer
The parameterization component of history
buffer is distributed as parameterizations, makes dynamical load balancing
more cumbersome since either buffer would need to be redistributed or data
to buffer transposed.
initialization from individual file, should not require specific order with respect to other parameterization. This makes NAMELIST a problem or requires a "rewind" of the input file.
The physics parameterizations work on an arbitrary collection of atmospheric
columns indexed from 1 to ncols. It is not assumed that these columns
constitute a continguous set of longitutes or that they are all on the
same latitude. However,
the latitude and longitude coordinates of any given set of columns
are available to the parameterization.
2.2.4 Coupler and Surface Model Interface (see CCSM Coupler
Design)
The CCSM coupler and surface model interface will also use interface
routines to handle conversions between dynamical states and physical paramterization
states as well as the requisite data transpositions and redistributions
for distributed memory data structures.
2.2.5 History File Interface (see CCSM I/O Design)
The physics and dynamics packages will interface directly with the
I/O package for history output. The code will retain current approach (OUTFLD)
until new version is completed. The distributed data structures for
saving history output coincide with the decomposition of the calling routine.
ThusOUTFLD must distinguish sizes and lable values correctly. For
example, the
dynamics can call OUTFLD with a latitude slice and lat index.
When a physical parameterization routine calls OUTFLD the chunk index is
supplied. Buffers indexed by lat and chunk are kept in place in their
appropriate distributed location until a gather is peformed from the call
to WSHIST.
History accumulation buffers distributed as in physics and as in dynamics
use a time variable for labels, etc which is independent of time
variable in parameterizions.
2.2.6 Restart/Regeneration Interface
The driver, dynamics and physics packages will interface with the restart/regeneration
I/O package. In particular,
the dynamics will provide a structure that contains everything needed
to initialize and to read/write to a restart file. It would only
be used by the package that defines it and would contain all time history
information necessary to initialize/restart the package. Since this
structure is separate from the other model interface structures it may
be possible, with little extra effort, to run multiple instances of the
dynamics package (or physics) simultaneously.
The physics restart structure would include absorptivities and emissivities
as well as radiative heating rates.
2.3 Utility Layer
Several modules and data types are common to multiple packages
and functions are used by multiple packages. These shared data types
are particularly important for the interface routines linking the packages
and constitute the utility layer.
Currently, the atmospheric model makes no attempt to use the utilities
available from the Model Coupling Toolkit or other intercomponent
utility routines. The use of these routines for intracomponent
interfaces, though desirable in promoting a unified CCSM design, is subject
to performance consideations. Similarly, the employment of CCSM wide
user defined types is desirable in the long run, but not implemented in
this software design. The user defined types described here are only
applied to the atmospheric models.
2.3.1 User Defined Types
The Lin-Rood dynamical core uses an equally spaced Latitude-Longitude grid with staggering of grid variables to support the finite volume discretization. The vertical grid is defined by the "floating" Lagrangian control volumes. The parallel decompositions supported for Lin-Rood are a 1-D latitude decomposition and a 2-D latitude-level decomposition. Transposition between these decompositions occurs internally to facilitate vertical coupling when levels are decomposed in the 2-D.
The Eulerian Spectral and Semi-Lagrangian Spectral dynamical cores use a Gauss grid and a reduced Gauss grid in the horizontal direction. A hybrid sigma- pressure coordinate is used in the vertical direction. All grids can be defined in a tensor product formulation which supports efficient interpolation schemes. The parallel grid point decompositions supported are a 1-D latitude decomposition and a 2-D lat-lon decompostion.
The grid structure used in the physical parameterization package is the chunked data structure. A grid chunk consists of an arbitrary collection of atmospheric columns. The column index is first, level index is second, in each chunk and the number of columns may vary between chunks. Chunk dimensioning is as(pcols,plev) and indexing from 1,...,ncol, 1,...,nlev. The number of chunks of the decomposition is determined by performance considerations. Both parallel granularity and performance of the MPI decomposition are supported by the chunking as well as local memory/cache utilization in the serial and OpenMP decomposition.
Dynamics state - contains the prognostic variables and is indexed by block as (i,k,j,lblk). The lat index and block index are bounded using beglat:endlat and begblk:endblk. ASo the block index is almost analogous with the chunk index. A similar grid structure is implied.
Parameter values for blocks are set in PMGRID along with the other indexing information. The PROGNOSTICS module has the dynamics state variable allocations and declarations. DYN_GRID contains the dynamics grid information.
Some, though not all, fields are subject to the spectral transform methods. Associated with these fields is a set of spectral coefficients, referred to as the associated spectral space. Spectral space is decomposed for the calculation of the semi-implicit (Helmholtz) solve. Vertical levels are not decomposed for this part of the calculation.
The regridding of fields to couple with the physical parameterization package is accomplished through the DP_COUPLING module.
The physics package "owns" the data type (defined in PHYS_TYPES and the PHYS_GRID module) in the sense of controlling allocation and managing pointers. Similarly, the dynamics packages own the dynamics grids and fields (as defined in PROGNOSTICS and the DYN_GRID module).
2.3.2 Distributed Data Structures
The user defined field types described above are distributed data structures.
The processor maps and accounting for message passing operations using
these fields are contained in decomposition modules specific to each dynamical
core and to
the physics. The transformation and data transposition required
to couple the dynamics package with the physics package are accomplished
in the DP_COUPLING module. This module interfaces at the package
level, providing state and tendency information --a bundled set of fields.
2.3.3 Methods and Procedures
Interpolation for the semi-Lagrangian Dynamical cores requires halo
update operators for the distributed fields.
Interpolation operators are provided by the dynamics packages using
the grids and fields. These are necessary for the semi-Lagrangian
interpolation and (possibly different operators) for the interpolation
to fields defined on the observational grid.
Interpolation is also required to move between non-uniform and uniform
or centered and staggered grids.
2.3.4 Use of CCSM Utilities
The utility layer defined in the atmospheric model is currently seperate
from the intercomponent utility layer of the CCSM. In particular,
the model coupling toolkit has not been utilized or extended to support
the three dimensional field transformations and regriddings required by
the atmospheric models utility layer. As performance may suffer from
the unification of the utility layers, this step will be approached cautiously
in the future.
2.4 Parallelism
To exploit clusters of shared memory multiprocessors a hybrid distributed/shared
memory programming paradigm will be adopted. In general, the distributed
data decompositions will be among the nodes ofthe cluster and shared memory
will be expoited within a multiprocessor node. This implies that
the distributed memory message passing constructs occur at a higher level
in the call tree than the shared memory constructs. For example,
the multiple Fourier transforms in the spectral method are decomposed by
latitude and level via a distributed data transpose. A shared memory
optimized FFT library performs the multiple (local) Fourier transforms
on a node.
2.4.1 Shared Memory Constructs
OpenMP directives will be used to specify shared memory parallel tasks.
Shared memory parallelism on loops should be specified at a high level
in the call tree and isolated in driver, utility and library routines.
2.4.2 Message Passing Constructs
MPI standard message passing calls will be used to implement distributed
memory parallelism. The parallel data decomposition and associated processor
mappings are associated with the data objects. Message passing constructs
should be used only in driver, utility or library routines with the data
decompostion and distributed data transposition occuring at a high level
in the call tree.
2.5 Library Layer
2.5.1 Math Libraries
Library routines for single processor (or multitasked) spherical harmonic
transforms, ordered, real Fast Fourier transforms, random number generation,
solution of a general dense linear system with condition number estimation,
scalar (dot) product of vectors, vector exponential, vector logrithm, vector
trig functions, vector square root.
2.5.2 I/O Libraries
The NetCDF library is required for reading initial conditions and for
output of histories.
2.5.3 DataTranspose and Transformation Libraries
The data transpose routines are used by the driver and interface routines
by calls to shared module procedures. Thus the driver or interface
routines may require a data transpose of a dynamics field to a physics
field. The shared module procedures are available to effect the transposition
calling these libraries.
2.5.4 Simulation Calendar Routines
The PHYSICS DRIVER receives basic calendar and simulation time information
(not sure how to do this if parameterizations sub-step). History
tape requires time and date information for header variables. This should
be independent of time information for parameterization.
2.6 Utilities
2.6.1 Vector routines
Vector masks for conditionals and index manipulations will include
intmax, ismax, ismin, isrchfgt, isrchfle, wheneq, whenfgt, whenfle, whenflt,
whenne.
2.6.2 Timing and performance monitoring
Time of day stamps, a wall clock time, second timer will be included.
For performance monitoring in a MPI/OpenMP (hybrid) mode, these timers
must be thread safe and provide some syncronization method. More
specific hardware performance monitoring routines and routines that diagnose
parallel performance will be included in the utility layer so that instrumentation
is properly encapsulated.
2.6.3 Production managment
Routines that return the time remaining in a run and other allocation/queue
parameters.
2.7 Build Mechanism
A portable build mechanism is used to configure, compile and test the
code. The runtime configurable parameters include the number of processors
and parallel decomposition. Compile time configurable parameters
include the model resolution, the segment size, the floating point precision
and processor characteristics.
2.8 Tests Before Code Checkin
Standard NAMELIST and *.h files and datasets will be created for these
tests
DEFINITIONS
NCAR production machines = SGI, IBM, Compaq, or local versions
NCAR development machines = Sun, Linux PC
default model = 30-level T42 full physics (+NDENS = 1)
bit-for-bit = two day run with multiple advected and non-advected
constituents
Entire standard daily averaged history tape checked.
e.g. UNIX diff on the output from 2 ncdump commands,
or Rosinski's cprnc program on history tapes.
Basic difference growth test
perform two runs plotting
the point by point, mass weighted, rms difference of temperature as function
of time
1) one two day run writing out T,PS every timestep (nstep = 144)
a) set PERGRO namelist flag to true
b) #define PERGRO in misc.h
2) second two day run writing out T,PS every timestep (nstep = 144)
a) set PERGRO namelist flag to true
b) #define PERGRO in misc.h
3) run Rosinski's cprnc program on output history tapes.
4) using output from cprnc plot the difference of the mass weighted rms
value of temperature.
Perturbation growth on single machine:
One run 'as is'
Second run with
set PERTLIM
namelist variable to bit level perturbation
(1.e-14 on ieee machines)
Difference growth test on different machines:
Two runs, each on
a different machine. For this test the PERTLIM namelist flag should
not be set. The perturbation is generated from running on different
platforms using different libraries and compilers.
Difference growth test of changed code
One run with previous
TAG code
Second run with new
code
For this test the
PERTLIM namelist flag should not be set. The perturbation arises
from code differences.
Above two tests are successful if
the difference grows
no faster than a minimal perturbation on the single machine with smallest
mantissa. Curves should look very similar to plot produced by difference
error growth test on same machine.
Always enable the OpenMP directives when testing
Following tests are for 'code' changes, as opposed to 'algorithm'changes in the physics or dynamics. The correctness of the latter must be verified by the scientists somehow. There is no simple, safe strategy to test 'algorithm' changes. 'Code' changes and 'algorithm' changes should be isolated.
BASIC TESTS (default model)
o Compile default model on all NCAR production and development machines.
(Compilers are still flaky enough that bugs are shaken out this
way)
o Restart matches a continuous initial run bit-for-bit on all production and development machines. (Continuous 2 day run, restart day 1). If the radiation code has been modified, also test restart from a non-radiation timestep.
o Different domain decompositions match bit-for-bit.
o Updated version compared against previous TAG
1) bit-for-bit,
a) one
dynamical core on NCAR production and development machines
b) three
dynamical cores on one machine
2) Difference growth test on failures of above.
Error Growth on same
and two different production platforms.
3) Perform basic timing and memory checks to make sure
new code
does not adversely affect performance
or memory footprint.
( less than 10% increase over previous
tag usually ok.)
BASIC TESTS (adiabatic version)
o Updated version compared against previous TAG
1) bit-for-bit,
a) one dynamical core on NCAR production machines (and occasionally on
additional machines)
b) three dynamical cores on one machine
2) Difference growth test on failures of above.
ADDITIONAL TESTS
o If new fields are required on the initial dataset, test that model
starts correctly from initial history tape written by the model
(write_inithist.F90).
o If any mods were made to any dynamics code, the other two cores should also be checked. If appropriate, make sure the code changes also percolate into the appropriate dynamics side of the CVS directory tree (i.e. if you made a change to eul/inidat.F90, also make the same change to sld/inidat.F90 and lr/inidat.F90).
o If any mods were made to a single branch of a cpp #define the other branch(s) should also be checked. One example is making a change to code within a #define SPMD branch. The non-SPMD branch should also be checked and modified accordingly. Checking usually entails running a start/restart test exercising all branches of the cpp #define.
o Make sure code runs in both Hybrid MPI/OpenMP and just OpenMp (non spmd) mode. The production configuration is hybrid mode but for development and debug purposes the non-spmd version of the model must work as well.
o Before code is checked in:
1) Make sure code adheres
to CCM coding standard.
2) Check code with flint
tool.
3) Make note of any major
timing or memory penalties and associated tests.
4) Update documentation
(Users Guide).
Issues and Discussion
From Byron:
I would like to raise 3 issues that we should address in the design
document.
----------------------------
1. The difference between a GCM with interactive chemistry and a
chemical transport model are relatively minor. We should encompass
both in the design.
The "dynamical core" actually performs 2 nearly separate functions
a. time stepping the dynamics
b. advecting tracers
Although these functions are closely related, they are performed
fairly separately, in most "dynamical cores".
A chemical transport model needs to replace the time stepping of the
dynamics with reading of datasets, time interpolation, and a variety
of consistency requirements. The advection of tracers is unchanged
We could try to specify the interfaces inside the "core" more clearly,
so that only the dynamics could be replaced, or we could merely note
that the "dynamical core" could be an offline advection module.
----------------------------
2. Section 2. Package Interfaces
I think that we should more cleanly separate initializtion and restart
functions for the packages.
One way to do this would be to require each package to take an INOUT
state structure (object?) as an argument. This structure would contain
everything that the package needed to initialize and to read/write
to
a restart file. It would be used only by the package
(dynamics/physics) which defines it, but would contain all time
history information.
The input and output of the
packages for use by
other parts of the model
would be contained in separate
structures.
As in Suarez' GEMS package, multiple models could be
advanced in parallel by using multiple instances of the state
structures.
The physics state would include
absorptivities and emissivities,
radiative heating rates
----------------------------
3. The design (2.2.2) states that the time stepping method is considered
part of the dynamical core.
An interesting problem is posed by the cloud water. This
parameterization maintains global arrays of its outputs [T,q,cldwater]
to determine the change by all other processes between calls to cloud
water. The output values are passed back in on the next appropriate
time step. With leap frog dynamics, the appropriate step is 2 steps
later and 2 time levels of stored values are required.
Where should the storing and cycling of these arrays be handled? They
have nothing to do with the dynamics, except that knowledge of the
time stepping procedure is required.
From:
Shian-Jiann Lin <lin@dycore.gsfc.nasa.gov>
I agree with Byron's concept about a "dynamics module" for
CTM. Inside the lr dynamics, we actually have a pure "dynamics"
module, a "transport module", and finally a mapping module (to
go from Lagrangian to the Eulerian Eta coordinate). They are
cleanly seperated. However, to maintain transport consistency and
mass conservation, both winds and mass fluxes are needed as input to
the
"transport module". So, the input to the "transport module" should
be
"dynamics" (i.e., input met data) dependent.
I only have a comment to Byron's last issue.
I think we may eventually want to have cloud water advected (Phil
seems to agree). So cloud water will be carried and transported
by the dynamics. This needs to be taken into design consideration,
in
addition to the [T,q,cldwater] values after cloud scheme.
Appendix A: Pseudo Code Template
Code Template for the Atmospheric Model
--------------------------------------------------------------
------------------- Main Program -----------------------------
--------------------------------------------------------------
initialize
time loop
physics(phys_state_phys_decomp, phys_tend_phys_decomp)
p_d_decomp(phys_tend_phys_decomp, phys_tend_dyn_decomp)
p_d_transformation(phys_tend_dyn_decomp, dyn_tend_dyn_decomp)
dynamics(dyn_state_dyn_decomp)
d_p_transformation(dyn_state_dyn_decomp, phys_state_dyn_decomp)
d_p_decomp(phys_state_dyn_decomp, phys_state_phys_decomp)
end time loop
States:
dyn_state_dyn_decomp: us,vs,thetas,ps,qs on dynamics staggered grid
phys_state_dyn_decomp: u,v,T,p,q on unstaggered grid using dynamics
decomposition
phys_state_phys_decomp: u,v,T,p,q on unstaggered grid using physics
decomposition
Tendencies:
dyn_tend_dyn_decomp: xxx on dynamics staggered grid
phys_tend_phys_decomp: xxxx on unstaggered grid using physics decomposition
phys_tend_dyn_decomp: xxxx on unstaggered grid using dynamics decomposition
Notes:
1) In this template, the interface methods are
d_p_transformation, d_p_decomp, p_d_decomp and p_d_transformation
These are only passed state information and not information on the
mappings
between packages. So the decomposition and mapping information
along with
interface information constructed from these are hidden (private to
the interface module).
2) The three states may or may not correspond to three
copies of the data.
3) The data flow diagram for the time loop is
phys_state_phys_decomp
|
V (physics)
phys_tend_phys_decomp
|
V (p_d_decomp)
phys_tend_dyn_decomp
|
V (p_d_transformation)
dyn_tend_dyn_decomp
|
V (dynamics)
dyn_state_dyn_decomp
|
V (d_p_transformation)
phys_state_dyn_decomp
|
V (d_p_decomp)
phys_state_phys_decomp
Questions:
1) where are diagnositic state variables defined?
theta, zm, zi, ln(pm), ln(pi), dp, ...
in the state structure, or in a second structure?
2) a similar transformation and decomposition will be required
for the history file and restart I/O. Will need an
outfld(phys_state_phys_decomp) and outfld(dyn_state_dyn_decomp).
Then an wshist(h_state_h_decomp) with preceding transformations.
3) the diagram does not show how state info is inout from physics
--------------------------------------------------------------
------------------- Package Interfaces------------------------
--------------------------------------------------------------
interface module
p_d_decomp(phys_tend_phys_decomp, phys_tend_dyn_decomp)
p_d_transformation(phys_tend_dyn_decomp, dyn_tend_dyn_decomp)
d_p_transformation(dyn_state_dyn_decomp, phys_state_dyn_decomp)
d_p_decomp(phys_state_dyn_decomp, phys_state_phys_decomp)
Notes:
1) The interface module holds routing information
and the
map between the two index spaces.
2) The transformation method uses the information
of how
to convert from one system to the other which is held by the
interface routine.
3) Interpolation routines for converting from staggered
to non-
staggered grids should NOT be located here, rather they are methods
of the dynamics. Should they be used and called from the interface?
4) The package interfaces all go through here.
What transformation
to perform could be identified from the type of input.
--------------------------------------------------------------
------------------- Decompositions ---------------------------
--------------------------------------------------------------
decomposition modules
for each grid decomposition, define how the
points are assigned
to processes. A different decomposition
module is required for
each package, eg. p_decomp and d_decomp_lr,
d_decomp_eul, d_decomp_sld.
These will be used
by the interface module to determine how to
pack and route state data.
Notes:
1) The current code uses commap and comspmd modules to
hold this information.
The information is a lon() and lat() array with the global index as
value
and reference from the local index, eg in p_decomp we would have
lat(ncol,lchnk). The physics processor map will be using the
lchnk
while the dynamics processor map is by latitude (for now).
--------------------------------------------------------------
------------------- Physics Package --------------------------
--------------------------------------------------------------
physics package
call physics(state, tend)
----- this is physpkg
call scan1bc(state(lchnk), tend(lchnk))
----- state is inout, tend is out
call linemsbc(state, tend)
call tphysbc(state,
tend)
declare ptend
---- paramaterization tendency structure
call param1_intr(state, ptend)
call update(state, tend, ptend)
--- modify state and cummulative tendency
call param2_intr(state, ptend)
call update(state, tend, ptend)
call surf_or_coupler (state, fluxes)
call scan1ac(state*(lchnk), fluxes(lchnk),
tend(lchnk))
call linemsac(state, tend)
call tphysac(state,
tend)
declare ptend --- paramaterization
tendency structure
call param1_intr(state, ptend)
call update(state, tend, ptend)
call param2_intr(state, ptend)
call update(state, tend, ptend)
call final_tend(state(lchnk), state*(lchnk),
tend(lchnk))
if (time_split) tend = (state* - state) / dt