The code structure is designed to be easily portable to other parallel platforms and easy for climate researchers to modify, so the design serves both the computer science / parallel computing community and the climate research community. In addition, we have put a great deal of effort into optimizing the code for good parallel performance.
For the climate researcher who wants to modify the physics parameterizations, we note that the "physics" routines are almost identical to those in CCM2. Changing the physics should require no regard for parallelism: the radiation and adjustment calculations associated with a column of the atmosphere are performed entirely on one processor, as are the surface processes associated with any given point.
The parallel programming paradigm used is single program, multiple data (SPMD) with explicit message passing. A generic message-passing functionality is assumed, based on SEND, RECV, SWAP and BCAST, which are then implemented in a machine-specific or message-system-specific library. Message-passing implementations are available for MPI (Message Passing Interface), PVM (Parallel Virtual Machine), PICL (Portable, Instrumented Communication Library), MPL (IBM native), and NX (Intel native). Porting the code to another platform or message-passing system should be as easy as providing the proper interface to these low-level routines.
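The layering described above can be sketched as one small module that the model calls and that is the only piece replaced in a port. This is a minimal sketch, not the PCCM2 library: the routine names (msg_send, msg_recv, msg_swap, msg_bcast), their argument lists, and the single-process loopback back end are all assumptions made for illustration, and the sketch is written in free-form Fortran for brevity.

```fortran
! Hypothetical portability layer. The model would call only these four
! routines; a port would supply a new module body (MPI, PVM, ...).
! The back end below is a single-process loopback stand-in, not a real
! message-passing library.
module msg_layer
  implicit none
  private
  public :: msg_send, msg_recv, msg_swap, msg_bcast
  integer, parameter :: maxlen = 1024
  real    :: mailbox(maxlen)     ! holds one pending message
  integer :: mail_len = -1       ! length of pending message (-1 = none)
  integer :: mail_tag = -1       ! tag of pending message
contains
  subroutine msg_send(buf, n, dest, tag)
    integer, intent(in) :: n, dest, tag
    real,    intent(in) :: buf(n)
    if (dest /= 0 .or. n > maxlen) stop 'msg_send: loopback handles node 0 only'
    mailbox(1:n) = buf(1:n)
    mail_len = n
    mail_tag = tag
  end subroutine msg_send

  subroutine msg_recv(buf, n, src, tag)
    integer, intent(in)  :: n, src, tag
    real,    intent(out) :: buf(n)
    if (src /= 0 .or. tag /= mail_tag .or. n /= mail_len) stop 'msg_recv: no matching message'
    buf(1:n) = mailbox(1:n)
    mail_len = -1                ! message consumed
  end subroutine msg_recv

  ! Exchange buffers with a partner: send ours, then receive theirs.
  subroutine msg_swap(sendbuf, recvbuf, n, partner, tag)
    integer, intent(in)  :: n, partner, tag
    real,    intent(in)  :: sendbuf(n)
    real,    intent(out) :: recvbuf(n)
    call msg_send(sendbuf, n, partner, tag)
    call msg_recv(recvbuf, n, partner, tag)
  end subroutine msg_swap

  subroutine msg_bcast(buf, n, root)
    integer, intent(in)    :: n, root
    real,    intent(inout) :: buf(n)
    ! With one process the root already holds the data; nothing moves.
    if (root /= 0) stop 'msg_bcast: loopback handles node 0 only'
  end subroutine msg_bcast
end module msg_layer

program demo_msg
  use msg_layer
  implicit none
  real :: a(3), b(3)
  a = (/ 1.0, 2.0, 3.0 /)
  b = 0.0
  call msg_swap(a, b, 3, 0, 7)   ! node 0 swaps with itself
  call msg_bcast(b, 3, 0)
  print *, b
end program demo_msg
```

Because the calling code sees only the four generic routines, replacing the module body with calls into a real message-passing library would leave the rest of the model untouched, which is the porting property the text describes.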
Since input and output continue to be a source of frustration for parallel computer users, we have implemented an option in which all reading and writing is done from a single node (processor zero). Single-node I/O should work on any parallel computer, but perhaps not at the performance level necessary for production runs. We also provide parallel, optimized I/O on the supported platforms. Since there are no I/O standards for distributed-memory parallel programming, a new port will necessarily require some effort to optimize the I/O.
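The single-node option can be illustrated with a serial simulation in which one array column stands in for each node's memory. This is only a sketch under stated assumptions: the node count, the field size, the block-of-latitudes layout, and the scratch file are hypothetical, and the copies in the distribution loop take the place of the message-passing calls a real run would use.

```fortran
! Serial simulation of single-node input: node 0 reads the whole field,
! then hands each node its contiguous block of latitudes.
! nproc, nlat, and the block layout are illustrative assumptions.
program single_node_io
  implicit none
  integer, parameter :: nproc = 4, nlat = 8, nloc = nlat/nproc
  real    :: global(nlat)            ! would exist on node 0 only
  real    :: local(nloc, 0:nproc-1)  ! column p stands in for node p's memory
  integer :: p, j

  ! Create a small scratch file so the example has something to read.
  open(unit=10, status='scratch', form='unformatted')
  write(10) (real(j), j = 1, nlat)
  rewind(10)

  ! Node 0 performs the only read.
  read(10) global
  close(10)

  ! Node 0 distributes the data. In a message-passing code each copy
  ! to p > 0 would be a send on node 0 matched by a receive on node p.
  do p = 0, nproc-1
    local(:, p) = global(p*nloc+1 : (p+1)*nloc)
  end do

  do p = 0, nproc-1
    print '(a,i1,a,2f5.1)', 'node ', p, ' holds latitudes:', local(:, p)
  end do
end program single_node_io
```

The read and the distribution are both serialized through node 0, which is why this approach is portable but may not reach the performance needed for production runs.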
The call tree of PCCM2 is essentially unchanged from CCM2. A significant exception is the elimination of the routine LINEMS, which has been split into PHYSICS and XFORM to accommodate blocked FFTs and transpose algorithms for the parallel spectral transform. The effect has been to separate the physics computation from the transforms.
Examining the data structures used in PCCM2, the user will observe that the major three-dimensional arrays in common /com3d/ have been modified. A new subscript has been added to account for the hemisphere (1=S, 2=N):
      real u3(plond,plevd,platd,2,2),
     $     v3(plond,plevd,platd,2,2),
     $     t3(plond,plevd,platd,2,2),
     $     q3(plond,plev,3+pcnst,platd,2,2)
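The first index now refers to the local (on-processor) longitude index. The second index is the vertical level and is unchanged from CCM2. The third index is the latitude index. The fourth index is new and refers to the hemisphere. The last index refers to the time level. The array q3 carries an additional third dimension of extent 3+pcnst, so its latitude, hemisphere, and time-level subscripts are each shifted by one position.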
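To make the subscript order concrete, the following sketch stores a value for each global latitude into an array shaped like u3. The array sizes are hypothetical, the extended-domain points are left out, and the mapping from a global latitude to a (latitude, hemisphere) pair assumes latitudes numbered south to north with each hemisphere indexed from its pole. The actual PCCM2 mapping is not given here and may differ.

```fortran
module hemi_index
  implicit none
contains
  ! Hemisphere of global latitude j (1 = S, 2 = N), assuming latitudes
  ! are numbered south to north and plat is even.
  integer function hemi_of(j, plat)
    integer, intent(in) :: j, plat
    if (j <= plat/2) then
      hemi_of = 1
    else
      hemi_of = 2
    end if
  end function hemi_of

  ! Latitude index within a hemisphere, counted from the pole.
  ! This mirrored storage is an assumption made for illustration.
  integer function lat_in_hemi(j, plat)
    integer, intent(in) :: j, plat
    lat_in_hemi = min(j, plat + 1 - j)
  end function lat_in_hemi
end module hemi_index

program demo_u3
  use hemi_index
  implicit none
  ! Hypothetical small sizes; no extended-domain points are included.
  integer, parameter :: plond = 4, plev = 2, plat = 6, platd = plat/2
  real    :: u3(plond, plev, platd, 2, 2)
  integer :: i, k, j, n

  n = 1                          ! time level
  do j = 1, plat
    do k = 1, plev
      do i = 1, plond            ! first subscript varies fastest
        u3(i, k, lat_in_hemi(j, plat), hemi_of(j, plat), n) = real(j)
      end do
    end do
  end do

  ! Under the assumed mapping, global latitudes 2 and 5 share a latitude
  ! subscript and differ only in the hemisphere subscript.
  print *, u3(1, 1, 2, 1, n), u3(1, 1, 2, 2, n)
end program demo_u3
```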
plond and platd have also been redefined in pmgrid.com.
      plond=plon + 1 + 2*nxpt,             ! slt extended domain longitude
      platd=p_lato2 + 2*nxpt + 2*jintmx,   ! slt extended domain lat.
They now include the extra points needed for interpolation in SCAN1A and for the processor overlap information that SCAN1A uses for the semi-Lagrangian method.
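As a worked instance of the two definitions above, the sketch below evaluates them for one set of inputs. The values chosen for plon, p_lato2, nxpt, and jintmx are assumptions made for the arithmetic only; none of them is stated in this section.

```fortran
! Worked example of the extended-domain sizes defined in pmgrid.com.
! All four input values are illustrative assumptions.
program extended_domain
  implicit none
  integer, parameter :: plon    = 16   ! assumed value
  integer, parameter :: p_lato2 = 8    ! assumed value
  integer, parameter :: nxpt    = 1    ! assumed value
  integer, parameter :: jintmx  = 1    ! assumed value
  ! Same expressions as in the text.
  integer, parameter :: plond = plon + 1 + 2*nxpt
  integer, parameter :: platd = p_lato2 + 2*nxpt + 2*jintmx
  print '(a,i0)', 'plond = ', plond
  print '(a,i0)', 'platd = ', platd
end program extended_domain
```

With these inputs plond is 16 + 1 + 2 = 19 and platd is 8 + 2 + 2 = 12, so the first dimension is three points larger than plon and the latitude dimension is four points larger than p_lato2.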
In general, modifications to data structures reflect the local processor's data size and not the global problem data size. This effects a decomposition of the data among the processors.
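The relation between global and local sizes can be sketched as follows. The global grid, the processor grid, the even split, and the reading of p_lato2 as the number of local latitudes per hemisphere are all assumptions made for illustration, not values or rules taken from PCCM2.

```fortran
module local_sizes
  implicit none
contains
  ! Number of items one processor owns when nglobal items are split
  ! evenly over nparts processors (assumes nparts divides nglobal).
  integer function local_count(nglobal, nparts)
    integer, intent(in) :: nglobal, nparts
    if (mod(nglobal, nparts) /= 0) stop 'local_count: uneven split'
    local_count = nglobal / nparts
  end function local_count
end module local_sizes

program demo_sizes
  use local_sizes
  implicit none
  integer, parameter :: plon_g = 128, plat_g = 64  ! hypothetical global grid
  integer, parameter :: px = 8, py = 4             ! hypothetical processor grid
  integer :: plon, p_lato2

  plon    = local_count(plon_g, px)      ! local longitudes
  p_lato2 = local_count(plat_g/2, py)    ! local latitudes per hemisphere
  print '(a,i0,a,i0,a)', 'local arrays hold ', plon, ' longitudes and ', &
        p_lato2, ' latitudes per hemisphere'
end program demo_sizes
```

Array declarations sized from the local counts rather than from plon_g and plat_g are what give each processor only its own share of the data.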