There are two major dynamics algorithms in the CCM2 code: the spectral transform method and the semi-Lagrangian transport method. The process models for radiation, clouds, and surface moisture and temperature share the common feature that they are coupled horizontally only through the dynamics. We lump all these processes under the general term ``physics'' and note that the physics calculations are independent for each vertical column of grid cells in the model.
The independence of the physics calculations at each horizontal grid point is the primary source of parallelism in PCCM2. Partitioning the horizontal grid points into blocks and assigning these blocks to processors defines a decomposition of the three-dimensional, physical-space data structures. This decomposition allows the physics calculation for each vertical column of grid points to be performed without interprocessor communication.
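As a toy illustration (not the CCM2 code itself), the block decomposition of the horizontal grid and the resulting communication-free physics step can be sketched as follows; the grid sizes, the processor mesh, and the `column_physics` stand-in are all invented for the example:

```python
import numpy as np

# Illustrative sizes: an nlat x nlon horizontal grid with nlev levels,
# partitioned over a pr x pc logical processor mesh.
nlat, nlon, nlev = 8, 16, 4
pr, pc = 2, 4

field = np.arange(nlat * nlon * nlev, dtype=float).reshape(nlat, nlon, nlev)

def column_physics(column):
    """Stand-in for the per-column physics; touches one column only."""
    return column * 0.5            # arbitrary local update

# Each "processor" owns a contiguous lat x lon block of whole columns.
blocks = {}
for i in range(pr):
    for j in range(pc):
        rows = slice(i * nlat // pr, (i + 1) * nlat // pr)
        cols = slice(j * nlon // pc, (j + 1) * nlon // pc)
        blocks[(i, j)] = field[rows, cols].copy()

# Every block is updated column by column, with no reference to data
# outside the block -- the communication-free parallel physics step.
updated = {}
for p, block in blocks.items():
    out = np.empty_like(block)
    nb_lat, nb_lon, _ = block.shape
    for a in range(nb_lat):
        for b in range(nb_lon):
            out[a, b] = column_physics(block[a, b])
    updated[p] = out
```

Because every vertical column lives entirely within one block, the per-block results can be reassembled into exactly the field a serial physics sweep would produce.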
The dynamics calculations use the spectral transform method to approximate all horizontal derivatives in the equations except those in the advective term of the moisture transport equation. The spectral transform proceeds in two stages, or two separate transforms: the fast Fourier transform (FFT) and the Legendre transform. The Fourier transform integrates information along each east-west grid line in the longitudinal direction. In the spectral transform from grid space to spectral space, this is followed by a Legendre transform that integrates the results of the FFT in the north-south, or latitudinal, direction. Thus, the spectral transform operates on data ``globally'' in that information from each horizontal grid point, and hence from each processor, contributes to each spectral coefficient.
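A minimal numerical sketch of the two-stage forward transform (grid space to spectral space), restricted here to the zonal-mean (m = 0) modes to keep it short; the field, resolution, and truncation are illustrative, and the Legendre stage is written as a plain quadrature projection rather than the fully normalized spherical-harmonic transform:

```python
import numpy as np

nlat, nlon, ntrunc = 16, 32, 8
# Gaussian latitudes mu = sin(lat) and quadrature weights.
mu, w = np.polynomial.legendre.leggauss(nlat)

# Toy field on the (lat, lon) grid; its zonal mean is exactly mu.
grid = np.cos(np.outer(np.arccos(mu), np.ones(nlon)))

# Stage 1: FFT along each east-west grid line (longitude direction).
fourier = np.fft.rfft(grid, axis=1) / nlon

# Stage 2: Legendre transform -- Gaussian quadrature in latitude of
# each Fourier coefficient (here only the m = 0 column).
spectral = np.empty(ntrunc + 1, dtype=complex)
for n in range(ntrunc + 1):
    Pn = np.polynomial.legendre.Legendre.basis(n)(mu)  # P_n at Gauss points
    spectral[n] = np.sum(w * Pn * fourier[:, 0])
```

Since the toy field's zonal mean equals P_1(mu), only the n = 1 projection is nonzero, which makes the example easy to check by hand.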
The FFT can be performed effectively in parallel by exploiting the fact that there are multiple grid lines to be transformed at any one time. Two options for parallel FFTs are currently implemented in PCCM2. One is based on transposing the FFT data so that entire longitude lines are contained in a processor. Since there is one such line for each latitude and vertical level, the parallel FFTs can be spread nearly equally among the processors. At the conclusion of the FFT, the data are transposed back to the original processor distribution. The other option is a distributed parallel FFT, in which each longitude line is divided across a number of processors. A vector sum algorithm is used to calculate the Legendre transform in PCCM2. The parallelization of the spectral transforms has driven most of the design decisions adopted for the organization of the data structures in PCCM2.
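The vector sum idea behind the parallel Legendre transform can be sketched as follows: each processor forms a partial quadrature sum over its own band of latitudes, and a global sum across processors (a collective reduction on a real machine, a plain `sum` here) completes the coefficient. The sizes, the toy Fourier coefficients, and the single wavenumber `n` are invented for the example:

```python
import numpy as np

nlat, nprocs, n = 16, 4, 2
mu, w = np.polynomial.legendre.leggauss(nlat)      # Gaussian latitudes/weights
f = mu ** 2                                        # toy m = 0 Fourier coefficients
Pn = np.polynomial.legendre.Legendre.basis(n)(mu)  # P_n at the Gauss points

# Each "processor" owns a contiguous band of latitudes.
bands = np.array_split(np.arange(nlat), nprocs)
partial = [np.sum(w[b] * Pn[b] * f[b]) for b in bands]  # local partial sums

coeff = sum(partial)                # global vector sum across processors
serial = np.sum(w * Pn * f)         # reference serial quadrature
```

Because quadrature is a sum over latitudes, the decomposed and serial results agree exactly (up to rounding), which is what makes the vector sum formulation attractive for a latitude-decomposed grid.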
The advective terms in the moisture equation are approximated using a semi-Lagrangian transport method. The method updates the value of the moisture field at a grid point (the arrival point, A) by first establishing the trajectory along which the particle arriving at A has moved during the current timestep. From this trajectory the departure point, D, is calculated, and the moisture field is interpolated at D using shape-preserving interpolation. All of these calculations involve physical-space (grid point) data and are decomposed over the processors with the same mesh decomposition used to parallelize the physics and the spectral transform. The parallel implementation of this algorithm exploits the fact that timestep constraints imposed by the Eulerian dynamics limit the distance between the departure and arrival points in the latitude direction. By extending the arrays in each processor, thus ``overlapping'' regions assigned to neighboring processors, and updating the overlapped portion of each array prior to each timestep via interprocessor communication, the calculations in the different processors can proceed independently.
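A one-dimensional toy of the overlap scheme (not the model's code): each processor receives its slab of the grid plus a few halo points copied from its neighbors before the timestep, so the departure-point interpolation is purely local. Linear interpolation stands in for the shape-preserving scheme, and all names, sizes, and the fixed displacement are invented:

```python
import numpy as np

n, halo = 16, 2
q = np.sin(2 * np.pi * np.arange(n) / n)  # moisture field on a periodic grid
u_dt = 1.3                                # displacement per timestep, grid units
assert u_dt <= halo                       # timestep limit keeps D inside the halo

def advect(local):
    """Semi-Lagrangian update of one processor's interior points.

    `local` holds the processor's points plus `halo` extra points on each
    side, already filled by a (notional) neighbor exchange.
    """
    out = np.empty(len(local) - 2 * halo)
    for k in range(len(out)):
        a = halo + k                      # arrival point A (local index)
        d = a - u_dt                      # departure point D, inside the halo
        lo = int(np.floor(d))
        out[k] = local[lo] + (d - lo) * (local[lo + 1] - local[lo])
    return out

# Two "processors", each given its half of the grid plus halo points.
half = n // 2
ext0 = q[np.arange(-halo, half + halo) % n]
ext1 = q[np.arange(half - halo, n + halo) % n]
q_new = np.concatenate([advect(ext0), advect(ext1)])
```

After the one-time halo update, the two `advect` calls are independent, mirroring how the overlapped arrays let each processor's moisture update proceed without further communication during the timestep.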
The rest of this section describes in more detail the data decomposition and the parallel algorithms for the Legendre transform and the semi-Lagrangian transport. A detailed description and comparison of parallel algorithms can be found in a series of papers [14,46,10,15]. More details on the parallel CCM2, and a description of the load balancing techniques used in the physics component of the model, can be found in other papers [21,47]. A data parallel implementation of PCCM2 has also been described, as has a modestly parallel implementation for shared memory and distributed memory machines using a one-dimensional decomposition.