This graph shows the time during which each processor is computing
(busy), doing nothing while waiting for a message to arrive (idle), or
actively sending or receiving a message (overhead). The graph describes the
performance of PCCM3.2 (the two-dimensional message-passing parallel
implementation of NCAR's Community Climate Model version 3.2) using 128
processors on the T3E-900 at the National Energy Research Scientific
Computing Center (NERSC). The problem size is T170L18, corresponding to a 512
by 256 by 18 longitude-latitude-vertical computational
grid, with 5-minute timesteps. The parallel algorithm uses a 4 by 32
logical processor grid, so 4 processors are used to decompose the longitude
direction and 32 processors are used to decompose the latitude direction.
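The block decomposition can be made concrete with a small Python sketch. It assumes contiguous, evenly sized blocks and numbers the 128 processors with the longitude index varying fastest; both choices, and the helper names `block_range` and `local_block`, are hypothetical and not taken from PCCM3.2, whose actual latitude assignment may differ. Under these assumptions each processor owns 128 longitudes by 8 latitudes by 18 levels, or 18,432 grid points.

```python
# Hypothetical sketch: block-partition the T170L18 grid (512 x 256 x 18)
# over a 4 x 32 logical processor grid. Contiguous blocks and the rank
# ordering below are assumptions, not PCCM3.2's actual layout.
NLON, NLAT, NLEV = 512, 256, 18
PLON, PLAT = 4, 32  # processors along longitude, along latitude


def block_range(n, nprocs, rank):
    """Return the half-open range [start, end) owned by `rank` when
    n points are split as evenly as possible over nprocs processes."""
    base, extra = divmod(n, nprocs)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def local_block(rank):
    """Map a linear rank (0..127) to its (longitude, latitude) ranges.
    Assumes the longitude processor index varies fastest."""
    plon, plat = rank % PLON, rank // PLON
    return block_range(NLON, PLON, plon), block_range(NLAT, PLAT, plat)


if __name__ == "__main__":
    (lon0, lon1), (lat0, lat1) = local_block(5)
    print(lon0, lon1, lat0, lat1)  # rank 5 -> 128 256 8 16
    print((lon1 - lon0) * (lat1 - lat0) * NLEV)  # 128 * 8 * 18 = 18432
```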
A transpose algorithm is used to undecompose the longitude coordinate so
that the Fourier transforms can be calculated, followed by another transpose
to (re)decompose the data in the resulting wavenumber coordinate direction. The
transposes are implemented using the MPI_ALLTOALLV routine. A recursive halving
algorithm is used to compute the collective sums in the latitude direction
required by the Legendre transforms.
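The transpose and the latitude-direction sums can be illustrated with a serial Python simulation that needs no MPI installation. In this sketch, `alltoallv_sim` is a hypothetical stand-in for MPI_ALLTOALLV, lines are assigned round-robin to processes after the transpose, and the recursive halving is written as a reduce-scatter over a power-of-two number of processes (the 32 latitude processors qualify). All of these, including the function names, are illustrative assumptions rather than PCCM3.2's actual code.

```python
# Serial simulation (no MPI) of two collective patterns. Layouts and
# names are illustrative assumptions, not PCCM3.2's implementation.

def alltoallv_sim(send):
    """Stand-in for MPI_Alltoallv: send[p][q] is the variable-length
    chunk process p sends to process q; returns recv[q][p]."""
    n = len(send)
    return [[send[p][q] for p in range(n)] for q in range(n)]


def transpose(local, nprocs):
    """local[p][line] is process p's longitude slice of `line`.
    Afterwards process q owns complete longitude rows for the lines
    with line % nprocs == q (round-robin ownership is an assumption)."""
    nlines = len(local[0])
    send = [[[(line, local[p][line]) for line in range(nlines)
              if line % nprocs == q] for q in range(nprocs)]
            for p in range(nprocs)]
    recv = alltoallv_sim(send)
    owned = []
    for q in range(nprocs):
        rows = {}
        for p in range(nprocs):  # source order matches longitude order
            for line, piece in recv[q][p]:
                rows.setdefault(line, []).extend(piece)
        owned.append(rows)
    return owned


def recursive_halving_reduce_scatter(vectors):
    """After log2(P) pairwise exchanges, rank r holds the global sum of
    block r of the vector. Returns (start index, summed block) per rank."""
    p, n = len(vectors), len(vectors[0])
    assert p & (p - 1) == 0, "power-of-two process count assumed"
    data = [list(v) for v in vectors]
    lo, hi = [0] * p, [n] * p
    mask = p // 2
    while mask:
        new = [list(d) for d in data]  # all exchanges happen "at once"
        for r in range(p):
            partner = r ^ mask  # partner shares r's current segment
            mid = (lo[r] + hi[r]) // 2
            # Keep one half, receive the partner's copy of it, and add.
            lo_r, hi_r = (mid, hi[r]) if r & mask else (lo[r], mid)
            for i in range(lo_r, hi_r):
                new[r][i] = data[r][i] + data[partner][i]
            lo[r], hi[r] = lo_r, hi_r
        data, mask = new, mask // 2
    return [(lo[r], data[r][lo[r]:hi[r]]) for r in range(p)]


if __name__ == "__main__":
    # 4 processes, 6 lines, 3 longitudes each. Since 6 % 4 != 0, the
    # chunk sizes differ between process pairs.
    local = [[[100 * line + 3 * p + k for k in range(3)]
              for line in range(6)] for p in range(4)]
    print(transpose(local, 4)[1][5])  # [500, 501, ..., 511]
    vecs = [[r + i for i in range(8)] for r in range(4)]
    print(recursive_halving_reduce_scatter(vecs))
    # [(0, [6, 10]), (2, [14, 18]), (4, [22, 26]), (6, [30, 34])]
```

The uneven chunk sizes in the usage example show why the "v" variant of the all-to-all is convenient: when the data do not divide evenly among processes, different pairs exchange different amounts. In the recursive halving, each process sends half of its current segment at every step, so it transmits n/2 + n/4 + ... + n/P = n(1 - 1/P) values in log2 P steps.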
Communication is also required to fill the halo regions when the
semi-Lagrangian algorithm is used to advect moisture, when swapping data to
load balance the columnar physics computations, and when computing global
statistics.
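A minimal sketch of the halo fill in the latitude direction follows, again as a serial Python simulation. The function name `fill_halos`, the halo width `w`, and the choice to leave pole-side halos empty (`None`) are hypothetical. A real semi-Lagrangian code must also handle the longitude direction and the poles, and in general needs a halo deep enough to contain every departure point plus the interpolation stencil.

```python
# Hypothetical sketch: fill latitude halos for a 1-D block decomposition
# of latitude rows. Pole-side halos are left as None (an assumption).

def fill_halos(blocks, w):
    """blocks[p] lists the latitude rows owned by process p, ordered
    south to north. Returns extended blocks with w halo rows on each
    side, copied from the immediate neighbours."""
    assert w >= 0
    # Halo data must come from the immediate neighbour only.
    assert all(len(b) >= w for b in blocks), "blocks shallower than w"
    nprocs = len(blocks)
    extended = []
    for p, own in enumerate(blocks):
        if p > 0:
            prev = blocks[p - 1]
            south = prev[len(prev) - w:]  # last w rows of southern neighbour
        else:
            south = [None] * w
        north = blocks[p + 1][:w] if p < nprocs - 1 else [None] * w
        extended.append(south + own + north)
    return extended


if __name__ == "__main__":
    blocks = [[2 * p, 2 * p + 1] for p in range(4)]  # rows labelled by index
    for ext in fill_halos(blocks, 1):
        print(ext)
    # [None, 0, 1, 2] / [1, 2, 3, 4] / [3, 4, 5, 6] / [5, 6, 7, None]
```

In an MPI code, each process would typically exchange these rows with its northern and southern neighbours using paired sends and receives such as MPI_Sendrecv. With 8 latitudes per processor in the 4 by 32 grid, a halo deeper than 8 rows would need data from beyond the immediate neighbour, which this sketch rejects.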