This graph shows the time spent in various user-defined tasks during an
interval of time. The user assigns a unique identification number for each
task, and ParaGraph assigns a color. (The user can also specify the
color assignments.) This graph describes the performance of CCM3.2/MP-2D (the
two-dimensional message-passing parallel implementation of NCAR's Community
Climate Model version 3.2) using 128 processors on the T3E-900 at the
National Energy Research Scientific Computing Center (NERSC). The problem
size is T170L18, corresponding to a 512 by 256 by 18
longitude-latitude-vertical computational grid and using 5-minute
timesteps. The parallel algorithm uses a 4 by 32 logical processor grid, so 4
processors are used to decompose the longitude direction and 32 processors
are used to decompose the latitude direction. A transpose algorithm is used
to undecompose the longitude coordinate in order to calculate the Fourier
transforms, followed by another transpose to (re)decompose in the resulting
wavenumber coordinate direction. The transposes are implemented using the
MPI_ALLTOALLV routine. A recursive
halving algorithm is used to compute the collective sums in the latitude
direction required by the Legendre transforms. Communication is also
required to fill the halo regions when using the semi-Lagrangian algorithm to
advect moisture, when swapping data to load balance the columnar physics
computations, and when computing global statistics.
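
The recursive halving step mentioned above can be illustrated with a small serial simulation. This is only a sketch of one common form of recursive halving (a reduce-scatter over a power-of-two number of ranks), not the CCM3.2/MP-2D source; the function name, the list-based "ranks", and the vector length are assumptions made for illustration, and the text does not say which variant the model actually uses.

```python
def recursive_halving_sum(vectors):
    """Simulate a recursive-halving reduce-scatter (hypothetical helper).

    vectors[r] is the full-length contribution of rank r. Returns one
    (lo, hi, values) tuple per rank, where values holds the global sum
    of elements lo..hi-1.
    """
    p = len(vectors)
    assert p > 0 and p & (p - 1) == 0, "rank count must be a power of two"
    n = len(vectors[0])
    assert n % p == 0, "vector length must be divisible by the rank count"

    data = [list(v) for v in vectors]  # per-rank working buffers
    lo = [0] * p                       # start of the segment each rank owns
    hi = [n] * p                       # end (exclusive) of that segment

    dist = p // 2
    while dist >= 1:
        new_data = [row[:] for row in data]
        new_lo, new_hi = lo[:], hi[:]
        for r in range(p):
            partner = r ^ dist         # partner differs in exactly one bit
            mid = (lo[r] + hi[r]) // 2
            # The lower rank of each pair keeps the lower half of the shared
            # segment and the higher rank keeps the upper half.
            if r < partner:
                keep_lo, keep_hi = lo[r], mid
            else:
                keep_lo, keep_hi = mid, hi[r]
            # Add the partner's copy of the half this rank keeps.
            for i in range(keep_lo, keep_hi):
                new_data[r][i] = data[r][i] + data[partner][i]
            new_lo[r], new_hi[r] = keep_lo, keep_hi
        data, lo, hi = new_data, new_lo, new_hi
        dist //= 2

    return [(lo[r], hi[r], data[r][lo[r]:hi[r]]) for r in range(p)]
```

With 32 ranks, matching the 32 processors in the latitude direction, the simulation takes 5 exchange steps, and each rank finishes holding 1/32 of the summed vector. In each step a rank exchanges only half of the segment it currently owns, so the data volume per step shrinks as the algorithm proceeds.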