Example ParaGraph Utilization Graph

ParaGraph Utilization Graph for PCCM3.2


This graph shows the time during which each processor is computing (busy), doing nothing while waiting for a message to arrive (idle), or actively sending or receiving a message (overhead). The graph describes the performance of PCCM3.2 (the two-dimensional message-passing parallel implementation of NCAR's Community Climate Model version 3.2) using 128 processors on the T3E-900 at the National Energy Research Scientific Computing Center (NERSC). The problem size is T170L18, corresponding to a 512 by 256 by 18 longitude-latitude-vertical computational grid and using 5-minute timesteps.

The parallel algorithm uses a 4 by 32 logical processor grid, so 4 processors are used to decompose the longitude direction and 32 processors are used to decompose the latitude direction. A transpose algorithm is used to undecompose the longitude coordinate in order to calculate the Fourier transforms, followed by another transpose to (re)decompose in the resulting wavenumber coordinate direction. The transposes are implemented using the MPI_ALLTOALLV collective routine. A recursive halving algorithm is used to compute the collective sums in the latitude direction required by the Legendre transforms. Communication is also required to fill the halo regions when using the semi-Lagrangian algorithm to advect moisture, when swapping data to load balance the columnar physics computations, and when computing global statistics.
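As a rough illustration of two of the ideas above, the following Python sketch computes the per-processor share of the horizontal grid for the 4 by 32 decomposition and simulates a textbook recursive halving sum in a single process. It is not taken from PCCM3.2: the function names local_extent and recursive_halving_sum are hypothetical, the even block split is an assumption, and the variant shown (each rank ends up owning one block of the summed vector) is only one common form of recursive halving, which may differ from what the model actually does.

```python
def local_extent(nlon, nlat, plon, plat):
    """Grid points per processor, assuming an evenly divisible block split."""
    if nlon % plon or nlat % plat:
        raise ValueError("grid does not divide evenly among processors")
    return nlon // plon, nlat // plat


def recursive_halving_sum(vectors):
    """Simulate a recursive halving sum over p ranks in one process.

    vectors[r] is the contribution of rank r. At each step a rank pairs
    with the rank whose number differs in one bit, keeps one half of its
    current range, and adds the partner's values for that half. After
    log2(p) steps rank r holds block r of the fully summed vector.
    """
    p = len(vectors)
    n = len(vectors[0])
    if p & (p - 1) or n % p:
        raise ValueError("need a power-of-two rank count dividing the length")

    data = [list(v) for v in vectors]
    bounds = [(0, n)] * p          # index range each rank is still responsible for
    dist = p // 2
    while dist >= 1:
        new_data = [row[:] for row in data]
        new_bounds = list(bounds)
        for rank in range(p):
            partner = rank ^ dist  # partner differs from rank in exactly one bit
            lo, hi = bounds[rank]
            mid = (lo + hi) // 2
            # The rank with the bit cleared keeps the lower half of the range.
            keep = (lo, mid) if rank & dist == 0 else (mid, hi)
            for i in range(*keep):
                new_data[rank][i] = data[rank][i] + data[partner][i]
            new_bounds[rank] = keep
        data, bounds = new_data, new_bounds
        dist //= 2
    return [data[r][bounds[r][0]:bounds[r][1]] for r in range(p)]


if __name__ == "__main__":
    # 512 x 256 horizontal grid on a 4 x 32 logical processor grid.
    print(local_extent(512, 256, 4, 32))   # (128, 8)

    # 32 ranks (the latitude direction), each contributing a vector of ones.
    blocks = recursive_halving_sum([[1] * 64 for _ in range(32)])
    print(blocks[0])
```

In this form each pair of ranks exchanges half as much data at every step, so the total volume sent per rank stays below the vector length even though log2(32) = 5 steps are needed.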

Worley's Performance Studies Page


Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 09:59:28 EDT.