CCM/MP-2D Performance Studies

Performance Studies using CCM/MP-2D


Algorithm Evaluations

When tuning the performance of CCM/MP-2D, timing experiments are first run using the kernel codes COMMTEST and PSTSWM. These results are used to eliminate parallel algorithm options that are not competitive. The next step is to use the remaining options in experiments with CCM/MP-2D.

Experiments are run for problem sizes T42L18 and T170L18, timing 9 or 10 timesteps for T42L18 and 14 or 15 timesteps for T170L18. To exclude timing variations that occur only during the initial timesteps, the experiments typically run longer than the stated number of timesteps, but only the last timesteps are timed. For T42L18, the timed timesteps are made up of 6 or 7 "normal" timesteps and 3 timesteps that include longwave and shortwave radiation calculations. For T170L18, they are made up of 13 or 14 "normal" timesteps and 1 radiation timestep. A third type of timestep, which includes absorptivity/emissivity calculations, is not represented. However, this omission does not change the qualitative comparison of the different parallel algorithms.
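The practice of running extra initial timesteps and timing only the last ones can be sketched as follows. This is a minimal illustration, not code from CCM/MP-2D: the function name time_last_steps, the step callback, and the clock parameter are all hypothetical.

```python
import time


def time_last_steps(step, total_steps, timed_steps, clock=time.perf_counter):
    """Run total_steps timesteps but return the elapsed time of only the last timed_steps.

    step is a hypothetical callback that advances the model by one timestep;
    it receives the zero-based timestep index.
    """
    if not 0 < timed_steps <= total_steps:
        raise ValueError("timed_steps must be between 1 and total_steps")

    warmup_steps = total_steps - timed_steps

    # Untimed initial timesteps: startup effects are absorbed here.
    for index in range(warmup_steps):
        step(index)

    # Timed timesteps: only these contribute to the reported runtime.
    start = clock()
    for index in range(warmup_steps, total_steps):
        step(index)
    return clock() - start
```

With this sketch, a T42L18 run that times 10 timesteps after 2 untimed ones (the count of 2 is an assumed value) would be called as time_last_steps(step, 12, 10).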

Experiments are run for a sequence of numbers of processors. For each number of processors, experiments are run for the full range of supported aspect ratios. For example, for 64 processors, aspect ratios 64x1, 32x2, 16x4, 8x8, etc. would be tried for each of the identified parallel algorithm options.
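The set of aspect ratios for a given number of processors can be enumerated with a short sketch like the one below. It assumes that every factor pair of the processor count is a supported aspect ratio, which may not hold on every platform; the function name aspect_ratios is hypothetical.

```python
def aspect_ratios(nprocs):
    """Return every (first, second) pair with first * second == nprocs.

    Pairs are ordered by decreasing first dimension, so 64 processors
    start with 64x1, 32x2, 16x4, 8x8 and continue with the transposed shapes.
    Assumption: all factor pairs are supported aspect ratios.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be a positive integer")

    pairs = []
    for first in range(nprocs, 0, -1):
        if nprocs % first == 0:
            pairs.append((first, nprocs // first))
    return pairs


def format_ratio(pair):
    """Format a pair in the 64x1 style used in the text."""
    return f"{pair[0]}x{pair[1]}"
```

Each aspect ratio returned here would then be combined with each identified parallel algorithm option to form the list of experiments for that number of processors.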

For each of the following platforms, the set of identified parallel algorithm options is described. This is followed by the results of the comparison, where the best algorithm is described for each number of processors and problem size. A separate graph is generated for each number of processors. This graph is a scatterplot of runtimes for each parallel algorithm, where the x-axis indicates the aspect ratio and the symbol indicates the particular parallel algorithm option, as defined on the "Candidate Algorithm" webpage. The graph is typically difficult to read, but it should give some indication of the general distribution of runtimes. Since all of the tested parallel algorithms are among the best available, most of the variation should be a function of the aspect ratio.
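Selecting the best algorithm for each number of processors and problem size amounts to taking the minimum runtime within each group of experiments. The sketch below shows one way to do that; the Timing record layout and the function name best_per_case are assumptions made for illustration and do not reflect how the original results were processed.

```python
from collections import namedtuple

# Hypothetical record for one timing experiment.
Timing = namedtuple("Timing", "nprocs problem aspect algorithm seconds")


def best_per_case(timings):
    """Map each (nprocs, problem) pair to the Timing with the smallest runtime."""
    best = {}
    for timing in timings:
        key = (timing.nprocs, timing.problem)
        # Keep the fastest experiment seen so far for this case.
        if key not in best or timing.seconds < best[key].seconds:
            best[key] = timing
    return best
```

The winning record for each case carries both the aspect ratio and the algorithm option, which are the two quantities that distinguish points in the scatterplots.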

Note that the experiments run on the different platforms are typically not the same, differing primarily in the number and choice of timesteps. Therefore, the raw timings are not comparable between platforms without postprocessing. Also, due to the cost of these experiments (and the difficulty of running on large numbers of processors on some platforms), not all aspect ratios are examined for all problem sizes.

RESULTS:

Compaq AlphaServer SC-500
    Candidate Algorithms
    Qualitative Comparisons
        T42L18
        T170L18
Compaq AlphaServer SC-667
    Candidate Algorithms
    Qualitative Comparisons
        T42L18
        T170L18
Cray Research T3E-900
    Candidate Algorithms
    Qualitative Comparisons
        T42L18
        T170L18
IBM SP3-200 (Winterhawk I)
    Candidate Algorithms
    Qualitative Comparisons
        T42L18
        T170L18
IBM SP3-375 (Winterhawk II)
    Candidate Algorithms
    Qualitative Comparisons
        T42L18
        T170L18
SGI Origin2000-250
    Candidate Algorithms
    Qualitative Comparisons
        T42L18
        T170L18



Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 09:58:40 EDT.