CCM/MP-2D SP3-200 Parallel Algorithm Comparisons

Performance Studies using CCM/MP-2D


IBM SP3-200 Winterhawk I CCM/MP-2D
Parallel Algorithm Comparison

Date/Person: June, 1999 / P. Worley
Platform: IBM SP3 at Oak Ridge National Laboratory (morgan.ccs.ornl.gov):
     62 2-way Winterhawk I SMP nodes (200 MHz POWER3 with 4MB L2 cache, equivalent to RS/6000 Model 260)
Environment: AIX 4.3.2;   PSSP 3.1
Compilation Options: mpxlf -O3 -qarch=auto -qtune=auto -qcache=auto
Math Library: ESSL
Communication Library: MPI
Problem Size: T42L18
Number of Timesteps: 2-11
Results:

2 Processors / Problem T42L18
Optimal Algorithm
min          algorithm   aspect ratio   mapping
3.5264e+01   d0          1x2            -1

4 Processors / Problem T42L18
Optimal Algorithm
min          algorithm   aspect ratio   mapping
1.9540e+01   d0          1x4            -1

8 Processors / Problem T42L18
Optimal Algorithm
min          algorithm   aspect ratio   mapping
1.0504e+01   t3          1x8            -1

16 Processors / Problem T42L18
Optimal Algorithm
min          algorithm   aspect ratio   mapping
5.9195e+00   t5          1x16           -1

32 Processors / Problem T42L18
Optimal Algorithm
min          algorithm   aspect ratio   mapping
3.4986e+00   t2          4x8            -1

64 Processors / Problem T42L18
Optimal Algorithm
min          algorithm   aspect ratio   mapping
2.0262e+00   t4          4x16           -1
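
The best timings reported above can be condensed into relative speedup and parallel efficiency figures. The following is a minimal sketch in Python, assuming the 2-processor run is taken as the baseline because no single-processor timing is reported. The names `best_time` and `scaling_table` are illustrative and not part of CCM/MP-2D, and the source does not state the timing units, so only ratios are computed.

```python
# Best observed timings copied from the result tables
# (processor count -> "min" column; units are not stated in the source).
best_time = {
    2: 3.5264e+01,
    4: 1.9540e+01,
    8: 1.0504e+01,
    16: 5.9195e+00,
    32: 3.4986e+00,
    64: 2.0262e+00,
}


def scaling_table(times, base=None):
    """Return (procs, speedup, efficiency) tuples relative to a baseline run.

    The baseline defaults to the smallest processor count present, which is
    an assumption made here because no 1-processor timing is available.
    """
    base = min(times) if base is None else base
    rows = []
    for procs in sorted(times):
        speedup = times[base] / times[procs]   # how much faster than the baseline
        efficiency = speedup * base / procs    # 1.0 would be perfect scaling
        rows.append((procs, speedup, efficiency))
    return rows


for procs, speedup, efficiency in scaling_table(best_time):
    print(f"{procs:3d} processors: speedup {speedup:6.2f}x, "
          f"relative efficiency {efficiency:6.1%}")
```

Because the baseline is itself a parallel run, these efficiencies are relative to 2 processors and are not absolute parallel efficiencies.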

DISCUSSION

The high variation in timings for similar parallel algorithms (especially for large numbers of processors) makes it difficult to use these experiments to identify optimal algorithms. However, the best aspect ratio for a given number of processors is less ambiguous, and using the best observed timing to choose the algorithm should not lead us too far from the optimum.
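
To make the notion of an aspect ratio concrete, the sketch below enumerates the two-factor decompositions of a processor count and marks the one reported as best in the tables above. It assumes that an aspect ratio written AxB denotes a logical processor grid with A*B equal to the processor count, which is consistent with every reported entry. The source does not say which ratios were actually tested, nor what each dimension corresponds to, so the candidate list is illustrative only. The helper name `candidate_aspect_ratios` is hypothetical.

```python
def candidate_aspect_ratios(nprocs):
    """Return every (a, b) pair of positive integers with a * b == nprocs."""
    return [(a, nprocs // a) for a in range(1, nprocs + 1) if nprocs % a == 0]


# Best aspect ratios copied from the result tables (processor count -> (a, b)).
reported_best = {
    2: (1, 2),
    4: (1, 4),
    8: (1, 8),
    16: (1, 16),
    32: (4, 8),
    64: (4, 16),
}

for nprocs, best in reported_best.items():
    labels = []
    for a, b in candidate_aspect_ratios(nprocs):
        label = f"{a}x{b}"
        if (a, b) == best:
            label += "*"   # asterisk marks the ratio reported as best
        labels.append(label)
    print(f"{nprocs:3d} processors: " + "  ".join(labels))
```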

Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:17:32 EDT.