CCM/MP-2D AlphaSC-500 Parallel Algorithms

Performance Studies using CCM/MP-2D


Compaq AlphaServer SC Parallel Algorithms for CCM/MP-2D

(Results based on November, 1999 PSTSWM Experiments)

The AlphaServer SC is a distributed-memory parallel architecture that uses high-end workstation-class processors interconnected by a Quadrics (fat-tree) switch. The SC used in these experiments was a Compaq development machine with eight 4-way SMP nodes. We used the results of our November, 1999 studies of PSTSWM performance to identify the following parallel algorithms for CCM/MP-2D on the AlphaServer SC.

When referring to the parallel algorithms and their implementations, we use the following shorthand:

The parallel algorithms chosen for the AlphaServer SC experiments are listed below. Seven different algorithms were examined: three distributed FFT/distributed Legendre transform algorithms:

d0: (df0 , dl0 , ds0 , lb0)
d1: (df1 , dl1 , ds1 , lb1)
d2: (df2 , dl2 , ds2 , lb2)

and four transpose FFT/distributed Legendre transform algorithms:

t0: (tf0 , dl0 , ds0 , lb0)
t1: (tf0 , dl1 , ds0 , lb0)
t2: (tf1 , dl1 , ds1 , lb1)
t3: (tf2 , dl2 , ds2 , lb2)

where the codes for the individual parallel algorithms are as follows:

Distributed FFT

Transpose FFT

Distributed Legendre Transform

Distributed semi-Lagrangian

Physics load balancing

d0, t0, and t1 use protocols that the PSTSWM experiments indicate are best for "small" granularity problems, while d1 and t2 use protocols that are best for "large" granularity problems. d2 and t3 use the standard MPI protocols that one would choose without tuning.
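The seven algorithm tuples and their tuning classes described above can be written down as a small lookup table. The following Python sketch is illustrative only and is not part of CCM/MP-2D or PSTSWM: the names `Algorithm`, `ALGORITHMS`, `TUNING`, `uses_transpose_fft`, and `tuning_class` are hypothetical, and the field names are assumed from the component headings (FFT, Legendre transform, semi-Lagrangian, physics load balancing). The option codes themselves (df0, tf1, and so on) are copied from the listing and are not interpreted further.

```python
from typing import NamedTuple


class Algorithm(NamedTuple):
    """One parallel algorithm: a 4-tuple of per-component option codes."""
    fft: str              # df* = distributed FFT, tf* = transpose FFT
    legendre: str         # dl* = distributed Legendre transform
    semi_lagrangian: str  # ds* = distributed semi-Lagrangian
    load_balance: str     # lb* = physics load balancing


# The seven algorithms, exactly as listed in the text.
ALGORITHMS = {
    "d0": Algorithm("df0", "dl0", "ds0", "lb0"),
    "d1": Algorithm("df1", "dl1", "ds1", "lb1"),
    "d2": Algorithm("df2", "dl2", "ds2", "lb2"),
    "t0": Algorithm("tf0", "dl0", "ds0", "lb0"),
    "t1": Algorithm("tf0", "dl1", "ds0", "lb0"),
    "t2": Algorithm("tf1", "dl1", "ds1", "lb1"),
    "t3": Algorithm("tf2", "dl2", "ds2", "lb2"),
}

# Tuning classes as stated in the text; the labels are hypothetical names.
TUNING = {
    "small": ["d0", "t0", "t1"],  # protocols best for small-granularity problems
    "large": ["d1", "t2"],        # protocols best for large-granularity problems
    "untuned": ["d2", "t3"],      # standard MPI protocols, no tuning
}


def uses_transpose_fft(name: str) -> bool:
    """True if the named algorithm uses a transpose FFT (tf*) code."""
    return ALGORITHMS[name].fft.startswith("tf")


def tuning_class(name: str) -> str:
    """Return the tuning class label for the named algorithm."""
    for label, names in TUNING.items():
        if name in names:
            return label
    raise KeyError(name)


if __name__ == "__main__":
    for name, alg in ALGORITHMS.items():
        kind = "transpose" if uses_transpose_fft(name) else "distributed"
        print(f"{name}: {tuple(alg)}  FFT={kind}  tuning={tuning_class(name)}")
```

Keeping the tuples in one table makes it easy to see which components differ between two algorithms: t0 and t1, for example, differ only in the Legendre transform code.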


Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:03:35 EDT.