CCM/MP-2D T3E-900 Parallel Algorithms

Performance Studies using CCM/MP-2D


SGI/Cray Research T3E-900 Parallel Algorithms for CCM/MP-2D

(Results based on May 1998 PSTSWM Experiments)

The SGI/Cray Research T3E-900 is a distributed-memory parallel architecture built around a high-performance 3D torus interconnect. We used the results of our May 1998 studies of PSTSWM performance to identify the following parallel algorithms for CCM/MP-2D on the T3E-900.

When referring to the parallel algorithms and their implementations, we use the following shorthand:

The parallel algorithms chosen for the T3E-900 experiments are listed below. For the most part, the first choice in each category is the optimum identified in the PSTSWM experiments. The second is included to verify that protocols permitting latency hiding and communication/computation overlap are not important.

We examined five different algorithms, namely two distributed FFT/distributed Legendre transform algorithms:

d0: (df0, dl0, ds0, lb0)
d1: (df1, dl1, ds1, lb1)

and three transpose FFT/distributed Legendre transform algorithms:

t0: (tf0, dl0, ds0, lb0)
t1: (tf1, dl0, ds1, lb1)
t2: (tf0, dl1, ds0, lb0)

where the codes for the individual parallel algorithms are as follows:

Distributed FFT

Transpose FFT

Distributed Legendre Transform

Distributed semi-Lagrangian

Physics load balancing
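
As an illustration only, the five algorithm combinations listed earlier (d0, d1, t0, t1, t2) can be written as a small lookup table, which makes it easy to see which component changes between any two of them. This is a minimal sketch, not part of CCM/MP-2D or PSTSWM: the names CATEGORIES, ALGORITHMS, describe, and differing_categories are hypothetical, and the mapping of tuple positions to categories (FFT, Legendre transform, semi-Lagrangian, load balancing) is an assumption inferred from the code prefixes and the order of the category headings.

```python
# Hypothetical labels for the four positions in each tuple (assumed order:
# FFT, Legendre transform, semi-Lagrangian, physics load balancing).
CATEGORIES = ("fft", "legendre", "semi_lagrangian", "load_balancing")

# The five combinations exactly as listed in the text.
ALGORITHMS = {
    "d0": ("df0", "dl0", "ds0", "lb0"),
    "d1": ("df1", "dl1", "ds1", "lb1"),
    "t0": ("tf0", "dl0", "ds0", "lb0"),
    "t1": ("tf1", "dl0", "ds1", "lb1"),
    "t2": ("tf0", "dl1", "ds0", "lb0"),
}


def describe(name):
    """Return a category -> component-code mapping for one combination."""
    return dict(zip(CATEGORIES, ALGORITHMS[name]))


def differing_categories(a, b):
    """List the categories in which combinations a and b use different codes."""
    pairs = zip(CATEGORIES, ALGORITHMS[a], ALGORITHMS[b])
    return [category for category, x, y in pairs if x != y]


if __name__ == "__main__":
    for name in ALGORITHMS:
        print(name, describe(name))
    # Which component separates t0 from t2?
    print(differing_categories("t0", "t2"))
```

Read this way, d0 and t0 differ only in the FFT algorithm, and t0 and t2 differ only in the Legendre transform algorithm, so each such pair isolates the effect of a single component.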


Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:28:57 EDT.