SGI/Cray Research T3E-900 Parallel Algorithms
for CCM/MP-2D
(Results based on May, 1998 PSTSWM Experiments)
The SGI/Cray Research T3E-900 is a distributed-memory parallel
architecture built around a high-performance 3D torus interconnect.
We used the results of our May, 1998 studies of
PSTSWM performance
to identify the following parallel algorithms for CCM/MP-2D
on the T3E-900.
When referring to the parallel algorithms and their implementations, we
use the following shorthand:
The names of the individual parallel algorithms, e.g., srtrans,
are described on the PSTSWM protocol web pages.
"overlap" refers to a communication protocol that posts
one or more send or receive requests as early as possible, in an attempt to
overlap communication with computation or to hide latency. The default
case is "no overlap".
"ordered" refers to a communication protocol that does not
attempt to exploit bidirectional bandwidth in a swap or send-receive operation,
instead having one processor send and the other receive, followed by the
reverse when the first send is complete. The default is "unordered", i.e.,
logically sending in both directions simultaneously.
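The "overlap" and "ordered" options can be illustrated with a small sketch that lists the steps one processor performs in a pairwise swap. This is only a schematic model, not the PSTSWM or CCM/MP-2D implementation: the function name `swap_steps`, the step labels, and the convention that the lower-numbered processor sends first in the ordered case are all assumptions made for illustration.

```python
def swap_steps(rank, partner, ordered=False, overlap=False):
    """List the steps one processor performs in a pairwise swap (schematic).

    Hypothetical model: the step labels and the "lower rank sends first"
    rule are illustrative assumptions, not taken from PSTSWM.
    """
    # With overlap, the receive request is posted before the computation,
    # so the matching step later on only has to wait for its completion.
    recv = "wait_recv" if overlap else "recv"
    steps = ["post_recv", "compute"] if overlap else ["compute"]

    if not ordered:
        # Unordered (default): both directions are logically in flight at once.
        steps.append("send+" + recv)
    elif rank < partner:
        # Ordered: this processor sends first, then receives.
        steps += ["send", recv]
    else:
        # Ordered: the partner sends first, so receive before sending.
        steps += [recv, "send"]
    return steps


if __name__ == "__main__":
    for ordered in (False, True):
        for overlap in (False, True):
            for rank, partner in ((0, 1), (1, 0)):
                print(ordered, overlap, rank, swap_steps(rank, partner, ordered, overlap))
```

In the ordered case the two processors take complementary roles, so only one direction of the link is in use at a time. In the unordered case both processors issue the same combined step.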
A parallel algorithm for CCM/MP-2D is specified as a vector
consisting of the codes for the individual parallel algorithms, in the
following order:
The parallel algorithms chosen for the T3E-900 experiments
are listed below.
For the most part, the first choice in each category is the optimum
identified in the PSTSWM experiments, while the second
is included to verify that protocols permitting latency hiding and
communication/computation overlap offer no significant benefit.
Five different algorithms were examined, two distributed FFT/distributed
Legendre transform algorithms:
d0: (df0, dl0, ds0, lb0)
d1: (df1, dl1, ds1, lb1)
and three transpose FFT/distributed Legendre transform algorithms:
t0: (tf0, dl0, ds0, lb0)
t1: (tf1, dl0, ds1, lb1)
t2: (tf0, dl1, ds0, lb0)
where the codes for the individual parallel algorithms are as follows:
Distributed FFT
df0 - MPI_SENDRECV-based communication protocol: (0,6)