PSTSWM T3E-900 SHMEM Protocol Performance Summary

Performance Studies using PSTSWM


SGI/Cray Research T3E-900 Protocol Performance

(SHMEM Summary - May 11, 1998)

The SGI/Cray Research T3E-900 is a distributed-memory parallel architecture built around a high-performance 3D torus interconnect. The T3E used in these experiments is a production machine managed by the National Energy Research Scientific Computing Center (NERSC). While processors are not shared, the processor configuration and the effect of other users on interprocessor communication performance are not directly controllable. To minimize these effects, larger-than-necessary partitions were requested for these runs, and the complete experimental suite was rerun if evidence of unacceptable timing perturbations was found. Because T3E timings are deterministic when isolated from external effects, perturbations were fairly simple to identify.

In these experiments we examine the protocol sensitivity of an application-specific communication library built on top of the SHMEM one-sided communication operations get and put. This library attempts to achieve lower communication overhead and latency, and higher bandwidth, than more general libraries such as MPI and PVM.

The most important results from these SHMEM protocol experiments are that

Below, we summarize the parallel algorithm specific results. To indicate the variation in performance over the set of SHMEM communication protocols, we give the

for each of the Experiment A problem cases. The data are presented in a table for each parallel algorithm. The cases are not labelled in the table, but are listed in the following order: T42 (P=16, 32, 8); T85 (P=16, 32, 8). For brevity, we also describe the performance sensitivity as low, moderate, or high if the median-based statistic is <= 5%, between 5% and 15%, or >= 15%, respectively.
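The low/moderate/high labelling rule above can be sketched as a small function. This is only an illustration of the thresholds as stated: the name classify_sensitivity, the case labels, and the sample percentages are hypothetical and are not measured results from these experiments.

```python
def classify_sensitivity(stat_percent: float) -> str:
    """Map a median-based statistic (in percent) to a sensitivity label.

    Thresholds follow the text: <= 5% is low, >= 15% is high,
    and anything strictly between is moderate.
    """
    if stat_percent <= 5.0:
        return "low"
    if stat_percent < 15.0:
        return "moderate"
    return "high"


# Case order used in the tables: T42 (P=16, 32, 8); T85 (P=16, 32, 8).
CASE_ORDER = ["T42 P=16", "T42 P=32", "T42 P=8",
              "T85 P=16", "T85 P=32", "T85 P=8"]

# Hypothetical percentages for illustration only (NOT data from the study).
example_row = [2.0, 5.0, 7.5, 14.9, 15.0, 30.0]

labels = {case: classify_sensitivity(value)
          for case, value in zip(CASE_ORDER, example_row)}

for case, label in labels.items():
    print(f"{case}: {label}")
```

Note that the boundary values are assigned as the text gives them: exactly 5% counts as low and exactly 15% counts as high.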

The following observations apply to all of the algorithm results and are listed here to reduce repetition:

DFFT
EXCHSUM
HALFSUM
RINGPIPE
RINGSUM
LOGTRANS (1)
LOGTRANS (2)
LOGTRANS (3)
SRTRANS (1)
SRTRANS (2)
SRTRANS (3)
SWTRANS (1)
SWTRANS (2)
SWTRANS (3)

In addition to those mentioned earlier, some general rules of thumb appear to apply.



Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:25:21 EDT.