PSTSWM Paragon NX Protocol Performance Summary

Performance Studies using PSTSWM

Intel Paragon Protocol Performance

(NX Summary - May 13, 1998)

The Intel Paragon is a distributed-memory parallel architecture built around a high-performance 2D grid interconnect. The Paragon used in these experiments is a production machine managed by the Center for Computational Science (CCS) at Oak Ridge National Laboratory. Processors are not shared, and care was taken to use partitions of the grid that are isolated from other users, to eliminate possible contention for bandwidth over shared links of the interconnect grid.

In these experiments we examine the protocol sensitivity of the NX communication library. NX is the native communication library for the Paragon (on top of which MPI is implemented). We generated Experiment A data twice. For the first set of experiments (A1), we used a 1x32 processor partition: the bottom row of the 16x32 processor grid that makes up this Paragon. This one-dimensional partition lets us test the one-dimensional algorithms in exactly the configurations in which they would be used as part of a two-dimensional data decomposition, in terms of both process placement and contention. This obviates the need to perform Experiments B and C. This is the same partition used for the OSF/MPI experiments.

For the second set of experiments (A2), we used partitions of size 4x2, 4x4, and 8x4, and used the PSTSWM Fortran FFT routines instead of the KAI routines used in A1. These are the partitions and math libraries used in the October 1994 SUNMOS experiments, which allows us to compare the two sets of data directly.

The most important results from the NX protocol experiments are that

Below, we summarize the parallel algorithm specific results. To indicate the variation in performance over the set of NX communication protocols, we give the

for each of the Experiment A1 problem cases. The data are presented in one table per parallel algorithm. The cases are not labelled in the tables, but are listed in the following order: T42 (P=16, 32, 8); T85 (P=16, 32, 8). For brevity, we also describe the performance sensitivity as low, moderate, or high if the median-based statistic is <= 5%, between 5% and 15%, or >= 15%, respectively.
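The case ordering and the sensitivity thresholds described above can be sketched in a few lines of Python. This is only an illustration, not part of PSTSWM: the names CASE_ORDER, classify_sensitivity, and label_row are hypothetical, the function assumes the median-based statistic has already been computed as a percentage, and the numbers in the usage example are made up rather than measured.

```python
# Order in which the six Experiment A1 cases appear in each table row:
# (spectral resolution, processor count P).
CASE_ORDER = [
    ("T42", 16), ("T42", 32), ("T42", 8),
    ("T85", 16), ("T85", 32), ("T85", 8),
]


def classify_sensitivity(percent):
    """Map a median-based statistic (in percent) to low/moderate/high.

    Thresholds follow the text: <= 5% is low, >= 15% is high,
    and anything strictly between 5% and 15% is moderate.
    """
    if percent <= 5.0:
        return "low"
    if percent >= 15.0:
        return "high"
    return "moderate"


def label_row(values):
    """Attach case labels and sensitivity classes to one unlabelled table row."""
    if len(values) != len(CASE_ORDER):
        raise ValueError("expected one value per Experiment A1 case")
    return [
        (resolution, procs, value, classify_sensitivity(value))
        for (resolution, procs), value in zip(CASE_ORDER, values)
    ]


if __name__ == "__main__":
    # Hypothetical percentages, for illustration only (not measured data).
    example_row = [2.0, 5.0, 9.5, 15.0, 21.3, 0.4]
    for resolution, procs, value, label in label_row(example_row):
        print(f"{resolution} P={procs}: {value}% -> {label}")
```

Treating the endpoints 5% and 15% as low and high, respectively, follows the inequalities as written in the text.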

The following observations apply to more than one of the algorithm results and are listed here to cut down on the repetition:

DFFT
EXCHSUM
HALFSUM
RINGPIPE
RINGSUM
LOGTRANS (1)
LOGTRANS (2)
LOGTRANS (3)
SRTRANS (1)
SRTRANS (2)
SRTRANS (3)
SWTRANS (1)
SWTRANS (2)
SWTRANS (3)

PSTSWM Performance Page


Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:29:55 EDT.