PSTSWM Paragon SUNMOS Protocol Performance Summary

Performance Studies using PSTSWM

Intel Paragon Protocol Performance

(SUNMOS Summary - October 21, 1994)

The Intel Paragon is a distributed-memory parallel architecture built around a high-performance 2D grid interconnect. The Paragon used in these experiments was a production machine managed by Sandia National Laboratories. Processors were not shared, and care was taken to use partitions of the grid that were isolated from other users, to eliminate possible contention for bandwidth over shared links of the interconnect grid.

In these experiments we examine the protocol sensitivity of interprocessor communication when using the SUNMOS operating system, which was developed at Sandia and the University of New Mexico. While both NX and MPI libraries are available for SUNMOS, we used the low-level message passing primitives provided by SUNMOS (_nsend and _nrecv) for this study.

The results described here come from legacy data collected over three years ago. The particular Paragon used has been dismantled, but these results should still accurately reflect the interprocessor communication performance of existing Paragon systems running SUNMOS. (The platform hardware has not changed over this period.) The data is particularly interesting as a way of measuring the impact of different approaches to parallel operating systems and interprocessor communication (OSF vs. SUNMOS) on the same hardware.

At the time of these experiments, we were running only Experiment A. We also did not attempt to use 1D partitions that would reflect the processors used in a 2D parallelization. Partitions for these experiments typically had square or near-square aspect ratios.

The most important results from the SUNMOS protocol experiments are that

Below, we summarize the parallel algorithm specific results. To indicate the variation in performance over the set of communication protocols, we give the

for each of the Experiment A problem cases. The data is presented in a table for each parallel algorithm. The cases are not labelled in the table, but are listed in the following order: T42 (P=16, 32, 8); T85 (P=16, 32, 8). For brevity, we also describe the performance sensitivity as low, moderate, or high if the median-based statistic is <= 5%, between 5% and 15%, or >= 15%, respectively.
DFFT
EXCHSUM
HALFSUM
RINGPIPE
RINGSUM
LOGTRANS (1)
LOGTRANS (2)
LOGTRANS (3)
SRTRANS (1)
SRTRANS (2)
SRTRANS (3)
SWTRANS (1)
SWTRANS (2)
SWTRANS (3)
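
The low/moderate/high labelling and the unlabelled case ordering described above can be expressed as a minimal sketch. The function and constant names below are hypothetical and are not part of PSTSWM; only the thresholds and the case order come from the text, and the input values are assumed to be the median-based statistic in percent.

```python
# Thresholds from the text: <= 5% is low, >= 15% is high, otherwise moderate.
LOW_MAX_PERCENT = 5.0
HIGH_MIN_PERCENT = 15.0

# Case order as stated in the text; the tables themselves carry no labels.
CASE_ORDER = [
    ("T42", 16), ("T42", 32), ("T42", 8),
    ("T85", 16), ("T85", 32), ("T85", 8),
]


def classify_sensitivity(percent):
    """Map a median-based statistic (in percent) to low, moderate, or high."""
    if percent <= LOW_MAX_PERCENT:
        return "low"
    if percent >= HIGH_MIN_PERCENT:
        return "high"
    return "moderate"


def label_row(values):
    """Attach case labels and sensitivity classes to one row of six values.

    `values` is a hypothetical input: one statistic per Experiment A case,
    given in the same order as CASE_ORDER.
    """
    if len(values) != len(CASE_ORDER):
        raise ValueError("expected one value per Experiment A case")
    return [
        (f"{resolution} P={procs}", value, classify_sensitivity(value))
        for (resolution, procs), value in zip(CASE_ORDER, values)
    ]
```

Calling `label_row` on a row read from any of the per-algorithm tables would return one `(case, value, class)` tuple per problem case, which makes the implicit ordering explicit.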



Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:23:01 EDT.