PSTSWM SP2-66 MPL Protocol Performance Summary

Performance Studies using PSTSWM


IBM SP2-66 Protocol Performance

(MPL Summary - March 22, 1996 and July 26, 1996)

The IBM SP2 is a distributed-memory parallel architecture utilizing high-end workstation-class processors interconnected by an omega switch. The SP2 used in these experiments was a production machine sited at NASA Ames Research Center. Although processors were not shared, neither the processor configuration nor the effect of other users on interprocessor communication performance could be directly controlled. To minimize these effects, runs were repeated multiple times, and only timing runs that showed no significant perturbations were used in this analysis. Perturbations affecting individual protocol timings were not eliminated, however, and these contribute to the "maximum" observed performance variation.

In these experiments, we examine the protocol sensitivity of the MPL communication library. MPL was the original native communication library for the SP2. It has since been "replaced" by MPI, but it remains of interest because of the performance differences between the MPI and MPL libraries despite their outward similarities. The results described here come from legacy data collected over two years ago. The particular SP2 used has since been dismantled, and these results do not reflect the most recent versions of the SP2 architecture.

At the time of these experiments, we were running only Experiment A. We have two sets of Experiment A data, one collected in March 1996 and one collected in July 1996. They differ in that the March 1996 runs do not use the ESSL routines for the Fourier transforms, while the July 1996 runs do. Including the math library routines decreases the "granularity" of some of the experiments, but should not change the optimal protocols. Neither data set includes 32-processor runs, and the July data set contains only 16-processor runs.

The most important results from the MPL protocol experiments are that

Below, we summarize the results specific to each parallel algorithm. To indicate the variation in performance over the set of MPL communication protocols, we give the

for each of the March 22nd Experiment A problem cases. The data is presented in a table for each parallel algorithm. The cases are not labeled in the table, but are listed in the following order: T42 (P=16, 8); T85 (P=16, 8). For brevity, we also describe the performance sensitivity as low, moderate, or high if the median-based statistic is <= 5%, between 5% and 15%, or >= 15%, respectively.
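The sensitivity labels and the fixed case ordering can be expressed as a short sketch. It assumes that the median-based statistic has already been computed for each case and expressed as a percentage; how that statistic is computed is not reproduced here. The function names `classify_sensitivity` and `label_table_row` are hypothetical. The code also assumes that "T42 (P=16, 8)" means the P=16 case precedes the P=8 case. The input values in the usage line are illustrative only, not measured data.

```python
# Hypothetical helpers illustrating the low/moderate/high thresholds and
# the unlabeled case order used in the per-algorithm tables.

# Assumed reading of "T42 (P=16, 8); T85 (P=16, 8)": P=16 before P=8.
CASES = ["T42 (P=16)", "T42 (P=8)", "T85 (P=16)", "T85 (P=8)"]


def classify_sensitivity(median_stat_pct: float) -> str:
    """Map a median-based statistic (in percent) to a sensitivity label.

    <= 5% is low, >= 15% is high, anything strictly between is moderate.
    """
    if median_stat_pct <= 5.0:
        return "low"
    if median_stat_pct >= 15.0:
        return "high"
    return "moderate"


def label_table_row(stats: list[float]) -> dict[str, str]:
    """Attach case names and sensitivity labels to one unlabeled table row."""
    if len(stats) != len(CASES):
        raise ValueError(f"expected {len(CASES)} values, got {len(stats)}")
    return {case: classify_sensitivity(s) for case, s in zip(CASES, stats)}


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the experiments.
    print(label_table_row([3.0, 8.5, 17.0, 5.0]))
```

Placing the boundary values (exactly 5% and exactly 15%) in the low and high bins, respectively, follows the "<=" and ">=" wording of the classification rule.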

The following observations apply to all of the algorithm results and are listed here to cut down on the repetition:

DFFT
EXCHSUM
HALFSUM
RINGPIPE
RINGSUM
LOGTRANS (1)
LOGTRANS (2)
LOGTRANS (3)
SRTRANS (1)
SRTRANS (2)
SRTRANS (3)
SWTRANS (1)
SWTRANS (2)
SWTRANS (3)

In addition to those mentioned earlier, some general rules of thumb appear to apply.



Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:13:47 EDT.