PSTSWM Communication Characterization

Performance Studies using PSTSWM


Best Point-to-Point Intranode Message-Passing Performance on the IBM p690 (32-way SMP "Turbo" node / 1.3 GHz POWER4 processor)

These results give the highest bandwidth measured in each experiment. The experiments cover the single-iteration, nonoverlap communication protocols, both with and without cache invalidation. All measurements use MPI, and results are presented separately for bidirectional and unidirectional protocols. Note that the unidirectional bandwidth is also the "bidirectional" bandwidth obtained when the optimal unidirectional protocol is used to complete a swap. In some situations this is a larger bandwidth (that is, a faster protocol) than the optimal bidirectional protocol achieves.
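The relationship between the two bandwidths can be illustrated with a little arithmetic. The sketch below is not taken from PSTSWM. It assumes that swap bandwidth counts the bytes moved in both directions, and that a swap built from a unidirectional protocol performs its two one-way transfers back to back. The function names and the timings in the usage example are hypothetical, chosen only for illustration.

```python
def unidirectional_bandwidth(message_bytes, one_way_seconds):
    """Bytes per second for a single one-way transfer."""
    return message_bytes / one_way_seconds


def swap_bandwidth_via_unidirectional(message_bytes, one_way_seconds):
    """Swap bandwidth when the swap is two one-way transfers in sequence.

    Assumption: both directions are counted, so 2 * message_bytes move in
    2 * one_way_seconds. The factor of two cancels, which is why this
    equals the unidirectional bandwidth.
    """
    total_bytes = 2 * message_bytes
    total_seconds = 2 * one_way_seconds
    return total_bytes / total_seconds


def swap_bandwidth_via_bidirectional(message_bytes, swap_seconds):
    """Swap bandwidth when both directions are exchanged by one protocol.

    Assumption: swap_seconds is the time for the whole exchange.
    """
    return (2 * message_bytes) / swap_seconds


def best_swap_protocol(message_bytes, one_way_seconds, swap_seconds):
    """Return (protocol name, bandwidth) for the faster way to do a swap."""
    via_uni = swap_bandwidth_via_unidirectional(message_bytes, one_way_seconds)
    via_bi = swap_bandwidth_via_bidirectional(message_bytes, swap_seconds)
    if via_uni > via_bi:
        return ("unidirectional", via_uni)
    return ("bidirectional", via_bi)


if __name__ == "__main__":
    # Hypothetical timings, not measured data.
    name, bandwidth = best_swap_protocol(
        message_bytes=1_000_000,
        one_way_seconds=0.001,
        swap_seconds=0.0025,
    )
    print(name, bandwidth)
```

With these made-up timings the two sequential one-way transfers finish the swap sooner than the bidirectional exchange, which is the situation the note above describes. Whether that happens on a real system depends on the measured times for each message size.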

MPI on 32 processor "Turbo" node

Bandwidth for Bidirectional Protocols

Bandwidth for Unidirectional Protocols

MPI on 8 processor LPAR (within a 32 processor "Turbo" node)

Bandwidth for Bidirectional Protocols

Bandwidth for Unidirectional Protocols

MPI on four 8 processor LPARs (within a single 32 processor "Turbo" node)

Bandwidth for Bidirectional Protocols

Bandwidth for Unidirectional Protocols


Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:22:34 EDT.