PSTSWM AlphaSC-500 Point-to-Point Communication Performance

Performance Studies using PSTSWM


Compaq AlphaServer SC SWAP Performance

(ordered swap of 8KB message using MPI within a node)

(performance measured per processor when all processors in node communicating)

Date/Person: January 26, 2000 / P. Worley
Platform: Compaq AlphaServer SC at Oak Ridge National Laboratory (colt.ccs.ornl.gov):
     16 ES40 4-way SMP nodes (500 MHz Alpha 21264 with 4MB L2 cache)
Environment: Digital UNIX V5.0;   RMS 2.36
Communication Library: MPI
SWAP size: 1024 REAL*8 floating point values each direction
Message size: Largest - 1024 REAL*8 floating point values
              Smallest - 1 REAL*8 floating point value
Processors: 0 and 1
            2 and 3
Latency Definition: (T1024-T512)/512
Model Error Range: [1,1024]
Results:

ordered simple swap
Data Statistics
                     unidirectional bandwidth  estimated latency  model error
                     (peak MByte/sec)          (usec/msg)         (max. rel. error)
1 iter.              103.96                    10.11              66.4%
10 iter.             127.20                    10.53              68.0%
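
The latency figures can be cross-checked against the protocol sensitivity summaries further down this page. The following is a minimal sketch, not the method PSTSWM itself uses. It assumes that T1024 and T512 in the latency definition are the times to move the 8192-byte payload as 1024 and as 512 messages (the 8-byte and 16-byte rows of the "min Sec" column), and that the per-message figure is half of the difference because the timed swap covers both directions. Both assumptions are inferred from the numbers rather than stated in the report, and the function and variable names are illustrative.

```python
def estimated_latency_usec(t_1024, t_512, directions=2):
    """Per-message latency in microseconds from the (T1024-T512)/512 definition.

    t_1024, t_512: seconds to swap the payload as 1024 and as 512 messages.
    directions:    assumed factor converting swap time to one-way time.
    """
    per_extra_message = (t_1024 - t_512) / 512.0
    return per_extra_message / directions * 1e6


# "min Sec" values for message sizes 8 and 16 bytes, copied from the
# no-overlap summaries for 1 iteration and for 10 iterations.
one_iter = estimated_latency_usec(2.1509e-02, 1.1152e-02)
ten_iter = estimated_latency_usec(2.1479e-02, 1.0700e-02)

print(f"1 iter.:  {one_iter:.2f} usec/msg")
print(f"10 iter.: {ten_iter:.2f} usec/msg")
```

Under these assumptions the sketch gives 10.11 and 10.53 usec/msg, matching the ordered simple swap entries. That agreement suggests, but does not confirm, that protocol 0 in the summaries is the ordered simple swap.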

ordered swap using nonblocking send
Data Statistics
                     unidirectional bandwidth  estimated latency  model error
                     (peak MByte/sec)          (usec/msg)         (max. rel. error)
1 iter.              97.76                     11.04              64.2%
10 iter.             129.44                    10.68              67.6%
1 iter. w/overlap    94.60                     10.91              64.7%
10 iter. w/overlap   125.76                    11.11              67.4%

ordered swap using nonblocking receive
Data Statistics
                     unidirectional bandwidth  estimated latency  model error
                     (peak MByte/sec)          (usec/msg)         (max. rel. error)
1 iter.              93.62                     12.17              62.3%
10 iter.             125.99                    11.83              65.5%
1 iter. w/overlap    104.89                    12.10              63.7%
10 iter. w/overlap   130.67                    13.89              64.3%

ordered swap using nonblocking send and receive
Data Statistics
                     unidirectional bandwidth  estimated latency  model error
                     (peak MByte/sec)          (usec/msg)         (max. rel. error)
1 iter.              102.66                    12.49              62.5%
10 iter.             122.18                    12.21              64.5%
1 iter. w/overlap    99.06                     12.59              61.6%
10 iter. w/overlap   127.28                    14.07              64.5%

ordered swap using ready send
Data Statistics
                     unidirectional bandwidth  estimated latency  model error
                     (peak MByte/sec)          (usec/msg)         (max. rel. error)
1 iter.              81.92                     22.07              50.7%
10 iter.             112.92                    22.79              51.8%
1 iter. w/overlap    102.40                    11.83              63.6%
10 iter. w/overlap   126.17                    14.38              62.9%

ordered swap using nonblocking ready send
Data Statistics
                     unidirectional bandwidth  estimated latency  model error
                     (peak MByte/sec)          (usec/msg)         (max. rel. error)
1 iter.              89.92                     22.76              50.4%
10 iter.             107.70                    22.89              52.2%
1 iter. w/overlap    99.90                     12.47              62.9%
10 iter. w/overlap   121.26                    14.43              63.2%

synchronous
Data Statistics
                     unidirectional bandwidth  estimated latency  model error
                     (peak MByte/sec)          (usec/msg)         (max. rel. error)
1 iter.              96.04                     21.79              51.3%
10 iter.             115.01                    23.35              50.6%

ordered swap using nonblocking sync. send
Data Statistics
                     unidirectional bandwidth  estimated latency  model error
                     (peak MByte/sec)          (usec/msg)         (max. rel. error)
1 iter.              93.52                     17.74              53.0%
10 iter.             127.84                    17.60              56.6%
1 iter. w/overlap    89.24                     16.65              54.4%
10 iter. w/overlap   124.08                    17.80              57.0%

ordered swap using nonblocking receive with sync. send
Data Statistics
                     unidirectional bandwidth  estimated latency  model error
                     (peak MByte/sec)          (usec/msg)         (max. rel. error)
1 iter.              94.71                     17.94              53.3%
10 iter.             123.82                    17.58              56.6%
1 iter. w/overlap    93.94                     17.09              57.2%
10 iter. w/overlap   127.34                    18.84              57.1%

ordered swap using nonblocking sync. send and receive
Data Statistics
                     unidirectional bandwidth  estimated latency  model error
                     (peak MByte/sec)          (usec/msg)         (max. rel. error)
1 iter.              89.33                     18.11              53.0%
10 iter.             123.00                    17.18              57.7%
1 iter. w/overlap    92.04                     17.55              58.2%
10 iter. w/overlap   127.05                    19.78              55.7%

ordered simple swap using sync. send
Data Statistics
                     unidirectional bandwidth  estimated latency  model error
                     (peak MByte/sec)          (usec/msg)         (max. rel. error)
1 iter.              87.71                     17.81              53.0%
10 iter.             120.06                    17.66              55.8%


Protocol Sensitivity Summary for Unidirectional Swap of 8192 Bytes (1 iteration/no overlap)
Runtime Statistics
Msg Size   min Sec   min Sec/Msg   max MBytes/Sec   (mean-min)/min   (median-min)/min   (max-min)/min
  8   2.1509e-02   2.1004e-05   0.76   0.61   0.69   1.21 
  16   1.1152e-02   2.1781e-05   1.47   0.57   0.62   1.18 
  32   5.3282e-03   2.0813e-05   3.07   0.62   0.71   1.27 
  64   3.4148e-03   2.6678e-05   4.80   0.51   0.50   1.12 
  128   1.9368e-03   3.0263e-05   8.46   0.44   0.43   1.01 
  256   1.1802e-03   3.6881e-05   13.88   0.40   0.37   0.84 
  512   1.4322e-03   8.9513e-05   11.44   0.13   0.12   0.28 
  1024   7.4500e-04   9.3125e-05   21.99   0.14   0.13   0.28 
  2048   3.8840e-04   9.7100e-05   42.18   0.12   0.10   0.28 
  4096   2.3380e-04   1.1690e-04   70.08   0.09   0.07   0.27 
  8192   1.5760e-04   1.5760e-04   103.96   0.11   0.11   0.27 
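
The "min Sec/Msg" and "max MBytes/Sec" columns appear to be derived from "min Sec". A minimal sketch of that relationship, assuming the 8192-byte payload is split into 8192/size messages, that the bandwidth counts both directions of the swap (hence the factor 2), and that 1 MByte is 10^6 bytes. These assumptions are inferred from the tabulated values; the names are illustrative.

```python
PAYLOAD_BYTES = 8192  # 1024 REAL*8 values in each direction


def derived_columns(msg_bytes, min_sec):
    """Return (seconds per message, MBytes/sec) for one row of the table."""
    n_msgs = PAYLOAD_BYTES // msg_bytes          # messages needed for the payload
    sec_per_msg = min_sec / n_msgs
    # Assumed: both directions of the swap are counted, 1 MByte = 1e6 bytes.
    mbytes_per_sec = 2 * PAYLOAD_BYTES / min_sec / 1e6
    return sec_per_msg, mbytes_per_sec


# (message size in bytes, min Sec) copied from three rows of the table above.
rows = [(8, 2.1509e-02), (512, 1.4322e-03), (8192, 1.5760e-04)]
for size, min_sec in rows:
    sec_per_msg, bandwidth = derived_columns(size, min_sec)
    print(f"{size:5d}  {sec_per_msg:.4e}  {bandwidth:7.2f}")
```

For these rows the computed bandwidths round to 0.76, 11.44, and 103.96 MBytes/sec, in agreement with the table. The per-message times can differ from the table in the last digit because "min Sec" is itself rounded.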
Five Fastest Protocols
Msg Size   1st   2nd   3rd   4th   5th
  8   0   1   2   3   7 
  16   0   1   2   3   7 
  32   0   1   2   3   7 
  64   1   0   3   2   7 
  128   0   1   2   3   7 
  256   0   1   3   2   10 
  512   0   1   3   2   7 
  1024   0   2   1   10   9 
  2048   0   1   3   2   7 
  4096   1   3   10   0   7 
  8192   0   3   1   6   8 
Number of Protocols With Runtimes Within X% of Min
Msg Size   1%   5%   25%
  8    1   2   4 
  16    2   2   4 
  32    1   1   4 
  64    1   2   4 
  128    2   2   4 
  256    1   2   4 
  512    1   4   9 
  1024    3   3   9 
  2048    1   2   10 
  4096    2   4   10 
  8192    1   2   10 
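
The three summary tables (relative spread, five fastest protocols, and number of protocols within X% of the minimum) can all be computed from one set of per-protocol runtimes at a given message size. The sketch below shows one way to do so. The per-protocol runtimes are not listed on this page, so the input values are hypothetical and the function name is illustrative. It assumes the "within X%" count includes the fastest protocol itself, which is consistent with the 1% column never dropping below 1.

```python
from statistics import mean, median


def protocol_summary(runtimes, thresholds=(0.01, 0.05, 0.25), top=5):
    """Summarize runtimes given as {protocol id: seconds} for one message size."""
    times = list(runtimes.values())
    fastest = min(times)
    # Relative spread of the protocols around the fastest one.
    spread = {
        "mean": (mean(times) - fastest) / fastest,
        "median": (median(times) - fastest) / fastest,
        "max": (max(times) - fastest) / fastest,
    }
    # Protocol ids ordered from fastest to slowest, truncated to `top`.
    ranking = sorted(runtimes, key=runtimes.get)[:top]
    # Number of protocols whose runtime is within each threshold of the minimum.
    within = [sum(t <= fastest * (1 + x) for t in times) for x in thresholds]
    return spread, ranking, within


# Hypothetical runtimes in seconds for four protocols; NOT measured data.
example = {0: 1.00e-4, 1: 1.02e-4, 2: 1.10e-4, 3: 1.50e-4}
spread, ranking, within = protocol_summary(example)
print(spread, ranking, within)
```

With the hypothetical input, the ranking is 0, 1, 2, 3 and the counts within 1%, 5%, and 25% of the minimum are 1, 2, and 3.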


Protocol Sensitivity Summary for Unidirectional Swap of 8192 Bytes (10 iterations/no overlap)
Runtime Statistics
Msg Size   min Sec   min Sec/Msg   max MBytes/Sec   (mean-min)/min   (median-min)/min   (max-min)/min
  8   2.1479e-02   2.0975e-05   0.76   0.60   0.68   1.20 
  16   1.0700e-02   2.0899e-05   1.53   0.61   0.68   1.23 
  32   5.3187e-03   2.0776e-05   3.08   0.60   0.66   1.23 
  64   3.3659e-03   2.6296e-05   4.87   0.51   0.49   1.14 
  128   1.9177e-03   2.9963e-05   8.54   0.45   0.43   1.00 
  256   1.1862e-03   3.7069e-05   13.81   0.38   0.37   0.83 
  512   1.4472e-03   9.0449e-05   11.32   0.11   0.10   0.28 
  1024   7.3012e-04   9.1265e-05   22.44   0.11   0.09   0.29 
  2048   3.7764e-04   9.4410e-05   43.39   0.11   0.07   0.26 
  4096   2.0870e-04   1.0435e-04   78.51   0.07   0.04   0.24 
  8192   1.2658e-04   1.2658e-04   129.44   0.07   0.05   0.20 
Five Fastest Protocols
Msg Size   1st   2nd   3rd   4th   5th
  8   0   1   2   3   7 
  16   0   1   2   3   7 
  32   0   1   2   3   7 
  64   0   1   2   3   7 
  128   0   1   2   3   7 
  256   0   1   2   3   7 
  512   1   0   2   3   10 
  1024   0   1   2   3   10 
  2048   1   0   2   10   9 
  4096   0   7   10   1   3 
  8192   1   7   0   2   8 
Number of Protocols With Runtimes Within X% of Min
Msg Size   1%   5%   25%
  8    1   2   4 
  16    1   2   4 
  32    1   2   4 
  64    2   2   4 
  128    1   2   4 
  256    1   2   4 
  512    2   4   9 
  1024    2   4   9 
  2048    1   3   10 
  4096    2   7   11 
  8192    1   5   11 


Protocol Sensitivity Summary for Unidirectional Swap of 8192 Bytes (1 iteration with overlap)
Runtime Statistics
Msg Size   min Sec   min Sec/Msg   max MBytes/Sec   (mean-min)/min   (median-min)/min   (max-min)/min
  8   2.1960e-02   2.1446e-05   0.75   0.39   0.18   1.20 
  16   1.1290e-02   2.2051e-05   1.45   0.35   0.16   1.08 
  32   5.5206e-03   2.1565e-05   2.97   0.38   0.17   1.10 
  64   3.2586e-03   2.5458e-05   5.03   0.36   0.19   1.19 
  128   1.9346e-03   3.0228e-05   8.47   0.27   0.12   0.91 
  256   1.1856e-03   3.7050e-05   13.82   0.23   0.09   0.70 
  512   1.4420e-03   9.0125e-05   11.36   0.10   0.05   0.25 
  1024   7.3280e-04   9.1600e-05   22.36   0.09   0.07   0.24 
  2048   3.9640e-04   9.9100e-05   41.33   0.06   0.05   0.19 
  4096   2.2520e-04   1.1260e-04   72.75   0.08   0.07   0.20 
  8192   1.5620e-04   1.5620e-04   104.89   0.09   0.11   0.18 
Five Fastest Protocols
Msg Size   1st   2nd   3rd   4th   5th
  8   0   1   4   2   5 
  16   0   1   2   4   3 
  32   0   1   2   4   3 
  64   0   1   2   4   3 
  128   1   0   4   2   3 
  256   0   1   4   2   3 
  512   0   3   1   4   2 
  1024   0   1   3   2   4 
  2048   0   1   4   5   3 
  4096   2   0   3   1   9 
  8192   2   4   5   3   0 
Number of Protocols With Runtimes Within X% of Min
Msg Size   1%   5%   25%
  8    1   2   6 
  16    2   2   6 
  32    1   2   6 
  64    1   1   6 
  128    2   2   6 
  256    1   3   6 
  512    1   5   11 
  1024    1   5   11 
  2048    1   5   11 
  4096    1   3   11 
  8192    1   3   11 


Protocol Sensitivity Summary for Unidirectional Swap of 8192 Bytes (10 iterations with overlap)
Runtime Statistics
Msg Size   min Sec   min Sec/Msg   max MBytes/Sec   (mean-min)/min   (median-min)/min   (max-min)/min
  8   2.2135e-02   2.1617e-05   0.74   0.48   0.33   1.21 
  16   1.1125e-02   2.1728e-05   1.47   0.47   0.33   1.19 
  32   5.4632e-03   2.1341e-05   3.00   0.48   0.34   1.17 
  64   3.3620e-03   2.6266e-05   4.87   0.39   0.25   1.11 
  128   1.9226e-03   3.0041e-05   8.52   0.35   0.24   0.98 
  256   1.1876e-03   3.7114e-05   13.80   0.29   0.22   0.75 
  512   1.4864e-03   9.2897e-05   11.02   0.09   0.09   0.25 
  1024   7.3418e-04   9.1772e-05   22.32   0.11   0.10   0.26 
  2048   3.8390e-04   9.5975e-05   42.68   0.07   0.06   0.22 
  4096   2.1254e-04   1.0627e-04   77.09   0.05   0.03   0.20 
  8192   1.2538e-04   1.2538e-04   130.67   0.04   0.04   0.15 
Five Fastest Protocols
Msg Size   1st   2nd   3rd   4th   5th
  8   0   1   2   3   4 
  16   0   1   4   2   5 
  32   0   1   3   4   2 
  64   0   1   4   5   3 
  128   0   1   4   2   3 
  256   0   1   4   2   3 
  512   0   1   10   4   2 
  1024   0   1   10   4   2 
  2048   1   0   10   2   4 
  4096   10   4   2   0   7 
  8192   2   0   8   3   9 
Number of Protocols With Runtimes Within X% of Min
Msg Size   1%   5%   25%
  8    1   2   2 
  16    1   2   2 
  32    1   2   2 
  64    1   2   5 
  128    1   2   6 
  256    1   2   6 
  512    2   3   11 
  1024    1   2   10 
  2048    2   3   11 
  4096    3   7   11 
  8192    2   8   11 

DISCUSSION


Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:02:39 EDT.