PSTSWM AlphaSC-500 Point-to-Point Communication Performance

Performance Studies using PSTSWM


Compaq AlphaServer SC SWAP Performance

(unordered swap of 128KB message using MPI within a node)

(performance measured per processor when all processors in node communicating)

Date/Person: January 26, 2000 / P. Worley
Platform: Compaq AlphaServer SC at Oak Ridge National Laboratory (colt.ccs.ornl.gov):
     16 ES40 4-way SMP nodes (500 MHz Alpha 21264 with 4MB L2 cache)
Environment: Digital UNIX V5.0;   RMS 2.36
Communication Library: MPI
SWAP size: 16384 REAL*8 floating point values each direction
Message size: Largest - 16384 REAL*8 floating point values
     Smallest - 16 REAL*8 floating point values
Processors: 0 and 1
     2 and 3
Latency Definition: (T1024 - T512)/512
Model Error Range: [1,1024]
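
The page does not spell out what T1024 and T512 stand for. One reading, which is an assumption here, is that Tn is the time to swap the full 131072 bytes when the data is split into n equal messages (1024 messages of 128 bytes, 512 messages of 256 bytes), so the difference isolates the cost of the 512 extra messages. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, and the sample inputs are the minimum runtimes for 128-byte and 256-byte messages from the 1 iteration/no overlap summary further down; because those minima are taken across protocols, the result need not match any single protocol's reported latency.

```python
def estimated_latency_usec(t_1024: float, t_512: float) -> float:
    """Apply (T1024 - T512)/512 to two runtimes in seconds.

    Assumption: t_1024 and t_512 are the times to move the same total
    data using 1024 and 512 messages respectively. The result is
    converted from seconds to microseconds per message.
    """
    return (t_1024 - t_512) / 512 * 1e6


# Sample inputs: "min Sec" for 128-byte and 256-byte messages
# (1 iteration, no overlap), minimum over all protocols.
latency = estimated_latency_usec(4.0265e-02, 2.1761e-02)
print(f"{latency:.2f} usec/msg")  # prints "36.14 usec/msg"
```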
Results:

unordered simple swap
Data Statistics
                     bidirectional bandwidth   estimated latency   model error
                     (peak MByte/sec)          (usec/msg)          (max. rel. error)
1 iter.              29.86                     37.61               93.8%
10 iter.             173.44                    36.68               58.0%

unordered swap using nonblocking send
Data Statistics
                     bidirectional bandwidth   estimated latency   model error
                     (peak MByte/sec)          (usec/msg)          (max. rel. error)
1 iter.              190.98                    52.81               69.3%
10 iter.             240.01                    51.27               45.7%
1 iter. w/overlap    190.59                    53.17               43.0%
10 iter. w/overlap   227.56                    47.20               49.4%

unordered swap using nonblocking receive
Data Statistics
                     bidirectional bandwidth   estimated latency   model error
                     (peak MByte/sec)          (usec/msg)          (max. rel. error)
1 iter.              271.65                    33.04               62.8%
10 iter.             256.97                    42.14               53.8%
1 iter. w/overlap    282.48                    20.71               71.2%
10 iter. w/overlap   288.20                    25.21               67.8%

unordered swap using nonblocking send and receive
Data Statistics
                     bidirectional bandwidth   estimated latency   model error
                     (peak MByte/sec)          (usec/msg)          (max. rel. error)
1 iter.              272.84                    42.68               53.0%
10 iter.             239.06                    37.25               58.0%
1 iter. w/overlap    277.93                    21.42               70.0%
10 iter. w/overlap   281.54                    24.56               68.4%

unordered swap using ready send
Data Statistics
                     bidirectional bandwidth   estimated latency   model error
                     (peak MByte/sec)          (usec/msg)          (max. rel. error)
1 iter.              274.61                    57.87               49.7%
10 iter.             286.05                    57.59               49.9%
1 iter. w/overlap    279.29                    20.65               70.9%
10 iter. w/overlap   293.05                    24.42               68.5%

unordered swap using nonblocking ready send
Data Statistics
                     bidirectional bandwidth   estimated latency   model error
                     (peak MByte/sec)          (usec/msg)          (max. rel. error)
1 iter.              268.32                    57.96               49.9%
10 iter.             282.00                    57.96               49.4%
1 iter. w/overlap    278.40                    20.93               71.0%
10 iter. w/overlap   290.52                    24.31               69.0%

native sendrecv
Data Statistics
                     bidirectional bandwidth   estimated latency   model error
                     (peak MByte/sec)          (usec/msg)          (max. rel. error)
1 iter.              282.18                    30.71               65.1%
10 iter.             234.82                    41.53               53.6%

unordered swap using nonblocking sync. send
Data Statistics
                     bidirectional bandwidth   estimated latency   model error
                     (peak MByte/sec)          (usec/msg)          (max. rel. error)
1 iter.              215.22                    35.28               59.7%
10 iter.             241.00                    33.96               61.3%
1 iter. w/overlap    199.77                    60.64               38.3%
10 iter. w/overlap   221.98                    53.96               44.6%

unordered swap using nonblocking receive with sync. send
Data Statistics
                     bidirectional bandwidth   estimated latency   model error
                     (peak MByte/sec)          (usec/msg)          (max. rel. error)
1 iter.              261.78                    36.92               59.4%
10 iter.             250.36                    36.81               58.9%
1 iter. w/overlap    277.81                    35.90               60.2%
10 iter. w/overlap   290.02                    36.92               60.8%

unordered swap using nonblocking sync. send and receive
Data Statistics
                     bidirectional bandwidth   estimated latency   model error
                     (peak MByte/sec)          (usec/msg)          (max. rel. error)
1 iter.              266.89                    36.88               59.5%
10 iter.             252.92                    37.29               58.5%
1 iter. w/overlap    280.79                    33.71               59.6%
10 iter. w/overlap   289.57                    32.04               63.6%


Protocol Sensitivity Summary for Bidirectional Swap of 131072 Bytes (1 iteration/no overlap)
Runtime Statistics
  Msg Size   min Sec   min Sec/Msg   max MBytes/Sec   (mean-min)/min   (median-min)/min   (max-min)/min
  128   4.0265e-02   3.9321e-05   6.51   0.23   0.13   0.64 
  256   2.1761e-02   4.2503e-05   12.05   0.29   0.23   0.67 
  512   2.5189e-02   9.8394e-05   10.41   0.07   0.02   0.25 
  1024   1.2738e-02   9.9516e-05   20.58   0.24   0.02   1.79 
  2048   6.1716e-03   9.6431e-05   42.48   0.13   0.07   0.42 
  4096   2.9826e-03   9.3206e-05   87.89   0.73   0.10   6.00 
  8192   1.6346e-03   1.0216e-04   160.37   5.76   0.09   56.68 
  16384   1.2450e-03   1.5563e-04   210.56   2.28   0.11   21.97 
  32768   1.1038e-03   2.7595e-04   237.49   13.14   0.11   130.36 
  65536   9.6080e-04   4.8040e-04   272.84   4.46   0.09   43.35 
  131072   9.2900e-04   9.2900e-04   282.18   3.44   0.10   29.32 
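
The "min Sec/Msg" and "max MBytes/Sec" columns appear to follow directly from "min Sec". The relationship below is inferred from the tabulated numbers, not stated on the page: each row moves 131072 bytes in each direction, split into 131072/(Msg Size) messages, and a MByte is taken as 10^6 bytes. A short Python sketch under those assumptions, with hypothetical function names:

```python
TOTAL_BYTES = 131072  # bytes swapped in each direction (16384 REAL*8 values)


def messages_per_swap(msg_size: int) -> int:
    """Number of messages needed to move TOTAL_BYTES at this message size."""
    return TOTAL_BYTES // msg_size


def sec_per_msg(min_sec: float, msg_size: int) -> float:
    """Assumed derivation of the 'min Sec/Msg' column."""
    return min_sec / messages_per_swap(msg_size)


def bidirectional_mbytes_per_sec(min_sec: float) -> float:
    """Assumed derivation of 'max MBytes/Sec': both directions, 1 MByte = 1e6 bytes."""
    return 2 * TOTAL_BYTES / min_sec / 1e6


# Rows taken from the 1 iteration/no overlap table: (Msg Size, min Sec)
for msg_size, min_sec in [(128, 4.0265e-02), (65536, 9.6080e-04), (131072, 9.2900e-04)]:
    print(msg_size,
          f"{sec_per_msg(min_sec, msg_size):.4e}",
          f"{bidirectional_mbytes_per_sec(min_sec):.2f}")
```

For these three rows the computed values agree with the tabulated ones to the printed precision, which supports the inferred relationship but does not confirm how PSTSWM computes the columns.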
Five Fastest Protocols
  Msg Size   1st   2nd   3rd   4th   5th
  128   6   2   3   0   7 
  256   3   6   0   2   8 
  512   6   3   2   7   9 
  1024   3   2   6   9   8 
  2048   6   3   2   9   8 
  4096   3   1   6   2   7 
  8192   1   6   2   9   8 
  16384   6   3   2   8   7 
  32768   9   8   4   5   7 
  65536   3   9   2   8   4 
  131072   6   4   2   5   3 
Number of Protocols With Runtimes Within X% of Min
  Msg Size   1%   5%   25%
  128    1   1   7 
  256    1   1   7 
  512    3   7   9 
  1024    3   7   9 
  2048    1   4   7 
  4096    1   3   7 
  8192    1   3   8 
  16384    1   3   9 
  32768    1   2   9 
  65536    1   4   7 
  131072    1   3   7 
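
The counts in this kind of table can be reproduced from a list of per-protocol runtimes for one message size. The page does not list those per-protocol runtimes, so the values below are hypothetical, as is the function name. The sketch assumes "within X% of min" means a runtime no greater than min * (1 + X/100), and that the fastest protocol is included in the count, which is consistent with every tabulated count being at least 1.

```python
def count_within(runtimes, percent):
    """Count runtimes that are within `percent` percent of the minimum."""
    best = min(runtimes)
    limit = best * (1 + percent / 100)
    return sum(1 for t in runtimes if t <= limit)


# Hypothetical runtimes in seconds for ten protocols at one message size.
runtimes = [1.00e-3, 1.005e-3, 1.03e-3, 1.04e-3, 1.20e-3,
            1.24e-3, 1.30e-3, 1.60e-3, 2.00e-3, 2.50e-3]

print([count_within(runtimes, x) for x in (1, 5, 25)])  # prints [2, 4, 6]
```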


Protocol Sensitivity Summary for Bidirectional Swap of 131072 Bytes (10 iterations/no overlap)
Runtime Statistics
  Msg Size   min Sec   min Sec/Msg   max MBytes/Sec   (mean-min)/min   (median-min)/min   (max-min)/min
  128   4.4266e-02   4.3228e-05   5.92   0.15   0.07   0.48 
  256   2.5871e-02   5.0529e-05   10.13   0.12   0.04   0.38 
  512   2.5305e-02   9.8848e-05   10.36   0.06   0.01   0.23 
  1024   1.2542e-02   9.7985e-05   20.90   0.07   0.02   0.25 
  2048   6.2032e-03   9.6926e-05   42.26   0.08   0.04   0.28 
  4096   3.0935e-03   9.6671e-05   84.74   0.10   0.05   0.35 
  8192   1.6626e-03   1.0391e-04   157.67   0.09   0.07   0.26 
  16384   1.2345e-03   1.5431e-04   212.35   0.06   0.05   0.22 
  32768   1.1665e-03   2.9162e-04   224.73   0.07   0.06   0.30 
  65536   9.8046e-04   4.9023e-04   267.37   0.13   0.11   0.55 
  131072   9.1644e-04   9.1644e-04   286.05   0.26   0.25   0.92 
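
The last three columns of the runtime statistics tables express how far the mean, median, and maximum runtimes lie above the minimum, as a fraction of the minimum. Given the table title, the spread is presumably taken across protocols at a fixed message size; that is an assumption, and the runtimes below are hypothetical. A brief Python sketch with a hypothetical function name:

```python
from statistics import mean, median


def relative_spread(runtimes):
    """Return (mean-min)/min, (median-min)/min and (max-min)/min."""
    best = min(runtimes)
    return {
        "mean": (mean(runtimes) - best) / best,
        "median": (median(runtimes) - best) / best,
        "max": (max(runtimes) - best) / best,
    }


# Hypothetical runtimes in seconds for five protocols.
spread = relative_spread([1.0e-3, 1.1e-3, 1.2e-3, 1.3e-3, 2.0e-3])
print({k: round(v, 2) for k, v in spread.items()})
```

A mean ratio far above the median ratio, as in several rows of the 1 iteration tables, points to a few slow outliers and not to a uniformly wide spread.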
Five Fastest Protocols
  Msg Size   1st   2nd   3rd   4th   5th
  128   7   0   8   9   3 
  256   0   6   8   2   9 
  512   7   3   6   8   9 
  1024   3   6   2   9   8 
  2048   3   2   6   9   8 
  4096   3   6   2   0   9 
  8192   3   6   2   1   8 
  16384   2   6   3   9   1 
  32768   4   5   3   7   2 
  65536   5   4   2   9   8 
  131072   4   5   7   9   8 
Number of Protocols With Runtimes Within X% of Min
  Msg Size   1%   5%   25%
  128    2   5   7 
  256    1   7   8 
  512    5   8   10 
  1024    2   7   9 
  2048    1   6   8 
  4096    2   6   8 
  8192    2   4   8 
  16384    2   6   10 
  32768    2   4   9 
  65536    2   3   9 
  131072    1   2   6 


Protocol Sensitivity Summary for Bidirectional Swap of 131072 Bytes (1 iteration with overlap)
Runtime Statistics
  Msg Size   min Sec   min Sec/Msg   max MBytes/Sec   (mean-min)/min   (median-min)/min   (max-min)/min
  128   2.9162e-02   2.8479e-05   8.99   0.41   0.41   1.37 
  256   1.8407e-02   3.5951e-05   14.24   0.30   0.30   1.07 
  512   2.1428e-02   8.3704e-05   12.23   0.12   0.18   0.27 
  1024   1.1013e-02   8.6036e-05   23.80   0.26   0.14   1.68 
  2048   5.8556e-03   9.1494e-05   44.77   0.96   0.08   9.04 
  4096   2.9692e-03   9.2787e-05   88.29   0.39   0.05   3.41 
  8192   1.6306e-03   1.0191e-04   160.77   0.45   0.05   4.10 
  16384   1.2144e-03   1.5180e-04   215.86   0.49   0.08   4.24 
  32768   1.0758e-03   2.6895e-04   243.67   1.67   0.02   16.04 
  65536   9.5940e-04   4.7970e-04   273.24   1.20   0.09   10.50 
  131072   9.2800e-04   9.2800e-04   282.48   11.60   0.02   114.70 
Five Fastest Protocols
  Msg Size   1st   2nd   3rd   4th   5th
  128   4   2   3   5   6 
  256   3   4   2   5   9 
  512   4   3   2   5   9 
  1024   2   3   4   5   9 
  2048   5   4   2   9   3 
  4096   6   2   4   5   1 
  8192   2   6   1   3   5 
  16384   6   2   9   3   4 
  32768   6   3   5   4   2 
  65536   3   2   4   8   5 
  131072   2   9   4   5   3 
Number of Protocols With Runtimes Within X% of Min
  Msg Size   1%   5%   25%
  128    4   4   4 
  256    3   4   4 
  512    3   4   9 
  1024    2   4   8 
  2048    2   5   9 
  4096    1   4   9 
  8192    1   5   9 
  16384    2   5   9 
  32768    5   6   8 
  65536    1   4   7 
  131072    2   7   7 


Protocol Sensitivity Summary for Bidirectional Swap of 131072 Bytes (10 iterations with overlap)
Runtime Statistics
  Msg Size   min Sec   min Sec/Msg   max MBytes/Sec   (mean-min)/min   (median-min)/min   (max-min)/min
  128   3.2167e-02   3.1413e-05   8.15   0.33   0.36   1.02 
  256   1.9650e-02   3.8379e-05   13.34   0.27   0.28   0.91 
  512   2.2665e-02   8.8536e-05   11.57   0.09   0.12   0.19 
  1024   1.1227e-02   8.7709e-05   23.35   0.10   0.12   0.23 
  2048   5.8200e-03   9.0938e-05   45.04   0.08   0.07   0.20 
  4096   3.0877e-03   9.6491e-05   84.90   0.06   0.05   0.16 
  8192   1.6449e-03   1.0281e-04   159.37   0.07   0.08   0.13 
  16384   1.2239e-03   1.5299e-04   214.19   0.05   0.03   0.22 
  32768   1.0873e-03   2.7183e-04   241.09   0.09   0.04   0.42 
  65536   9.3982e-04   4.6991e-04   278.93   0.14   0.06   0.65 
  131072   8.9454e-04   8.9454e-04   293.05   0.21   0.04   0.93 
Five Fastest Protocols
  Msg Size   1st   2nd   3rd   4th   5th
  128   4   3   2   5   9 
  256   2   4   3   5   9 
  512   4   3   2   5   9 
  1024   4   2   3   5   9 
  2048   4   2   5   3   9 
  4096   6   2   0   3   4 
  8192   6   1   4   2   7 
  16384   2   4   3   5   9 
  32768   5   2   4   8   3 
  65536   5   2   4   3   9 
  131072   4   5   8   9   2 
Number of Protocols With Runtimes Within X% of Min
  Msg Size   1%   5%   25%
  128    2   4   5 
  256    3   4   5 
  512    3   4   10 
  1024    2   4   10 
  2048    3   4   10 
  4096    1   6   10 
  8192    1   3   10 
  16384    2   7   10 
  32768    2   6   9 
  65536    1   5   8 
  131072    2   6   7 

DISCUSSION


Patrick H. Worley / (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:02:11 EDT.