COMMTEST SP3-200 Point-to-Point Communication Performance

Performance Studies using COMMTEST


IBM SP3-200 SWAP Performance

(ordered swap of 2MB message using MPI and US between nodes)

(performance measured per processor when all processors in node communicating)

Date/Person: May 25, 2000 / P. Worley
Platform: IBM SP3 at National Energy Research Scientific Computing Center (Gseaborg)
   256 2-way Winterhawk I SMP nodes (200 MHz POWER3 with 4MB L2 cache, equivalent to RS/6000 Model 260)
Environment: AIX 4.3.2; POE 2.4.0.12
Communication Library: MPI over the switch using user space
SWAP size: 262144 REAL*8 floating point values each direction
Message size: Largest - 262144 REAL*8 floating point values
              Smallest - 256 REAL*8 floating point values
Processors: 0 and 2; 1 and 3
Latency Definition: (T1024-T512)/512
Model Error Range: [1,1024]
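
The latency definition can be read as follows; this is an interpretation inferred from the tables on this page, not something the page states. Assume T_n is the time to move the fixed 2 MB payload as n equal-sized swaps, and that each swap costs a per-message startup plus a bandwidth term. The bandwidth term is then the same for T1024 and T512, so their difference leaves 512 extra startups. The division by two messages per swap is a further assumption, chosen because it reproduces the reported figure. A minimal sketch using the minimum runtimes for 2048-byte and 4096-byte messages from the cache inv./no overlap summary:

```python
# Sketch of the latency estimate (T1024 - T512)/512.
# ASSUMPTIONS (not stated on the page): T_n is the time for n equal-sized
# swaps of a fixed 2 MB payload, and each swap carries two messages.

def estimated_latency_usec(t_1024, t_512, msgs_per_swap=2):
    """Return the per-message latency in microseconds from two runtimes in seconds."""
    # The bandwidth term cancels, leaving the cost of 512 extra swap startups.
    per_swap = (t_1024 - t_512) / 512.0
    return per_swap / msgs_per_swap * 1e6

# Minimum runtimes (seconds) from the cache inv./no overlap summary table.
t_1024 = 1.9423e-01  # 1024 swaps of 2048 bytes
t_512 = 1.3972e-01   # 512 swaps of 4096 bytes

latency = estimated_latency_usec(t_1024, t_512)
print(f"{latency:.2f} usec/msg")  # prints 53.23 usec/msg
```

The result agrees with the 53.23 usec/msg reported for the ordered simple swap (cache inv.), which suggests, but does not prove, that the reading above is the intended one.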
Results:

ordered simple swap
Data                    Statistics
                        unidirectional bandwidth   estimated latency   model error
                        (peak MByte/sec)           (usec/msg)          (max. rel. error)
cache inv.              76.86                      53.23               44.9%
1 iter.                 77.95                      52.32               46.6%
10 iter.                77.70                      51.45               45.1%

ordered swap using nonblocking send
Data                    Statistics
                        unidirectional bandwidth   estimated latency   model error
                        (peak MByte/sec)           (usec/msg)          (max. rel. error)
cache inv.              78.07                      57.79               44.5%
1 iter.                 76.90                      58.19               44.8%
10 iter.                75.85                      55.13               43.6%
cache inv. w/overlap    77.27                      61.78               44.1%
1 iter. w/overlap       76.90                      57.42               44.8%
10 iter. w/overlap      77.34                      57.51               43.8%

ordered swap using nonblocking receive
Data                    Statistics
                        unidirectional bandwidth   estimated latency   model error
                        (peak MByte/sec)           (usec/msg)          (max. rel. error)
cache inv.              77.85                      57.29               44.5%
1 iter.                 76.55                      60.39               43.0%
10 iter.                76.60                      55.42               43.8%
cache inv. w/overlap    77.97                      59.45               44.4%
1 iter. w/overlap       76.98                      60.46               43.7%
10 iter. w/overlap      76.70                      57.68               43.2%

ordered swap using nonblocking send and receive
Data                    Statistics
                        unidirectional bandwidth   estimated latency   model error
                        (peak MByte/sec)           (usec/msg)          (max. rel. error)
cache inv.              76.92                      59.61               43.9%
1 iter.                 76.78                      59.46               44.3%
10 iter.                76.95                      63.73               41.6%
cache inv. w/overlap    76.86                      63.11               43.0%
1 iter. w/overlap       76.39                      66.30               42.1%
10 iter. w/overlap      76.72                      60.89               42.1%

ordered swap using ready send
Data                    Statistics
                        unidirectional bandwidth   estimated latency   model error
                        (peak MByte/sec)           (usec/msg)          (max. rel. error)
cache inv.              77.61                      66.66               22.5%
1 iter.                 77.34                      64.46               24.2%
10 iter.                77.92                      69.04               21.6%
cache inv. w/overlap    77.51                      62.68               22.6%
1 iter. w/overlap       77.47                      60.30               24.8%
10 iter. w/overlap      77.14                      58.07               22.9%

ordered swap using nonblocking ready send
Data                    Statistics
                        unidirectional bandwidth   estimated latency   model error
                        (peak MByte/sec)           (usec/msg)          (max. rel. error)
cache inv.              77.05                      72.94               20.6%
1 iter.                 77.98                      71.90               21.5%
10 iter.                77.43                      67.64               21.6%
cache inv. w/overlap    78.26                      62.03               25.2%
1 iter. w/overlap       77.75                      61.62               24.6%
10 iter. w/overlap      77.47                      61.67               21.4%

synchronous
Data                    Statistics
                        unidirectional bandwidth   estimated latency   model error
                        (peak MByte/sec)           (usec/msg)          (max. rel. error)
cache inv.              77.42                      58.86               44.2%
1 iter.                 78.04                      60.50               44.9%
10 iter.                77.34                      59.34               43.1%

ordered swap using nonblocking sync. send
Data                    Statistics
                        unidirectional bandwidth   estimated latency   model error
                        (peak MByte/sec)           (usec/msg)          (max. rel. error)
cache inv.              76.50                      134.14              19.4%
1 iter.                 76.97                      135.18              19.6%
10 iter.                77.58                      133.03              21.5%
cache inv. w/overlap    77.36                      133.12              19.7%
1 iter. w/overlap       77.48                      136.01              20.1%
10 iter. w/overlap      77.57                      133.80              17.6%

ordered swap using nonblocking receive with sync. send
Data                    Statistics
                        unidirectional bandwidth   estimated latency   model error
                        (peak MByte/sec)           (usec/msg)          (max. rel. error)
cache inv.              77.00                      133.93              19.9%
1 iter.                 77.07                      136.81              19.6%
10 iter.                76.65                      132.75              19.0%
cache inv. w/overlap    77.22                      135.97              20.0%
1 iter. w/overlap       78.00                      140.45              19.0%
10 iter. w/overlap      76.90                      141.32              15.2%

ordered swap using nonblocking sync. send and receive
Data                    Statistics
                        unidirectional bandwidth   estimated latency   model error
                        (peak MByte/sec)           (usec/msg)          (max. rel. error)
cache inv.              76.88                      133.94              19.1%
1 iter.                 77.90                      132.32              20.5%
10 iter.                76.62                      132.51              20.0%
cache inv. w/overlap    77.83                      140.61              19.5%
1 iter. w/overlap       77.48                      140.40              19.2%
10 iter. w/overlap      76.12                      137.72              16.4%

ordered simple swap using sync. send
Data                    Statistics
                        unidirectional bandwidth   estimated latency   model error
                        (peak MByte/sec)           (usec/msg)          (max. rel. error)
cache inv.              77.54                      131.50              20.5%
1 iter.                 76.42                      127.64              20.3%
10 iter.                77.51                      126.51              18.9%


Protocol Sensitivity Summary for Unidirectional Swap of 2097152 Bytes (cache inv./no overlap)
Runtime Statistics
  Msg Size   min Sec   min Sec/Msg   max MBytes/Sec   (mean-min)/min   (median-min)/min   (max-min)/min
  2048   1.9423e-01   1.8968e-04   21.59   0.36   0.12   0.88 
  4096   1.3972e-01   2.7289e-04   30.02   0.25   0.07   0.63 
  8192   1.1383e-01   4.4464e-04   36.85   0.26   0.32   0.33 
  16384   9.1203e-02   7.1252e-04   45.99   0.17   0.21   0.22 
  32768   7.5219e-02   1.1753e-03   55.76   0.13   0.15   0.19 
  65536   6.7244e-02   2.1014e-03   62.37   0.07   0.09   0.10 
  131072   6.0514e-02   3.7821e-03   69.31   0.03   0.04   0.06 
  262144   5.6857e-02   7.1072e-03   73.77   0.02   0.02   0.04 
  524288   5.5173e-02   1.3793e-02   76.02   0.02   0.02   0.03 
  1048576   5.4456e-02   2.7228e-02   77.02   0.01   0.00   0.01 
  2097152   5.3722e-02   5.3722e-02   78.07   0.01   0.01   0.02 
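
The columns of the runtime table appear to be related by simple arithmetic. The relations below are inferred from the numbers, not documented on this page: the 2097152-byte payload is split into equal messages, "min Sec/Msg" is the minimum runtime divided by the number of messages, and the bandwidth counts two messages per swap. A short sketch that recomputes the derived columns for three rows of the table above:

```python
# Recompute the derived columns of the runtime table.
# ASSUMPTIONS (inferred from the data, not stated by the source):
#   - the payload is TOTAL_BYTES, split into equal-sized messages
#   - bandwidth counts two messages (one per direction) per swap
TOTAL_BYTES = 2097152

def derive_columns(msg_bytes, min_sec):
    """Return (message count, seconds per swap, MBytes/sec) for one table row."""
    n_msgs = TOTAL_BYTES // msg_bytes
    sec_per_msg = min_sec / n_msgs
    mbytes_per_sec = 2 * msg_bytes / sec_per_msg / 1e6
    return n_msgs, sec_per_msg, mbytes_per_sec

# (Msg Size, min Sec) pairs copied from the cache inv./no overlap table.
rows = [(2048, 1.9423e-01), (262144, 5.6857e-02), (2097152, 5.3722e-02)]
results = [derive_columns(size, secs) for size, secs in rows]

for (size, _), (n, spm, bw) in zip(rows, results):
    print(f"{size:8d}  {n:5d} msgs  {spm:.4e} sec/msg  {bw:6.2f} MBytes/sec")
```

The recomputed bandwidths match the table to two decimal places (21.59, 73.77, and 78.07 MBytes/sec); the per-message times can differ in the last printed digit because the tabulated runtimes are already rounded.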
Five Fastest Protocols
  Msg Size   1st   2nd   3rd   4th   5th
  2048   0   2   1   3   6 
  4096   0   1   2   3   6 
  8192   4   5   0   10   8 
  16384   4   5   9   0   2 
  32768   4   5   3   0   2 
  65536   4   5   10   1   0 
  131072   5   4   2   6   0 
  262144   5   4   6   0   9 
  524288   4   2   8   1   9 
  1048576   2   5   3   4   9 
  2097152   1   2   4   10   6 
Number of Protocols With Runtimes Within X% of Min
  Msg Size   1%   5%   25%
  2048    1   3   7 
  4096    1   4   7 
  8192    1   2   2 
  16384    2   2   11 
  32768    1   2   11 
  65536    2   2   11 
  131072    2   9   11 
  262144    2   11   11 
  524288    1   11   11 
  1048576    10   11   11 
  2097152    5   11   11 
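
The summary statistics in these tables compare the 11 protocols at each message size: the spread columns are relative differences from the fastest protocol, and the counts give how many protocols fall within X% of it. A minimal sketch of how such figures could be computed follows. The runtimes are hypothetical values made up for illustration (the page does not list per-protocol runtimes), and treating "within X%" as less than or equal to min * (1 + X/100) is an assumption.

```python
from statistics import mean, median

def spread_stats(runtimes):
    """Return ((mean-min)/min, (median-min)/min, (max-min)/min)."""
    lo = min(runtimes)
    return tuple((f(runtimes) - lo) / lo for f in (mean, median, max))

def count_within(runtimes, percent):
    """Count runtimes no more than `percent` percent above the minimum."""
    lo = min(runtimes)
    return sum(1 for t in runtimes if t <= lo * (1 + percent / 100.0))

# HYPOTHETICAL runtimes (seconds) for 11 protocols at one message size.
hypothetical = [0.100, 0.1005, 0.103, 0.104, 0.110, 0.120,
                0.124, 0.130, 0.150, 0.160, 0.180]

mean_rel, median_rel, max_rel = spread_stats(hypothetical)
counts = [count_within(hypothetical, p) for p in (1, 5, 25)]
print(f"{mean_rel:.2f} {median_rel:.2f} {max_rel:.2f}", counts)
```

A row in which the 1% count is close to 11 indicates that the choice of protocol hardly matters at that message size, as happens here for the largest messages.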


Protocol Sensitivity Summary for Unidirectional Swap of 2097152 Bytes (1 iteration/no overlap)
Runtime Statistics
  Msg Size   min Sec   min Sec/Msg   max MBytes/Sec   (mean-min)/min   (median-min)/min   (max-min)/min
  2048   1.9490e-01   1.9033e-04   21.52   0.36   0.12   0.88 
  4096   1.4133e-01   2.7603e-04   29.68   0.24   0.08   0.63 
  8192   1.1506e-01   4.4945e-04   36.45   0.26   0.31   0.34 
  16384   9.0765e-02   7.0910e-04   46.21   0.18   0.22   0.25 
  32768   7.6002e-02   1.1875e-03   55.19   0.11   0.12   0.15 
  65536   6.7955e-02   2.1236e-03   61.72   0.07   0.07   0.11 
  131072   6.0133e-02   3.7583e-03   69.75   0.04   0.05   0.06 
  262144   5.6910e-02   7.1138e-03   73.70   0.02   0.02   0.04 
  524288   5.5201e-02   1.3800e-02   75.98   0.02   0.02   0.04 
  1048576   5.4302e-02   2.7151e-02   77.24   0.01   0.01   0.03 
  2097152   5.3748e-02   5.3748e-02   78.04   0.01   0.01   0.03 
Five Fastest Protocols
  Msg Size   1st   2nd   3rd   4th   5th
  2048   0   1   2   6   3 
  4096   0   2   1   6   3 
  8192   4   5   2   7   10 
  16384   4   5   0   10   6 
  32768   4   5   0   7   10 
  65536   5   4   0   8   1 
  131072   4   5   1   10   9 
  262144   4   0   7   5   6 
  524288   4   5   6   8   10 
  1048576   4   0   5   7   8 
  2097152   6   5   0   9   4 
Number of Protocols With Runtimes Within X% of Min
  Msg Size   1%   5%   25%
  2048    1   3   7 
  4096    2   5   7 
  8192    2   2   2 
  16384    2   2   11 
  32768    1   2   11 
  65536    2   2   11 
  131072    2   6   11 
  262144    1   11   11 
  524288    4   11   11 
  1048576    6   11   11 
  2097152    5   11   11 


Protocol Sensitivity Summary for Unidirectional Swap of 2097152 Bytes (10 iterations/no overlap)
Runtime Statistics
  Msg Size   min Sec   min Sec/Msg   max MBytes/Sec   (mean-min)/min   (median-min)/min   (max-min)/min
  2048   1.8932e-01   1.8489e-04   22.15   0.37   0.16   0.90 
  4096   1.3664e-01   2.6687e-04   30.70   0.25   0.09   0.64 
  8192   1.1333e-01   4.4268e-04   37.01   0.26   0.31   0.37 
  16384   8.9069e-02   6.9585e-04   47.09   0.17   0.21   0.23 
  32768   7.4669e-02   1.1667e-03   56.17   0.13   0.15   0.16 
  65536   6.6636e-02   2.0824e-03   62.94   0.07   0.08   0.12 
  131072   5.9627e-02   3.7267e-03   70.34   0.04   0.05   0.07 
  262144   5.6908e-02   7.1135e-03   73.70   0.02   0.02   0.04 
  524288   5.5259e-02   1.3815e-02   75.90   0.02   0.01   0.04 
  1048576   5.4482e-02   2.7241e-02   76.98   0.01   0.00   0.02 
  2097152   5.3826e-02   5.3826e-02   77.92   0.01   0.01   0.03 
Five Fastest Protocols
  Msg Size   1st   2nd   3rd   4th   5th
  2048   0   2   1   6   3 
  4096   0   2   1   6   3 
  8192   5   4   10   0   2 
  16384   5   4   0   10   7 
  32768   4   5   7   0   1 
  65536   5   4   6   0   9 
  131072   4   5   6   0   1 
  262144   4   5   8   3   7 
  524288   4   5   0   6   9 
  1048576   7   6   0   3   8 
  2097152   4   0   7   10   5 
Number of Protocols With Runtimes Within X% of Min
  Msg Size   1%   5%   25%
  2048    1   3   7 
  4096    1   5   7 
  8192    2   2   2 
  16384    2   2   11 
  32768    2   2   11 
  65536    2   2   11 
  131072    2   8   11 
  262144    2   11   11 
  524288    3   11   11 
  1048576    8   11   11 
  2097152    6   11   11 


Protocol Sensitivity Summary for Unidirectional Swap of 2097152 Bytes (cache inv. with overlap)
Runtime Statistics
  Msg Size   min Sec   min Sec/Msg   max MBytes/Sec   (mean-min)/min   (median-min)/min   (max-min)/min
  2048   1.9492e-01   1.9035e-04   21.52   0.36   0.10   0.92 
  4096   1.4080e-01   2.7500e-04   29.79   0.25   0.07   0.64 
  8192   1.1139e-01   4.3511e-04   37.65   0.30   0.36   0.40 
  16384   8.9818e-02   7.0170e-04   46.70   0.19   0.23   0.24 
  32768   7.6154e-02   1.1899e-03   55.08   0.10   0.12   0.16 
  65536   6.6374e-02   2.0742e-03   63.19   0.09   0.11   0.13 
  131072   6.0881e-02   3.8051e-03   68.89   0.03   0.03   0.05 
  262144   5.6811e-02   7.1014e-03   73.83   0.02   0.02   0.04 
  524288   5.5499e-02   1.3875e-02   75.58   0.01   0.01   0.02 
  1048576   5.4358e-02   2.7179e-02   77.16   0.01   0.01   0.03 
  2097152   5.3592e-02   5.3592e-02   78.26   0.01   0.01   0.02 
Five Fastest Protocols
  Msg Size   1st   2nd   3rd   4th   5th
  2048   0   2   1   6   4 
  4096   0   1   2   4   6 
  8192   4   5   0   6   10 
  16384   4   5   6   7   3 
  32768   5   4   10   6   0 
  65536   4   5   7   10   9 
  131072   4   5   8   2   7 
  262144   5   4   8   7   9 
  524288   5   4   7   8   2 
  1048576   4   9   10   0   3 
  2097152   5   2   9   10   4 
Number of Protocols With Runtimes Within X% of Min
  Msg Size   1%   5%   25%
  2048    1   1   7 
  4096    1   5   7 
  8192    1   2   2 
  16384    1   2   11 
  32768    2   2   11 
  65536    1   2   11 
  131072    2   10   11 
  262144    2   11   11 
  524288    6   11   11 
  1048576    7   11   11 
  2097152    5   11   11 


Protocol Sensitivity Summary for Unidirectional Swap of 2097152 Bytes (1 iteration with overlap)
Runtime Statistics
  Msg Size   min Sec   min Sec/Msg   max MBytes/Sec   (mean-min)/min   (median-min)/min   (max-min)/min
  2048   1.9343e-01   1.8889e-04   21.68   0.37   0.11   0.93 
  4096   1.4087e-01   2.7513e-04   29.78   0.24   0.06   0.63 
  8192   1.1304e-01   4.4155e-04   37.11   0.28   0.34   0.36 
  16384   9.0320e-02   7.0562e-04   46.44   0.18   0.21   0.25 
  32768   7.5531e-02   1.1802e-03   55.53   0.11   0.13   0.16 
  65536   6.7300e-02   2.1031e-03   62.32   0.07   0.09   0.11 
  131072   5.9890e-02   3.7432e-03   70.03   0.04   0.05   0.08 
  262144   5.6616e-02   7.0770e-03   74.08   0.03   0.03   0.07 
  524288   5.5122e-02   1.3781e-02   76.09   0.02   0.01   0.03 
  1048576   5.3947e-02   2.6974e-02   77.75   0.01   0.01   0.02 
  2097152   5.3603e-02   5.3603e-02   78.25   0.01   0.01   0.03 
Five Fastest Protocols
  Msg Size   1st   2nd   3rd   4th   5th
  2048   0   1   2   6   4 
  4096   0   6   2   1   4 
  8192   4   5   0   10   6 
  16384   5   4   0   2   1 
  32768   5   4   0   2   6 
  65536   5   4   6   0   3 
  131072   4   5   0   6   2 
  262144   4   5   8   10   9 
  524288   5   4   1   0   9 
  1048576   5   4   6   2   10 
  2097152   0   8   7   9   4 
Number of Protocols With Runtimes Within X% of Min
  Msg Size   1%   5%   25%
  2048    1   1   7 
  4096    2   4   7 
  8192    2   2   2 
  16384    2   2   10 
  32768    2   2   11 
  65536    2   2   11 
  131072    2   6   11 
  262144    2   9   11 
  524288    3   11   11 
  1048576    4   11   11 
  2097152    4   11   11 


Protocol Sensitivity Summary for Unidirectional Swap of 2097152 Bytes (10 iterations with overlap)
Runtime Statistics
  Msg Size   min Sec   min Sec/Msg   max MBytes/Sec   (mean-min)/min   (median-min)/min   (max-min)/min
  2048   1.9042e-01   1.8596e-04   22.03   0.36   0.08   0.93 
  4096   1.3739e-01   2.6834e-04   30.53   0.24   0.04   0.62 
  8192   1.0903e-01   4.2589e-04   38.47   0.30   0.36   0.37 
  16384   8.7064e-02   6.8018e-04   48.18   0.19   0.24   0.26 
  32768   7.3075e-02   1.1418e-03   57.40   0.14   0.17   0.19 
  65536   6.6207e-02   2.0690e-03   63.35   0.08   0.09   0.12 
  131072   5.9505e-02   3.7191e-03   70.49   0.05   0.06   0.07 
  262144   5.6963e-02   7.1204e-03   73.63   0.02   0.02   0.03 
  524288   5.5247e-02   1.3812e-02   75.92   0.02   0.02   0.04 
  1048576   5.4520e-02   2.7260e-02   76.93   0.01   0.00   0.06 
  2097152   5.4068e-02   5.4068e-02   77.57   0.01   0.01   0.02 
Five Fastest Protocols
  Msg Size   1st   2nd   3rd   4th   5th
  2048   0   2   1   4   6 
  4096   0   2   1   4   6 
  8192   5   4   0   10   3 
  16384   4   5   7   8   0 
  32768   4   5   8   0   2 
  65536   4   5   10   6   7 
  131072   5   4   7   1   3 
  262144   5   4   9   2   1 
  524288   5   0   3   9   10 
  1048576   5   8   4   3   10 
  2097152   7   6   5   1   10 
Number of Protocols With Runtimes Within X% of Min
  Msg Size   1%   5%   25%
  2048    1   3   7 
  4096    1   7   7 
  8192    2   2   2 
  16384    2   2   10 
  32768    2   2   11 
  65536    2   2   11 
  131072    2   3   11 
  262144    2   11   11 
  524288    2   11   11 
  1048576    6   10   11 
  2097152    8   11   11 

DISCUSSION


Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:14:35 EDT.