PSTSWM Paragon Communication Protocol Performance

Performance Studies using PSTSWM

Intel Paragon Protocol Performance

(transpose LT experiment II-A2 / O(log P) transpose algorithm)

Date/Person: October 21, 1994 / P. Worley
Platform: Intel Paragon at Sandia National Laboratory (acoma):
     1824 GP nodes (2 50-MHz iPSC/860 processors per node)
Environment: SUNMOS 1.6.1
f77/Paragon Paragon Version ???
Code Version: 3.2
Compilation Options: if77 -O4 -Mnodepchk -Knoieee
Math Library: none
Communication Library: SUNMOS
Parallel Algorithm: logtrans
Partition: 1x8, 1x16, or 1x32
Number of Timesteps: 12
Results:

1x16 Processors / Problem T21L2
Runtime Statistics
  min          (mean-min)/min   (median-min)/min   (max-min)/min
  2.3240e-01   0.02             0.02               0.04
Three Fastest Protocols
  1st   2nd   3rd
  b6    c2    a5
Number of Protocols With Runtimes Within X% of Min
  1%    5%    25%
  2     24    24

1x32 Processors / Problem T42L1
Runtime Statistics
  min          (mean-min)/min   (median-min)/min   (max-min)/min
  2.8116e-01   0.02             0.02               0.04
Three Fastest Protocols
  1st   2nd   3rd
  b6    c2    b0
Number of Protocols With Runtimes Within X% of Min
  1%    5%    25%
  4     24    24

1x8 Processors / Problem T42L2
Runtime Statistics
  min          (mean-min)/min   (median-min)/min   (max-min)/min
  1.5530e+00   0.01             0.01               0.04
Three Fastest Protocols
  1st   2nd   3rd
  b6    a4    a5
Number of Protocols With Runtimes Within X% of Min
  1%    5%    25%
  14    24    24

1x16 Processors / Problem T85L1
Runtime Statistics
  min          (mean-min)/min   (median-min)/min   (max-min)/min
  1.9862e+00   0.02             0.01               0.04
Three Fastest Protocols
  1st   2nd   3rd
  c3    c2    d3
Number of Protocols With Runtimes Within X% of Min
  1%    5%    25%
  14    24    24

1x32 Processors / Problem T85L2
Runtime Statistics
  min          (mean-min)/min   (median-min)/min   (max-min)/min
  2.1449e+00   0.01             0.01               0.04
Three Fastest Protocols
  1st   2nd   3rd
  a5    a4    d5
Number of Protocols With Runtimes Within X% of Min
  1%    5%    25%
  13    24    24

1x8 Processors / Problem T85L4
Runtime Statistics
  min          (mean-min)/min   (median-min)/min   (max-min)/min
  1.6645e+01   0.01             0.00               0.04
Three Fastest Protocols
  1st   2nd   3rd
  c3    c5    c2
Number of Protocols With Runtimes Within X% of Min
  1%    5%    25%
  14    24    24
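
The summary columns reported for each configuration can be reproduced from a set of per-protocol runtimes. The Python sketch below is illustrative only and is not part of PSTSWM: the function name summarize_runtimes, the protocol labels p1..p5, and the runtimes are made up for the example. It also assumes that "within X%" is inclusive and that the count includes the fastest protocol itself, which is consistent with every protocol being counted once the threshold exceeds the reported (max-min)/min.

```python
from statistics import mean, median


def summarize_runtimes(runtimes, thresholds=(0.01, 0.05, 0.25)):
    """Summarize per-protocol runtimes (dict: protocol label -> runtime)."""
    values = list(runtimes.values())
    fastest = min(values)

    def rel(x):
        # Relative excess over the fastest runtime, e.g. 0.04 means 4% slower.
        return (x - fastest) / fastest

    stats = {
        "min": fastest,
        "(mean-min)/min": rel(mean(values)),
        "(median-min)/min": rel(median(values)),
        "(max-min)/min": rel(max(values)),
    }
    # Labels of the three protocols with the smallest runtimes.
    top_three = sorted(runtimes, key=runtimes.get)[:3]
    # Number of protocols whose runtime is within each threshold of the minimum
    # (inclusive, and counting the fastest protocol itself: an assumption).
    within = {t: sum(1 for v in values if rel(v) <= t) for t in thresholds}
    return stats, top_three, within


# Hypothetical runtimes, NOT measurements from these experiments.
example = {"p1": 2.00, "p2": 2.01, "p3": 2.08, "p4": 2.40, "p5": 3.00}
stats, top_three, within = summarize_runtimes(example)

print(stats)
print(top_three)
print(within)
```

Reporting the mean, median, and maximum relative to the minimum makes the spread comparable across problem sizes whose absolute runtimes differ by orders of magnitude.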

DISCUSSION

Unlike the more recent Paragon OSF experiments, these runs did NOT use partitions of the Paragon processor grid that match the processor subsets on which the parallel algorithms would run in a two-dimensional data decomposition. Partitions WERE chosen to minimize the impact of other users on the timings.

Patrick H. Worley / (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:23:00 EDT.