PSTSWM Paragon Communication Protocol Performance

Performance Studies using PSTSWM


Intel Paragon Protocol Performance

(transpose FFT experiment A1 / O(P) sendrecv transpose algorithm)

Date/Person: May 13, 1998 / P. Worley
Platform: Intel Paragon XP/S 150 MP at Oak Ridge National Laboratory:
     1024 MP nodes (3 50-MHz iPSC/860 processors per node)
Environment: Paragon OSF/1 Release 1.0.4 Server 1.4 R1_4_5
     f77/Paragon Paragon Version R5.0.3
Code Version: 6.3
Compilation Options: if77 -O4 -Mnodepchk -Knoieee -Msafealloc
Math Library: KAI
Communication Library: NX
Parallel Algorithm: srtrans
Partition: 1x8, 1x16, or 1x32
Results:

16x1 Processors / Problem T10L16
Runtime Statistics
  min          (mean-min)/min  (median-min)/min  (max-min)/min
  4.5994e-01   0.49            0.56              1.20
Three Fastest Protocols
  1st  2nd  3rd
  a6   a2   a1
Number of Protocols With Runtimes Within X% of Min
  1%   5%   25%
  4    7    14

32x1 Processors / Problem T10L16
Runtime Statistics
  min          (mean-min)/min  (median-min)/min  (max-min)/min
  6.0476e-01   0.44            0.43              1.13
Three Fastest Protocols
  1st  2nd  3rd
  a6   a1   a3
Number of Protocols With Runtimes Within X% of Min
  1%   5%   25%
  4    7    14

8x1 Processors / Problem T21L8
Runtime Statistics
  min          (mean-min)/min  (median-min)/min  (max-min)/min
  1.1297e+00   0.13            0.14              0.32
Three Fastest Protocols
  1st  2nd  3rd
  b2   a2   b3
Number of Protocols With Runtimes Within X% of Min
  1%   5%   25%
  7    11   22

16x1 Processors / Problem T21L16
Runtime Statistics
  min          (mean-min)/min  (median-min)/min  (max-min)/min
  1.2976e+00   0.27            0.30              0.62
Three Fastest Protocols
  1st  2nd  3rd
  a2   a6   a3
Number of Protocols With Runtimes Within X% of Min
  1%   5%   25%
  8    11   12

32x1 Processors / Problem T21L32
Runtime Statistics
  min          (mean-min)/min  (median-min)/min  (max-min)/min
  1.6963e+00   0.66            0.74              1.42
Three Fastest Protocols
  1st  2nd  3rd
  b3   b2   b1
Number of Protocols With Runtimes Within X% of Min
  1%   5%   25%
  3    11   11

8x1 Processors / Problem T42L16
Runtime Statistics
  min          (mean-min)/min  (median-min)/min  (max-min)/min
  1.0594e+01   0.05            0.04              0.12
Three Fastest Protocols
  1st  2nd  3rd
  b1   b2   b0
Number of Protocols With Runtimes Within X% of Min
  1%   5%   25%
  4    17   29
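
The summary columns reported for each experiment can be reproduced from a set of per-protocol runtimes. The following is a minimal Python sketch, not part of PSTSWM: the function name summarize, the dictionary layout, and the sample runtimes are all hypothetical. It also assumes that the "within X% of min" counts include the fastest protocol itself, which the tables do not state.

```python
from statistics import mean, median


def summarize(runtimes):
    """Summarize runtimes given as {protocol label: runtime}.

    Hypothetical helper: the labels and data layout are assumptions,
    not taken from the PSTSWM output format.
    """
    values = list(runtimes.values())
    t_min = min(values)

    def rel(t):
        # Relative excess over the fastest runtime: (t - min) / min
        return (t - t_min) / t_min

    # Protocol labels ordered from fastest to slowest; keep the top three.
    fastest = sorted(runtimes, key=runtimes.get)[:3]

    # Count protocols whose runtime is within X% of the minimum.
    # Assumption: the fastest protocol itself is included in the count.
    within = {
        pct: sum(1 for t in values if t <= t_min * (1 + pct / 100))
        for pct in (1, 5, 25)
    }

    return {
        "min": t_min,
        "mean_rel": rel(mean(values)),
        "median_rel": rel(median(values)),
        "max_rel": rel(max(values)),
        "fastest": fastest,
        "within": within,
    }


# Made-up runtimes for illustration only (not measured data).
sample = {"a1": 1.0, "a2": 1.004, "a3": 1.04, "b1": 1.2, "b2": 2.0}
stats = summarize(sample)
```

Reporting the mean, median, and maximum relative to the minimum makes the spread across protocols comparable between problem sizes whose absolute runtimes differ by an order of magnitude.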

DISCUSSION

The experiments were run on partitions of the Paragon processor grid (1x8, 1x16, and 1x32) that match the processor subsets on which the parallel algorithms would run in a two-dimensional data decomposition.

Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:30:08 EDT.