PSTSWM Paragon Communication Protocol Performance

Performance Studies using PSTSWM


Intel Paragon Protocol Performance

(distributed LT experiment A1 / ring-pipeline sum algorithm)

Date/Person: May 13, 1998 / P. Worley
Platform: Intel Paragon XP/S 150 MP at Oak Ridge National Laboratory:
     1024 MP nodes (three 50-MHz i860 XP processors per node)
Environment: Paragon OSF/1 Release 1.0.4 Server 1.4 R1_4_5
     f77/Paragon Paragon Version R5.0.3
Code Version: 6.3
Compilation Options: if77 -O4 -Mnodepchk -Knoieee -Msafealloc
Math Library: KAI
Communication Library: NX
Parallel Algorithm: ringpipe
Partition: 1x8, 1x16, or 1x32
Results:

1x16 Processors / Problem T21L2
Runtime Statistics
  min          (mean-min)/min   (median-min)/min   (max-min)/min
  2.0408e-01   0.25             0.25               0.64
Three Fastest Protocols
  1st   2nd   3rd
  i2    f2    i4
Number of Protocols With Runtimes Within X% of Min
  1%    5%    25%
  2     6     24

1x32 Processors / Problem T42L1
Runtime Statistics
  min          (mean-min)/min   (median-min)/min   (max-min)/min
  3.3733e-01   0.31             0.30               0.78
Three Fastest Protocols
  1st   2nd   3rd
  i2    c2    i3
Number of Protocols With Runtimes Within X% of Min
  1%    5%    25%
  1     5     19

1x8 Processors / Problem T42L2
Runtime Statistics
  min          (mean-min)/min   (median-min)/min   (max-min)/min
  1.0729e+00   0.13             0.17               0.21
Three Fastest Protocols
  1st   2nd   3rd
  i3    i2    f3
Number of Protocols With Runtimes Within X% of Min
  1%    5%    25%
  6     14    48

1x16 Processors / Problem T85L1
Runtime Statistics
  min          (mean-min)/min   (median-min)/min   (max-min)/min
  1.7726e+00   0.16             0.21               0.26
Three Fastest Protocols
  1st   2nd   3rd
  i3    i5    f3
Number of Protocols With Runtimes Within X% of Min
  1%    5%    25%
  3     7     47

1x32 Processors / Problem T85L2
Runtime Statistics
  min          (mean-min)/min   (median-min)/min   (max-min)/min
  2.1901e+00   0.22             0.27               0.36
Three Fastest Protocols
  1st   2nd   3rd
  i3    f3    i2
Number of Protocols With Runtimes Within X% of Min
  1%    5%    25%
  1     4     19

1x8 Processors / Problem T85L4
Runtime Statistics
  min          (mean-min)/min   (median-min)/min   (max-min)/min
  1.2942e+01   0.09             0.13               0.14
Three Fastest Protocols
  1st   2nd   3rd
  f3    i3    f2
Number of Protocols With Runtimes Within X% of Min
  1%    5%    25%
  7     15    48
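
For readers who want to reproduce the summary columns from their own timing data, the following Python sketch shows one way the statistics in the tables above could be computed from a set of per-protocol runtimes. It is an illustration only, not the script used for this study: the function name `summarize`, the dictionary layout, and the sample values are all hypothetical. It also assumes that "within X% of min" means (t - min)/min <= X/100 and that the fastest protocol counts toward its own totals, which is consistent with the counts of 1 in the 1% column.

```python
from statistics import mean, median


def summarize(runtimes, thresholds=(1, 5, 25)):
    """Summarize per-protocol runtimes relative to the fastest one.

    runtimes: dict mapping a protocol label to its runtime (hypothetical input).
    """
    times = list(runtimes.values())
    best = min(times)

    def rel(t):
        # Relative excess over the minimum, e.g. 0.25 means 25% slower.
        return (t - best) / best

    return {
        "min": best,
        "rel_mean": rel(mean(times)),
        "rel_median": rel(median(times)),
        "rel_max": rel(max(times)),
        # Labels of the three fastest protocols, fastest first.
        "fastest": sorted(runtimes, key=runtimes.get)[:3],
        # Count of protocols within X% of the minimum (the fastest counts itself).
        "within": {x: sum(rel(t) <= x / 100 for t in times) for x in thresholds},
    }


# Made-up runtimes, for illustration only; not measurements from this study.
sample = {"p1": 1.000, "p2": 1.004, "p3": 1.040, "p4": 1.200, "p5": 1.500}
stats = summarize(sample)
print(stats["fastest"], stats["within"])
```

Reporting the mean, median, and maximum as fractions of the minimum makes the spread across protocols comparable between problem sizes whose absolute runtimes differ by orders of magnitude.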

DISCUSSION

The partitions of the Paragon processor grid used in these experiments (1x8, 1x16, and 1x32) match the processor subsets on which the parallel algorithms would run in a two-dimensional data decomposition.

Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:30:01 EDT.