PSTSWM Paragon Algorithm Comparison

Performance Studies using PSTSWM


Intel Paragon Algorithm Comparison

(transpose LT experiment II-A1)

Date/Person: May 13, 1998 / P. Worley
Platform: Intel Paragon XP/S 150 MP at Oak Ridge National Laboratory:
     1024 MP nodes (three 50-MHz i860 XP processors per node)
Environment: Paragon OSF/1 Release 1.0.4 Server 1.4 R1_4_5
     f77/Paragon Paragon Version R5.0.3
Code Version: 6.3
Compilation Options: if77 -O4 -Mnodepchk -Knoieee -Msafealloc
Math Library: KAI
Communication Library: MPI, NX
Partition: 1x8, 1x16, or 1x32
Number of Timesteps: 12
Results:

Transpose LT (2) (mpi)
Algorithm Comparison
  problem               T42L1      T21L2      T42L2      T85L2      T85L1      T85L4
  processors            P=32       P=16       P=8        P=32       P=16       P=8
  optimal algorithm     alltoallv  alltoallv  swtrans    alltoallv  alltoallv  swtrans
  (alltoallv-min)/min   0.000      0.000      0.028      0.000      0.000      0.166
  (generic-min)/min     0.610      0.444      0.018      0.152      0.096      0.163
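Each (X-min)/min row reports how much slower option X was than the fastest algorithm tested for that problem and partition, as a fraction of the fastest time; 0.000 marks the optimal choice. A minimal Python sketch of that bookkeeping follows, assuming per-algorithm run times are available in a dictionary. The function name is hypothetical, and the timings are made-up values normalized so the fastest is 1.0, chosen only to mirror the shape of the T85L4 (P=8) column.

```python
def relative_overhead(times):
    """Return (optimal_algorithm, {alg: (t - t_min) / t_min}) for a dict of timings."""
    if not times:
        raise ValueError("need at least one timing")
    # The optimal algorithm is the one with the smallest measured time.
    best = min(times, key=times.get)
    t_min = times[best]
    # Relative slowdown of every algorithm with respect to the fastest one.
    return best, {alg: (t - t_min) / t_min for alg, t in times.items()}


# Hypothetical normalized timings: swtrans fastest, alltoallv 16.6% slower.
best, overhead = relative_overhead({"swtrans": 1.0, "alltoallv": 1.166})
print(best, round(overhead["alltoallv"], 3))
```

Because the metric is normalized by the minimum, it is independent of absolute run time, which is what allows columns with different problem sizes and processor counts to sit side by side.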

Transpose LT (2) (nx)
Algorithm Comparison
  problem               T42L1      T21L2      T42L2      T85L2      T85L1      T85L4
  processors            P=32       P=16       P=8        P=32       P=16       P=8
  optimal algorithm     logtrans   swtrans    swtrans    swtrans    srtrans    swtrans
  (generic-min)/min     0.148      0.029      0.004      0.027      0.012      0.006

Transpose LT (2) (combined)
Communication Library Comparisons
  problem               T42L1      T21L2      T42L2      T85L2      T85L1      T85L4
  processors            P=32       P=16       P=8        P=32       P=16       P=8
  optimal library       alltoallv  alltoallv  nx         alltoallv  alltoallv  mpi
  (alltoallv-min)/min   0.000      0.000      0.032      0.000      0.000      0.166
  (mpi-min)/min         0.000      0.000      0.003      0.000      0.000      0.000
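The combined comparison first takes the best algorithm within each option and then measures each option's best time against the overall minimum, so a row reflects the best performance achievable with that option. The sketch below assumes, as the combined table appears to, that the alltoallv transpose (presumably built on MPI_Alltoallv) is listed as its own entry next to the NX and MPI point-to-point implementations. The function name and the normalized timings are hypothetical, loosely shaped like the T42L2 (P=8) column.

```python
def library_comparison(timings):
    """timings maps each option (library or collective) to {algorithm: time}.

    Returns the option whose best algorithm is fastest, plus each option's
    (best_time - overall_min) / overall_min.
    """
    # Best (minimum) time achievable within each option.
    best_per_option = {opt: min(algs.values()) for opt, algs in timings.items()}
    # The optimal option is the one with the smallest best time.
    optimal = min(best_per_option, key=best_per_option.get)
    t_min = best_per_option[optimal]
    return optimal, {opt: (t - t_min) / t_min for opt, t in best_per_option.items()}


# Hypothetical normalized timings (not measured data).
optimal, overhead = library_comparison({
    "nx": {"swtrans": 1.000},
    "mpi": {"swtrans": 1.003},
    "alltoallv": {"alltoallv": 1.032},
})
print(optimal, {k: round(v, 3) for k, v in overhead.items()})
```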

DISCUSSION

The Paragon partitions used (1x8, 1x16, and 1x32) match the processor subsets on which the parallel algorithms would run in a two-dimensional data decomposition.
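This rationale can be illustrated with a small sketch. Assume, purely for illustration, that P processes are numbered row-major on a logical R x C grid and that the transpose is carried out independently within each grid row. Every row then forms a subset of C processes, which is the pattern a 1xC partition isolates. The function name and grid shape are hypothetical, and PSTSWM's actual process mapping may differ.

```python
def row_subsets(nrows, ncols):
    """Group ranks of an nrows x ncols processor grid (row-major numbering)
    into the per-row subsets that a row-wise transpose would involve."""
    return [list(range(r * ncols, (r + 1) * ncols)) for r in range(nrows)]


# Hypothetical 4 x 8 logical grid (32 processes): each grid row holds 8 ranks,
# so a 1x8 partition reproduces the communication pattern of one row subset.
groups = row_subsets(4, 8)
print(len(groups), [len(g) for g in groups])
```

In an MPI code the same grouping is typically formed with MPI_Comm_split, using the grid row index (rank // ncols under this numbering) as the color.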

Patrick H. Worley / (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:30:25 EDT.