PSTSWM Origin2000 Algorithm Comparison

Performance Studies using PSTSWM

SGI/Cray Research Origin2000 Algorithm Comparison

(transpose LT experiment I-C)

Date/Person: April 22, 1999 / P. Worley
Platform: SGI Origin2000 at Los Alamos National Laboratory (n12):
   128 250-MHz MIPS R10000 processors
Environment: IRIX 6.5
   mpt_1.3.0.0
   MIPSpro Compilers: Version 7.2.1
   setenv MPI_DSM_MUSTRUN 1
Code Version: 6.4.3
Compilation Options: f77 -64 -i4 -O3
Math Library: SCSL
Communication Library: MPI
   SHMEM
   (assuming distributed FFT)
Number of Timesteps: 12
Results:

Transpose LT (1) (mpi) / Column Major Processor Ordering
Algorithm Comparison
                        T21L16    T21L16    T42L16    T42L32    T42L16    T85L32
                        P=2x32    P=4x16    P=8x8     P=2x32    P=4x16    P=8x8
  optimal algorithm     logtrans  logtrans  swtrans   swtrans   srtrans   swtrans
  (alltoallv-min)/min   0.098     0.132     0.011     1.019     0.257     0.049
  (generic-min)/min     0.168     0.164     0.016     0.006     0.021     0.011
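
The ratio rows can be read as a relative excess over the fastest time: 0 means the variant matches the best algorithm, and 1.019 means it took roughly twice as long. The sketch below illustrates that arithmetic only. It assumes "min" is the fastest time among the tuned algorithm variants, which this page does not state explicitly. The function names and the timing values are hypothetical and are not measurements from this study.

```python
def optimal_algorithm(times):
    """Return the name of the variant with the smallest time."""
    return min(times, key=times.get)


def relative_excess(t, t_min):
    """Return (t - t_min) / t_min, the form used in the ratio rows."""
    return (t - t_min) / t_min


# Hypothetical timings in seconds (illustrative only, NOT data from this page).
tuned = {"logtrans": 2.0, "swtrans": 2.5, "srtrans": 2.2}
alltoallv_time = 3.0  # hypothetical time for the alltoallv variant

best_name = optimal_algorithm(tuned)
t_min = tuned[best_name]

print(best_name)
print(relative_excess(alltoallv_time, t_min))
```

With these made-up numbers the fastest tuned variant is logtrans, and the alltoallv entry would be (3.0 - 2.0) / 2.0 = 0.5, that is, 50% slower than the best.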

Transpose LT (1) (shmem) / Column Major Processor Ordering
Algorithm Comparison
                        T21L16    T21L16    T42L16    T42L32    T42L16    T85L32
                        P=2x32    P=4x16    P=8x8     P=2x32    P=4x16    P=8x8
  optimal algorithm     swtrans   swtrans   srtrans   srtrans   srtrans   srtrans
  (generic-min)/min     0.045     0.075     0.046     0.044     0.033     0.019

Transpose LT (1) (combined) / Column Major Processor Ordering
Communication Library Comparisons
                        T21L16    T21L16    T42L16    T42L32    T42L16    T85L32
                        P=2x32    P=4x16    P=8x8     P=2x32    P=4x16    P=8x8
  optimal library       shmem     shmem     shmem     shmem     shmem     shmem
  (alltoallv-min)/min   0.201     0.281     0.103     1.188     0.298     0.097
  (mpi-min)/min         0.094     0.131     0.091     0.084     0.033     0.045


Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 10:12:36 EDT.