| Date/Person: | December 11, 1999 / P. Worley |
| Platform: | IBM SP3 at Oak Ridge National Laboratory (bobcat.ccs.anl.gov): 62 4-way Winterhawk II SMP nodes (375 MHz POWER3 with 8 MB L2 cache) |
| Environment: | AIX 4.3.3;   PSSP 3.1.1 |
| Code Version: | 6.6.4 |
| Make Options: | MACH=sp COMM=mpi PRECISION=8 PERF=n WORKSPACE=20000000 |
| Compilation Options: | mpxlf_r -O3 -qarch=auto -qtune=auto -qcache=auto |
|     or | mpxlf_r -O3 -qarch=auto -qtune=auto -qcache=auto -qstrict |
|     or | mpxlf_r -O3 -qarch=auto -qtune=auto -qcache=auto -qsmp=noauto -qnosave |
|     or | mpxlf_r -O3 -qarch=auto -qtune=auto -qcache=auto -qsmp=noauto -qnosave -qstrict |
|     or | mpxlf -O3 -qarch=auto -qtune=auto -qcache=auto -qhot |
|     or | mpxlf -O4 -qarch=auto -qtune=auto -qcache=auto |
| Link Options: | -bmaxdata:0x70000000 |
|     or | -bmaxdata:0x70000000 -qsmp=noauto |
| Number of steps: | T42: 241 or 481 |
| | T85: 49 or 97 |
| | T170: 49 or 97 |
| Notes: | Using PSTSWM Fortran routines for Fourier transforms and BLAS |
More Aggressive Optimization

MEASURED TIME PER TIMESTEP (SEC): with -qhot, without -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T42 | 0.011 | 0.022 | 0.034 | 0.257 |
| T85 | 0.058 | 0.129 | 0.208 | 1.555 |
| T170 | 0.393 | 0.884 | 1.421 |   |

MEASURED TIME PER TIMESTEP (SEC): with -O4, without -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T42 | 0.010 | 0.021 | 0.032 | 0.242 |
| T85 | 0.057 | 0.126 | 0.204 | 1.514 |
| T170 | 0.379 | 0.854 | 1.368 |   |
MEASURED MFLOP/SEC RATES: with -qhot, without -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T42 | 386.9 | 372.4 | 364.7 | 257.5 |
| T85 | 416.2 | 375.8 | 349.2 | 249.4 |
| T170 | 388.9 | 346.0 | 323.1 |   |

MEASURED MFLOP/SEC RATES: with -O4, without -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T42 | 421.6 | 401.3 | 392.9 | 273.4 |
| T85 | 426.6 | 383.5 | 356.2 | 256.1 |
| T170 | 403.9 | 358.3 | 335.5 |   |
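
The two sets of time-per-timestep figures above can be compared directly. The following Python sketch is not part of the original benchmark; it copies the tabulated times and computes how much less time the -O4 build takes per timestep than the -qhot build. The names `TIME_QHOT`, `TIME_O4`, `percent_faster`, and `compare` are hypothetical, chosen for illustration. The times are reported to only three decimal places, so the percentages for the smallest cases (for example T42 L1) are dominated by rounding and should be read as rough indications.

```python
# Time per timestep (seconds), copied from the tables above.
# Columns are L1, L2, L3, L16; None marks the T170 L16 case, which has no entry.
LEVELS = ["L1", "L2", "L3", "L16"]

TIME_QHOT = {  # with -qhot, without -qstrict
    "T42": [0.011, 0.022, 0.034, 0.257],
    "T85": [0.058, 0.129, 0.208, 1.555],
    "T170": [0.393, 0.884, 1.421, None],
}

TIME_O4 = {  # with -O4, without -qstrict
    "T42": [0.010, 0.021, 0.032, 0.242],
    "T85": [0.057, 0.126, 0.204, 1.514],
    "T170": [0.379, 0.854, 1.368, None],
}


def percent_faster(t_base, t_new):
    """Percent reduction in time of t_new relative to t_base."""
    return 100.0 * (t_base - t_new) / t_base


def compare(base, new):
    """Return {(problem, level): percent reduction}, skipping missing entries."""
    result = {}
    for problem, base_times in base.items():
        for level, t_base, t_new in zip(LEVELS, base_times, new[problem]):
            if t_base is None or t_new is None:
                continue
            result[(problem, level)] = percent_faster(t_base, t_new)
    return result


if __name__ == "__main__":
    for (problem, level), pct in compare(TIME_QHOT, TIME_O4).items():
        print(f"{problem} {level}: -O4 takes {pct:.1f}% less time than -qhot")
```

The same `compare` helper could be applied to the MFlop/sec tables with the arguments swapped, since a higher rate there corresponds to a lower time here.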