| Date/Person: | December 11, 1999 / P. Worley |
| Platform: | IBM SP3 at Oak Ridge National Laboratory (bobcat.ccs.ornl.gov): 62 4-way Winterhawk II SMP nodes (375 MHz POWER3 with 8 MB L2 cache) |
| Environment: | AIX 4.3.3;   PSSP 3.1.1 |
| Code Version: | 6.6.4 |
| Make Options: | MACH=sp COMM=mpi PRECISION=8 PERF=n WORKSPACE=20000000 MATH=essl |
| Compilation Options: | mpxlf_r -O3 -qarch=auto -qtune=auto -qcache=auto |
|     or | mpxlf_r -O3 -qarch=auto -qtune=auto -qcache=auto -qstrict |
|     or | mpxlf_r -O3 -qarch=auto -qtune=auto -qcache=auto -qsmp=noauto -qnosave |
|     or | mpxlf_r -O3 -qarch=auto -qtune=auto -qcache=auto -qsmp=noauto -qnosave -qstrict |
|     or | mpxlf -O3 -qarch=auto -qtune=auto -qcache=auto -qhot |
|     or | mpxlf -O4 -qarch=auto -qtune=auto -qcache=auto |
| Link Options: | -bmaxdata:0x70000000 |
|     or | -bmaxdata:0x70000000 -qsmp=noauto |
| Number of steps: | T42: 241 or 481 |
| | T85: 49 or 97 |
| | T170: 49 or 97 |
| Notes: | using ESSL library routines for Fourier transforms and BLAS (-lessl or -lessl_r) |
More Aggressive Optimization

MEASURED TIME PER TIMESTEP (SEC)

with -qhot, without -qstrict

| Problem | L1 | L2 | L3 | L16 |
| T42 | 0.008 | 0.017 | 0.026 | 0.198 |
| T85 | 0.048 | 0.107 | 0.173 | 1.263 |
| T170 | 0.342 | 0.755 | 1.215 |   |

with -O4, without -qstrict

| Problem | L1 | L2 | L3 | L16 |
| T42 | 0.008 | 0.016 | 0.026 | 0.198 |
| T85 | 0.047 | 0.106 | 0.171 | 1.257 |
| T170 | 0.339 | 0.757 | 1.211 |   |

MEASURED MFLOP/SEC RATES

with -qhot, without -qstrict

| Problem | L1 | L2 | L3 | L16 |
| T42 | 530.7 | 495.6 | 478.0 | 333.2 |
| T85 | 509.9 | 454.4 | 420.1 | 306.9 |
| T170 | 446.9 | 405.3 | 377.8 |   |

with -O4, without -qstrict

| Problem | L1 | L2 | L3 | L16 |
| T42 | 532.6 | 502.9 | 481.3 | 334.1 |
| T85 | 516.4 | 458.8 | 425.3 | 308.5 |
| T170 | 451.6 | 404.4 | 379.2 |   |
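
The two sets of MFLOP/sec rates above can be compared cell by cell. The following is a minimal Python sketch, not part of the original benchmark, that takes the -qhot rates as the baseline and reports the percentage change under -O4. The names `QHOT`, `O4`, `percent_change`, and `compare` are hypothetical and introduced only for illustration; the numbers are copied from the tables, and the unmeasured T170/L16 case is represented as `None`.

```python
# MFLOP/sec rates copied from the tables above.
# None marks the T170/L16 case, for which no measurement is reported.
LEVELS = ("L1", "L2", "L3", "L16")

QHOT = {  # with -qhot, without -qstrict
    "T42": (530.7, 495.6, 478.0, 333.2),
    "T85": (509.9, 454.4, 420.1, 306.9),
    "T170": (446.9, 405.3, 377.8, None),
}
O4 = {  # with -O4, without -qstrict
    "T42": (532.6, 502.9, 481.3, 334.1),
    "T85": (516.4, 458.8, 425.3, 308.5),
    "T170": (451.6, 404.4, 379.2, None),
}


def percent_change(base, new):
    """Relative change of `new` over `base` in percent, or None if either is missing."""
    if base is None or new is None:
        return None
    return 100.0 * (new - base) / base


def compare(base_table, new_table):
    """Return {problem: {level: percent change}} for every cell of the tables."""
    result = {}
    for problem, base_row in base_table.items():
        new_row = new_table[problem]
        result[problem] = {
            level: percent_change(b, n)
            for level, b, n in zip(LEVELS, base_row, new_row)
        }
    return result


def format_cell(pct):
    """Render one percentage, using 'n/a' for unmeasured cases."""
    return "n/a" if pct is None else "{:+.2f}%".format(pct)


if __name__ == "__main__":
    for problem, row in compare(QHOT, O4).items():
        cells = ", ".join(
            "{}: {}".format(level, format_cell(pct)) for level, pct in row.items()
        )
        print("{}: {}".format(problem, cells))
```

On these figures the two option sets differ by less than 1.5% in every measured case, and T170/L2 is the only case where the -O4 rate is lower than the -qhot rate.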