| Date/Person: | January 10, 2002 / P. Worley |
|---|---|
| Platform: | IBM p690 system at Oak Ridge National Laboratory (cheetah.ccs.ornl.gov): 1 32-way p690 SMP node (1.3 GHz POWER4, 4 8-way Multichip Modules) |
| Environment: | AIX 5.1 |
| Code Version: | 6.8.2 |
| Make Options: | MACH=sp COMM=serial PRECISION=8 PERF=n WORKSPACE=22000000 MATH=essl |
| Compilation Options: | mpxlf -O3 -qarch=auto -qtune=auto -qcache=auto |
| or | mpxlf -O3 -qarch=auto -qtune=auto -qcache=auto -qstrict |
| or | mpxlf_r -O3 -qarch=auto -qtune=auto -qcache=auto -qsmp=noauto -qnosave |
| or | mpxlf_r -O3 -qarch=auto -qtune=auto -qcache=auto -qsmp=noauto -qnosave -qstrict |
| or | mpxlf -O3 -qarch=auto -qtune=auto -qcache=auto -qhot |
| or | mpxlf -O4 -qarch=auto -qtune=auto -qcache=auto |
| Link Options: | -bmaxdata:0x70000000 |
| or | -bmaxdata:0x70000000 -qsmp=noauto |
| Number of steps: | T5, T10, T21, T42: 241 or 481 |
| | T85: 49 or 97 |
| | T170: 49 or 97 |
| Notes: | Uses ESSL library routines for Fourier transforms and BLAS. |
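The six compiler lines and two link lines above correspond to the six result variants reported below. As a sketch, the pairing can be written down as data. Two things here are assumptions and not stated in the table: that the `-qsmp=noauto` link option goes with the `mpxlf_r` builds, and the source file name `main.f`, which is hypothetical.

```python
# Sketch: build variants from the table above, keyed by (result section, sub-table).
# ASSUMPTION: the "-qsmp=noauto" link option is paired with the mpxlf_r builds;
# the original table only lists it as an alternative link line.
# The source file name "main.f" is hypothetical.
COMMON = ["-qarch=auto", "-qtune=auto", "-qcache=auto"]
BASE_LINK = ["-bmaxdata:0x70000000"]

VARIANTS = {
    ("Not Reentrant", "with -qstrict"):
        ("mpxlf", ["-O3", *COMMON, "-qstrict"], BASE_LINK),
    ("Not Reentrant", "without -qstrict"):
        ("mpxlf", ["-O3", *COMMON], BASE_LINK),
    ("Threadsafe and OpenMP Ready", "with -qstrict"):
        ("mpxlf_r", ["-O3", *COMMON, "-qsmp=noauto", "-qnosave", "-qstrict"],
         BASE_LINK + ["-qsmp=noauto"]),
    ("Threadsafe and OpenMP Ready", "without -qstrict"):
        ("mpxlf_r", ["-O3", *COMMON, "-qsmp=noauto", "-qnosave"],
         BASE_LINK + ["-qsmp=noauto"]),
    ("More Aggressive Optimization", "with -qhot"):
        ("mpxlf", ["-O3", *COMMON, "-qhot"], BASE_LINK),
    ("More Aggressive Optimization", "with -O4"):
        ("mpxlf", ["-O4", *COMMON], BASE_LINK),
}


def compile_line(section, variant, source="main.f"):
    """Compile-only command for one variant, using the flag order from the table."""
    compiler, flags, _link = VARIANTS[(section, variant)]
    return " ".join([compiler, *flags, "-c", source])


def link_flags(section, variant):
    """Link-time flags for one variant (see the ASSUMPTION above)."""
    return " ".join(VARIANTS[(section, variant)][2])


for key in VARIANTS:
    print(compile_line(*key), "|", link_flags(*key))
```

Keeping the flags as data makes it explicit that the variants differ only in the compiler driver (`mpxlf` vs `mpxlf_r`), the optimization level, and a handful of switches.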
Not Reentrant

MEASURED TIME PER TIMESTEP (SEC), with -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T5 | 0.00004 | 0.00007 | 0.00010 | 0.00048 |
| T10 | 0.00014 | 0.00027 | 0.00040 | 0.00233 |
| T21 | 0.00070 | 0.00141 | 0.00216 | 0.01554 |
| T42 | 0.00423 | 0.00906 | 0.01414 | 0.08887 |
| T85 | 0.02906 | 0.05995 | 0.08980 | 0.55784 |
| T170 | 0.19702 | 0.41469 | 0.66038 | |

MEASURED TIME PER TIMESTEP (SEC), without -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T5 | 0.00003 | 0.00006 | 0.00009 | 0.00041 |
| T10 | 0.00012 | 0.00022 | 0.00032 | 0.00191 |
| T21 | 0.00051 | 0.00105 | 0.00160 | 0.01273 |
| T42 | 0.00293 | 0.00658 | 0.01035 | 0.07010 |
| T85 | 0.01902 | 0.04013 | 0.06054 | 0.45013 |
| T170 | 0.12467 | 0.30319 | 0.49961 | |
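One way to read the two timing tables together is as a per-problem speedup, (time with -qstrict) / (time without -qstrict). A minimal sketch for the L1 column follows, with values copied from the tables. Note that the T5 through T21 times carry only one or two significant digits, so their ratios are coarse.

```python
# Time per timestep (sec) for the Not Reentrant build, L1 column,
# copied from the two tables above.
WITH_STRICT = {"T5": 0.00004, "T10": 0.00014, "T21": 0.00070,
               "T42": 0.00423, "T85": 0.02906, "T170": 0.19702}
WITHOUT_STRICT = {"T5": 0.00003, "T10": 0.00012, "T21": 0.00051,
                  "T42": 0.00293, "T85": 0.01902, "T170": 0.12467}


def speedup(slow, fast):
    """Ratio slow/fast per problem; a value > 1 means the second build is faster."""
    return {problem: slow[problem] / fast[problem] for problem in slow}


ratios = speedup(WITH_STRICT, WITHOUT_STRICT)
for problem, ratio in ratios.items():
    print(f"{problem:>5}: {ratio:.2f}x")
```

For the larger problems the ratio grows with resolution, from roughly 1.44 at T42 to roughly 1.58 at T170.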
MEASURED MFLOP/SEC RATES, with -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T5 | 774.9 | 897.4 | 942.8 | 1043.4 |
| T10 | 1043.7 | 1081.8 | 1102.3 | 1018.9 |
| T21 | 1092.5 | 1081.9 | 1062.4 | 787.6 |
| T42 | 954.8 | 891.7 | 857.6 | 727.4 |
| T85 | 818.8 | 793.8 | 794.8 | 682.4 |
| T170 | 763.9 | 725.9 | 683.7 | |

MEASURED MFLOP/SEC RATES, without -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T5 | 901.3 | 1032.2 | 1101.4 | 1233.5 |
| T10 | 1281.8 | 1329.6 | 1369.2 | 1242.6 |
| T21 | 1493.6 | 1453.2 | 1436.5 | 961.7 |
| T42 | 1380.7 | 1228.7 | 1171.7 | 922.3 |
| T85 | 1250.7 | 1185.7 | 1179.0 | 845.7 |
| T170 | 1207.3 | 992.8 | 903.8 | |
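Rate and time are linked by simple arithmetic: MFLOP/s × seconds per step = MFLOP per step. If the rates were derived from a fixed operation count per problem (an assumption; the page does not say how the flops were counted), that product should come out the same for the -qstrict and non-qstrict builds. A sketch of the check, using L1 values copied from the Not Reentrant tables (T5 through T21 are omitted because their times are too coarsely rounded):

```python
# (time per timestep in sec, MFLOP/s), Not Reentrant build, L1 column,
# copied from the tables above.
WITH_STRICT = {"T42": (0.00423, 954.8), "T85": (0.02906, 818.8),
               "T170": (0.19702, 763.9)}
WITHOUT_STRICT = {"T42": (0.00293, 1380.7), "T85": (0.01902, 1250.7),
                  "T170": (0.12467, 1207.3)}


def mflop_per_step(entry):
    """MFLOP/s multiplied by seconds per step gives MFLOP per step."""
    seconds, rate = entry
    return seconds * rate


def agreement(a, b):
    """Relative gap between the implied work per step of two builds."""
    gaps = {}
    for problem in a:
        x, y = mflop_per_step(a[problem]), mflop_per_step(b[problem])
        gaps[problem] = abs(x - y) / y
        print(f"{problem:>5}: {x:8.2f} vs {y:8.2f} MFLOP per step ({gaps[problem]:.2%} apart)")
    return gaps


gaps = agreement(WITH_STRICT, WITHOUT_STRICT)
```

The implied work agrees to within a fraction of a percent, which is consistent with both builds being credited with the same operation count.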
Threadsafe and OpenMP Ready

MEASURED TIME PER TIMESTEP (SEC), with -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T5 | 0.00008 | 0.00011 | 0.00014 | 0.00051 |
| T10 | 0.00018 | 0.00031 | 0.00044 | 0.00239 |
| T21 | 0.00073 | 0.00145 | 0.00218 | 0.01550 |
| T42 | 0.00427 | 0.00916 | 0.01417 | 0.08865 |
| T85 | 0.02858 | 0.05920 | 0.08842 | 0.55001 |
| T170 | 0.19430 | 0.41521 | 0.65016 | |

MEASURED TIME PER TIMESTEP (SEC), without -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T5 | 0.00007 | 0.00010 | 0.00012 | 0.00044 |
| T10 | 0.00015 | 0.00026 | 0.00036 | 0.00199 |
| T21 | 0.00055 | 0.00109 | 0.00164 | 0.01284 |
| T42 | 0.00302 | 0.00664 | 0.01043 | 0.06937 |
| T85 | 0.01902 | 0.03999 | 0.06055 | 0.43879 |
| T170 | 0.12304 | 0.29344 | 0.48992 | |
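The cost of the threadsafe build can be estimated by comparing its timings with the Not Reentrant timings for the same compiler options. The sketch below does this for the L1 column with -qstrict, using values copied from both sets of tables. A positive value means the threadsafe build is slower. One plausible reading of the pattern is a roughly fixed per-step cost that matters only for small problems, but the data alone do not establish the cause.

```python
# L1 time per timestep (sec), both builds compiled with -qstrict,
# copied from the Not Reentrant and Threadsafe tables.
NOT_REENTRANT = {"T5": 0.00004, "T10": 0.00014, "T21": 0.00070,
                 "T42": 0.00423, "T85": 0.02906, "T170": 0.19702}
THREADSAFE = {"T5": 0.00008, "T10": 0.00018, "T21": 0.00073,
              "T42": 0.00427, "T85": 0.02858, "T170": 0.19430}


def relative_change(base, other):
    """(other - base) / base per problem; positive means `other` is slower."""
    return {problem: other[problem] / base[problem] - 1.0 for problem in base}


change = relative_change(NOT_REENTRANT, THREADSAFE)
for problem, delta in change.items():
    print(f"{problem:>5}: {delta:+.1%}")
```

The threadsafe build roughly doubles the T5 step time but is within about 2% (and slightly faster) from T42 upward.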
MEASURED MFLOP/SEC RATES, with -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T5 | 408.3 | 584.4 | 692.6 | 977.0 |
| T10 | 836.3 | 958.9 | 1016.5 | 994.4 |
| T21 | 1044.3 | 1052.1 | 1051.0 | 789.5 |
| T42 | 945.9 | 882.5 | 855.7 | 729.3 |
| T85 | 832.4 | 803.8 | 807.2 | 692.1 |
| T170 | 774.6 | 725.0 | 694.5 | |

MEASURED MFLOP/SEC RATES, without -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T5 | 436.2 | 640.6 | 771.4 | 1135.9 |
| T10 | 983.5 | 1156.8 | 1237.7 | 1194.6 |
| T21 | 1395.2 | 1406.5 | 1401.0 | 953.0 |
| T42 | 1340.1 | 1216.3 | 1161.9 | 931.9 |
| T85 | 1251.2 | 1190.0 | 1178.8 | 867.6 |
| T170 | 1223.2 | 1025.8 | 921.6 | |
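To see where each problem runs fastest, it helps to scan a rate table for its maximum, both overall and per row. The following sketch does this for the threadsafe build without -qstrict. Values are copied from the table, and `None` stands in for the missing T170/L16 entry.

```python
LEVELS = ["L1", "L2", "L3", "L16"]

# Threadsafe build, without -qstrict, MFLOP/s; None marks the missing T170/L16 entry.
RATES = {
    "T5": [436.2, 640.6, 771.4, 1135.9],
    "T10": [983.5, 1156.8, 1237.7, 1194.6],
    "T21": [1395.2, 1406.5, 1401.0, 953.0],
    "T42": [1340.1, 1216.3, 1161.9, 931.9],
    "T85": [1251.2, 1190.0, 1178.8, 867.6],
    "T170": [1223.2, 1025.8, 921.6, None],
}


def cells(table):
    """Yield (rate, problem, level) for every measured cell."""
    for problem, row in table.items():
        for level, rate in zip(LEVELS, row):
            if rate is not None:
                yield rate, problem, level


def peak(table):
    """Highest rate in the whole table, as (rate, problem, level)."""
    return max(cells(table))


def best_level(table):
    """Column with the highest rate in each row."""
    best = {}
    for rate, problem, level in cells(table):
        if problem not in best or rate > best[problem][0]:
            best[problem] = (rate, level)
    return {problem: level for problem, (_rate, level) in best.items()}


print(peak(RATES))
print(best_level(RATES))
```

In this table the fastest column moves from L16 at T5 to L1 for T42 and above.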
More Aggressive Optimization

MEASURED TIME PER TIMESTEP (SEC), with -qhot, without -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T5 | 0.00004 | 0.00006 | 0.00009 | 0.00043 |
| T10 | 0.00011 | 0.00023 | 0.00034 | 0.00212 |
| T21 | 0.00051 | 0.00111 | 0.00167 | 0.01364 |
| T42 | 0.00295 | 0.00706 | 0.01082 | 0.07368 |
| T85 | 0.01881 | 0.04152 | 0.06132 | 0.47087 |
| T170 | 0.12235 | 0.30675 | 0.50424 | |

MEASURED TIME PER TIMESTEP (SEC), with -O4, without -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T5 | 0.00004 | 0.00006 | 0.00009 | 0.00042 |
| T10 | 0.00011 | 0.00023 | 0.00034 | 0.00212 |
| T21 | 0.00051 | 0.00111 | 0.00165 | 0.01362 |
| T42 | 0.00296 | 0.00701 | 0.01082 | 0.07359 |
| T85 | 0.01883 | 0.04168 | 0.06054 | 0.46985 |
| T170 | 0.12462 | 0.31054 | 0.50793 | |
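The -qhot and -O4 timings are close enough that a cell-by-cell relative difference is the clearest comparison. This sketch covers T42, T85, and T170, where the times have enough significant digits to be meaningful. Values are copied from the two tables, with `None` for the missing L16 entries. A positive value means -O4 is slower.

```python
LEVELS = ["L1", "L2", "L3", "L16"]

# Time per timestep (sec), copied from the -qhot and -O4 tables above.
QHOT = {"T42": [0.00295, 0.00706, 0.01082, 0.07368],
        "T85": [0.01881, 0.04152, 0.06132, 0.47087],
        "T170": [0.12235, 0.30675, 0.50424, None]}
O4 = {"T42": [0.00296, 0.00701, 0.01082, 0.07359],
      "T85": [0.01883, 0.04168, 0.06054, 0.46985],
      "T170": [0.12462, 0.31054, 0.50793, None]}


def relative_differences(a, b):
    """(b - a) / a for every (problem, level) cell measured in both tables."""
    out = {}
    for problem, row in a.items():
        for level, x, y in zip(LEVELS, row, b[problem]):
            if x is not None and y is not None:
                out[(problem, level)] = (y - x) / x
    return out


diffs = relative_differences(QHOT, O4)
worst = max(diffs, key=lambda cell: abs(diffs[cell]))
print(f"largest gap: {worst} at {diffs[worst]:+.2%}")
```

Every gap is under 2%, with the largest at T170/L1.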
MEASURED MFLOP/SEC RATES, with -qhot, without -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T5 | 839.3 | 966.9 | 1063.7 | 1178.7 |
| T10 | 1292.8 | 1277.9 | 1324.7 | 1116.8 |
| T21 | 1511.3 | 1372.9 | 1373.5 | 897.3 |
| T42 | 1370.8 | 1144.7 | 1120.5 | 877.5 |
| T85 | 1264.7 | 1146.1 | 1164.0 | 808.5 |
| T170 | 1230.1 | 981.3 | 895.5 | |

MEASURED MFLOP/SEC RATES, with -O4, without -qstrict

| Problem | L1 | L2 | L3 | L16 |
|---|---|---|---|---|
| T5 | 872.1 | 973.7 | 1060.3 | 1185.7 |
| T10 | 1310.4 | 1278.8 | 1319.1 | 1116.4 |
| T21 | 1502.3 | 1379.0 | 1386.7 | 898.4 |
| T42 | 1365.9 | 1152.3 | 1120.0 | 878.5 |
| T85 | 1263.6 | 1141.6 | 1179.1 | 810.2 |
| T170 | 1207.8 | 969.3 | 889.0 | |
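To put the rates in context, they can be expressed as a fraction of the processor's theoretical peak. This sketch assumes single-processor runs (suggested by COMM=serial in the make options) and a POWER4 peak of 4 flops per cycle: two floating-point units, each completing one fused multiply-add per cycle. At 1.3 GHz, that gives 5200 MFLOP/s. The T21/L1 rates used below are the highest in the -qhot and -O4 tables.

```python
CLOCK_GHZ = 1.3
# ASSUMPTION: two FPUs per POWER4 core, each retiring one fused multiply-add
# (2 flops) per cycle, for 4 flops per cycle in total.
FLOPS_PER_CYCLE = 4


def peak_mflops(clock_ghz=CLOCK_GHZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical single-processor peak in MFLOP/s (GHz -> MHz, times flops/cycle)."""
    return clock_ghz * 1e3 * flops_per_cycle


def fraction_of_peak(measured_mflops):
    """Measured rate as a fraction of the theoretical peak."""
    return measured_mflops / peak_mflops()


# T21/L1 rates copied from the -qhot and -O4 tables above.
for label, rate in [("-qhot", 1511.3), ("-O4", 1502.3)]:
    print(f"{label}: {rate} MFLOP/s is {fraction_of_peak(rate):.1%} of peak")
```

Under these assumptions, the best measured rates reach a little under 30% of peak.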