| Date/Person: | January 10, 2002 / P. Worley |
| Platform: | IBM p690 system at Oak Ridge National Laboratory (cheetah.ccs.ornl.gov): one 32-way p690 SMP node (1.3 GHz POWER4, four 8-way Multichip Modules) |
| Environment: | AIX 5.1 |
| Code Version: | 6.8.2 |
| Make Options: | MACH=sp COMM=serial PRECISION=8 PERF=n WORKSPACE=22000000 MATH=essl |
| Compilation Options: | mpxlf -O3 -qarch=auto -qtune=auto -qcache=auto |
|     or | mpxlf -O3 -qarch=auto -qtune=auto -qcache=auto -qstrict |
|     or | mpxlf_r -O3 -qarch=auto -qtune=auto -qcache=auto -qsmp=noauto -qnosave |
|     or | mpxlf_r -O3 -qarch=auto -qtune=auto -qcache=auto -qsmp=noauto -qnosave -qstrict |
|     or | mpxlf -O3 -qarch=auto -qtune=auto -qcache=auto -qhot |
|     or | mpxlf -O4 -qarch=auto -qtune=auto -qcache=auto |
| Link Options: | -bmaxdata:0x70000000 |
|     or | -bmaxdata:0x70000000 -qsmp=noauto |
| Number of steps: | T5, T10, T21, T42: 241 or 481 |
| | T85: 49 or 97 |
| | T170: 49 or 97 |
| Notes: | Using Fortran routines for Fourier transforms and BLAS |
Not Reentrant

MEASURED TIME PER TIMESTEP (SEC), with -qstrict
| Problem | L1 | L2 | L3 | L16 |
| T5 | 0.00006 | 0.00010 | 0.00015 | 0.00074 |
| T10 | 0.00020 | 0.00039 | 0.00057 | 0.00317 |
| T21 | 0.00095 | 0.00192 | 0.00290 | 0.01980 |
| T42 | 0.00510 | 0.01086 | 0.01743 | 0.11252 |
| T85 | 0.03496 | 0.07352 | 0.11075 | 0.66863 |
| T170 | 0.22273 | 0.47209 | 0.73521 | |

MEASURED TIME PER TIMESTEP (SEC), without -qstrict
| Problem | L1 | L2 | L3 | L16 |
| T5 | 0.00005 | 0.00009 | 0.00013 | 0.00063 |
| T10 | 0.00017 | 0.00033 | 0.00048 | 0.00277 |
| T21 | 0.00073 | 0.00148 | 0.00226 | 0.01737 |
| T42 | 0.00377 | 0.00833 | 0.01365 | 0.09412 |
| T85 | 0.02481 | 0.05388 | 0.08142 | 0.56218 |
| T170 | 0.14875 | 0.34582 | 0.57759 | |
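As an illustration of how to read the two timing tables, the following minimal sketch computes the speedup obtained by dropping -qstrict for the L1 column of the non-reentrant build. The dictionary and function names are hypothetical; the numbers are copied from the tables. Note that the T5 and T10 times have only one or two significant digits, so their ratios are coarse.

```python
# Time per timestep (sec), "Not Reentrant" build, L1 column,
# copied from the with/without -qstrict tables.
with_qstrict = {
    "T5": 0.00006, "T10": 0.00020, "T21": 0.00095,
    "T42": 0.00510, "T85": 0.03496, "T170": 0.22273,
}
without_qstrict = {
    "T5": 0.00005, "T10": 0.00017, "T21": 0.00073,
    "T42": 0.00377, "T85": 0.02481, "T170": 0.14875,
}


def speedup(slow, fast):
    """Return the ratio slow/fast for every problem size in `slow`."""
    return {problem: slow[problem] / fast[problem] for problem in slow}


qstrict_speedup = speedup(with_qstrict, without_qstrict)
for problem, ratio in qstrict_speedup.items():
    print(f"{problem:>5}: {ratio:.2f}x faster without -qstrict")
```

For this column the ratio grows with problem size, reaching roughly 1.5 at T170.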
MEASURED MFLOP/SEC RATES, with -qstrict
| Problem | L1 | L2 | L3 | L16 |
| T5 | 568.5 | 620.4 | 639.1 | 675.9 |
| T10 | 751.0 | 764.5 | 774.3 | 747.8 |
| T21 | 805.3 | 795.4 | 792.2 | 618.2 |
| T42 | 792.1 | 744.3 | 695.5 | 574.6 |
| T85 | 680.5 | 647.3 | 644.5 | 569.3 |
| T170 | 675.7 | 637.6 | 614.1 | |

MEASURED MFLOP/SEC RATES, without -qstrict
| Problem | L1 | L2 | L3 | L16 |
| T5 | 652.5 | 716.1 | 742.4 | 792.1 |
| T10 | 892.9 | 910.5 | 924.4 | 855.8 |
| T21 | 1053.5 | 1036.1 | 1016.4 | 704.6 |
| T42 | 1071.2 | 969.7 | 888.3 | 686.9 |
| T85 | 959.0 | 883.2 | 876.6 | 677.2 |
| T170 | 1011.8 | 870.4 | 781.7 | |
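The time and rate tables can be cross-checked against each other: multiplying time per timestep by the MFLOP/sec rate gives the implied floating point work per timestep, which should not depend on the compiler options. The sketch below does this for the L1 column of the non-reentrant build. Variable names are hypothetical, and only the three largest problems are used because the smaller times are rounded too coarsely for a meaningful comparison.

```python
# (time per timestep in sec, MFLOP/sec) pairs for the L1 column,
# "Not Reentrant" build, copied from the tables.
l1_with_qstrict = {
    "T42": (0.00510, 792.1),
    "T85": (0.03496, 680.5),
    "T170": (0.22273, 675.7),
}
l1_without_qstrict = {
    "T42": (0.00377, 1071.2),
    "T85": (0.02481, 959.0),
    "T170": (0.14875, 1011.8),
}


def mflop_per_step(entries):
    """Implied work per timestep in MFLOP: seconds * (MFLOP / second)."""
    return {problem: t * rate for problem, (t, rate) in entries.items()}


work_strict = mflop_per_step(l1_with_qstrict)
work_nostrict = mflop_per_step(l1_without_qstrict)

for problem in work_strict:
    a, b = work_strict[problem], work_nostrict[problem]
    print(f"{problem:>5}: {a:8.3f} vs {b:8.3f} MFLOP per timestep")
```

The two builds agree to well under one percent for each problem size, which suggests the reported rates are derived from a fixed operation count rather than from counters that vary with the compiled code.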
Threadsafe and OpenMP Ready

MEASURED TIME PER TIMESTEP (SEC), with -qstrict
| Problem | L1 | L2 | L3 | L16 |
| T5 | 0.00009 | 0.00014 | 0.00018 | 0.00078 |
| T10 | 0.00023 | 0.00042 | 0.00060 | 0.00320 |
| T21 | 0.00097 | 0.00193 | 0.00289 | 0.01958 |
| T42 | 0.00506 | 0.01079 | 0.01699 | 0.10845 |
| T85 | 0.03424 | 0.07180 | 0.10751 | 0.64751 |
| T170 | 0.21787 | 0.45793 | 0.71611 | |

MEASURED TIME PER TIMESTEP (SEC), without -qstrict
| Problem | L1 | L2 | L3 | L16 |
| T5 | 0.00008 | 0.00012 | 0.00016 | 0.00064 |
| T10 | 0.00020 | 0.00035 | 0.00050 | 0.00274 |
| T21 | 0.00075 | 0.00149 | 0.00225 | 0.01741 |
| T42 | 0.00376 | 0.00839 | 0.01353 | 0.09455 |
| T85 | 0.02490 | 0.05420 | 0.08212 | 0.56471 |
| T170 | 0.14881 | 0.34592 | 0.57845 | |
MEASURED MFLOP/SEC RATES, with -qstrict
| Problem | L1 | L2 | L3 | L16 |
| T5 | 348.5 | 459.3 | 516.8 | 647.7 |
| T10 | 641.7 | 708.5 | 735.1 | 740.6 |
| T21 | 786.8 | 791.3 | 794.1 | 625.1 |
| T42 | 798.9 | 749.2 | 713.6 | 596.1 |
| T85 | 694.9 | 662.8 | 663.9 | 587.9 |
| T170 | 690.8 | 657.3 | 630.5 | |

MEASURED MFLOP/SEC RATES, without -qstrict
| Problem | L1 | L2 | L3 | L16 |
| T5 | 384.6 | 523.3 | 596.5 | 780.7 |
| T10 | 748.8 | 845.2 | 882.9 | 866.2 |
| T21 | 1021.8 | 1028.0 | 1018.2 | 703.1 |
| T42 | 1073.9 | 962.7 | 895.7 | 683.8 |
| T85 | 955.6 | 878.0 | 869.2 | 674.1 |
| T170 | 1011.4 | 870.2 | 780.6 | |
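To see how much the threadsafe build costs relative to the non-reentrant one, the following sketch compares the L1 MFLOP/sec rates of the two builds, both compiled without -qstrict. The names are hypothetical and the values are copied from the corresponding tables.

```python
# MFLOP/sec, L1 column, compiled without -qstrict.
not_reentrant = {
    "T5": 652.5, "T10": 892.9, "T21": 1053.5,
    "T42": 1071.2, "T85": 959.0, "T170": 1011.8,
}
threadsafe = {
    "T5": 384.6, "T10": 748.8, "T21": 1021.8,
    "T42": 1073.9, "T85": 955.6, "T170": 1011.4,
}

# Fractional rate loss of the threadsafe build; a negative value means
# the threadsafe build was measured as slightly faster.
rate_loss = {
    problem: 1.0 - threadsafe[problem] / not_reentrant[problem]
    for problem in not_reentrant
}

for problem, loss in rate_loss.items():
    print(f"{problem:>5}: {100.0 * loss:6.1f}% lower rate when threadsafe")
```

In this column the penalty is large for the smallest problem (about 41% at T5) and falls to well under one percent from T42 upward.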
More Aggressive Optimization

MEASURED TIME PER TIMESTEP (SEC), with -qhot, without -qstrict
| Problem | L1 | L2 | L3 | L16 |
| T5 | 0.00005 | 0.00009 | 0.00012 | 0.00062 |
| T10 | 0.00016 | 0.00033 | 0.00048 | 0.00293 |
| T21 | 0.00071 | 0.00151 | 0.00229 | 0.01794 |
| T42 | 0.00370 | 0.00863 | 0.01383 | 0.09717 |
| T85 | 0.02440 | 0.05520 | 0.08223 | 0.59060 |
| T170 | 0.14866 | 0.36379 | 0.58891 | |

MEASURED TIME PER TIMESTEP (SEC), with -O4, without -qstrict
| Problem | L1 | L2 | L3 | L16 |
| T5 | 0.00005 | 0.00009 | 0.00012 | 0.00062 |
| T10 | 0.00016 | 0.00033 | 0.00048 | 0.00292 |
| T21 | 0.00071 | 0.00153 | 0.00228 | 0.01785 |
| T42 | 0.00370 | 0.00876 | 0.01385 | 0.09701 |
| T85 | 0.02445 | 0.05530 | 0.08130 | 0.58845 |
| T170 | 0.14741 | 0.36055 | 0.58855 | |
MEASURED MFLOP/SEC RATES, with -qhot, without -qstrict
| Problem | L1 | L2 | L3 | L16 |
| T5 | 695.1 | 731.4 | 769.6 | 812.2 |
| T10 | 925.4 | 903.1 | 923.3 | 810.5 |
| T21 | 1083.9 | 1011.1 | 1004.1 | 682.3 |
| T42 | 1092.3 | 936.8 | 876.7 | 665.3 |
| T85 | 975.2 | 862.0 | 868.0 | 644.6 |
| T170 | 1012.4 | 827.4 | 766.7 | |

MEASURED MFLOP/SEC RATES, with -O4, without -qstrict
| Problem | L1 | L2 | L3 | L16 |
| T5 | 696.7 | 731.9 | 769.3 | 809.1 |
| T10 | 926.5 | 904.4 | 922.6 | 811.3 |
| T21 | 1081.4 | 1001.7 | 1006.9 | 685.8 |
| T42 | 1092.7 | 922.9 | 875.4 | 666.4 |
| T85 | 973.0 | 860.5 | 878.0 | 646.9 |
| T170 | 1021.0 | 834.9 | 767.2 | |