PCRM Performance Studies

Performance Studies using PCRM


PCRM is a slight modification of the NCAR Column Radiation Model (CRM). It is given a distinct name simply to denote that it differs from the stock distribution available from NCAR and is likely to lag behind updates to CRM.

CRM is a standalone version of the column radiation model used in the Community Climate Model (CCM). The radiation calculation is one of the major computational tasks in the column physics computations. To quote the CRM documentation,

... the CRM is a physical process model which isolates the energetics of radiative transfer from the rest of the CCM3. The CRM is built from the radiation routines from CCM3, along with a simple text interface for the user to input information needed by the radiation calculation.

Because the physics computations are independent between columns and because the parallel implementation used in CCM/MP-2D does not parallelize individual column calculations, we use (P)CRM to determine single processor performance. However, the parallel implementation can affect the performance of CRM by, for example, changing the number of columns and the memory layout of the column data assigned to a given processor.

To examine the impact of the parallel implementation on the performance of the column radiation calculations, we modified CRM in the following ways.

Up to three different experiments are run.

  1. Index layout sensitivity.
    For a fixed number of columns, the number of longitudes is varied, requiring a corresponding "inverse" variation in the number of latitudes. This represents different domain decompositions, e.g. varying how a fixed number of processors is divided between the latitude and longitude directions. The issue to be examined is the performance sensitivity of computing a small number of columns at a time (plon small) or a large number at a time (plon large). Good vectorization requires plon to be large. Cache-based processor architectures may prefer plon small. Setting plon=1 is equivalent to setting the vertical level as the first index.
  2. Square decomposition scaling.
    The total number of columns is varied, assigning approximately the same number to both longitude and latitude. This corresponds to a "square" domain decomposition as the number of processors is varied.
  3. 1D decomposition scaling.
    The number of latitudes is varied for a fixed number of longitudes. This corresponds to the scaling for a one dimensional (latitude) domain decomposition such as that used in the production version of CCM.
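The three experiments differ only in how the (plon, plat) pairs are chosen. A minimal sketch of that selection follows; the function names are hypothetical, the nearest-divisor rule for "approximately square" is one possible choice rather than the one used in PCRM, and the `offset` helper assumes an array dimensioned (plon, plev) in Fortran column-major order.

```python
import math

def layout_pairs(total_columns):
    """Experiment 1: every (plon, plat) with plon * plat == total_columns."""
    return [(plon, total_columns // plon)
            for plon in range(1, total_columns + 1)
            if total_columns % plon == 0]

def square_sizes(column_counts):
    """Experiment 2: a near-square (plon, plat) for each total column count.

    Assumption: pick the largest divisor not exceeding sqrt(n), so that
    plon * plat == n exactly. A prime n degenerates to (1, n).
    """
    sizes = []
    for n in column_counts:
        plon = math.isqrt(n)
        while n % plon != 0:
            plon -= 1
        sizes.append((plon, n // plon))
    return sizes

def one_d_sizes(plon, plat_values):
    """Experiment 3: plon fixed, plat varied."""
    return [(plon, plat) for plat in plat_values]

def offset(i, k, plon):
    """Zero-based memory offset of element (i, k) in an array dimensioned
    (plon, plev) and stored column-major (assumed layout)."""
    return i + plon * k

if __name__ == "__main__":
    print(layout_pairs(16))
    print(square_sizes([16, 32, 64]))
    print(one_d_sizes(128, [1, 2, 4]))
    # With plon == 1 consecutive levels are adjacent in memory.
    print([offset(0, k, 1) for k in range(4)])
```

With plon=1 the offset reduces to the level index k, which illustrates the remark that plon=1 is equivalent to making the vertical level the first index.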
For each experiment, three additional issues may also be investigated.
  1. Compiler option sensitivity.
    A range of compiler options is examined for each experiment to determine which options are best and how sensitive performance is to the choice of options. Note that only the standard and aggressive optimizations are examined.
  2. Instruction and data cache state sensitivity.
    Each experiment is run three ways: with timing beginning immediately, with timing beginning after a single column has been computed, and with timing beginning after the whole experiment has been run once without timing. This examines the sensitivity to the "first time" perturbation and other instruction and data caching issues.
  3. Multiple instance sensitivity in a shared memory node.
    The serial code is run on multiple processors simultaneously. This examines the effect on performance of multiple processors contending for memory in a shared memory node.
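The three cache-state variants described above differ only in what is executed before the timer starts. The following is a rough sketch of such a harness, not the PCRM driver itself: `time_experiment`, the warmup mode names, and the stand-in kernel `fake_column` are all hypothetical.

```python
import time

WARMUP_MODES = ("none", "single", "full")

def time_experiment(compute_column, columns, warmup="none"):
    """Return elapsed seconds for computing every column once.

    warmup == "none":   timing begins immediately
    warmup == "single": one column is computed untimed first
    warmup == "full":   the whole experiment is run once untimed first
    """
    if warmup not in WARMUP_MODES:
        raise ValueError(f"unknown warmup mode: {warmup!r}")
    if warmup == "single":
        compute_column(columns[0])
    elif warmup == "full":
        for column in columns:
            compute_column(column)
    start = time.perf_counter()
    for column in columns:
        compute_column(column)
    return time.perf_counter() - start

def fake_column(column):
    # Hypothetical stand-in for one column radiation calculation.
    return sum(x * x for x in column)

if __name__ == "__main__":
    columns = [[float(k) for k in range(18)] for _ in range(64)]
    for mode in WARMUP_MODES:
        print(mode, time_experiment(fake_column, columns, mode))
```

Comparing the "none" and "single" timings isolates the first-call cost, while "full" approximates a steady state in which instructions and, if they fit, data are already cached.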

All results are presented in terms of MFlop/second, where the floating point operations were counted using the SpeedShop tools "ssrun -ideal" and "prof -archinfo". Compiler optimization was set at "-64 -O3". The results for a single 18 level column using the "cloudy day" input provided with CRM are as follows.

Levels   Floating point operations per column   sqrt calls in flop count   fdiv calls in flop count
18       987496                                 6.7%                       3.4%
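Given the operation count above, a measured time converts to a rate by simple arithmetic. A minimal sketch, assuming the per-column count is the same for every column in a run (true only if all columns use the same 18-level input); the function name and the timing in the example are hypothetical, not measured results.

```python
FLOPS_PER_COLUMN = 987496  # 18-level "cloudy day" column, count reported above

def mflops(num_columns, elapsed_seconds, flops_per_column=FLOPS_PER_COLUMN):
    """Millions of floating point operations per second for a timed run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_columns * flops_per_column / elapsed_seconds / 1.0e6

if __name__ == "__main__":
    # Hypothetical timing: 1000 columns computed in 2.0 seconds.
    print(mflops(1000, 2.0))
```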

RESULTS:

Compaq
  AlphaSC-667

Cray Research
  T3E-900

IBM
  SP3-200 Winterhawk I
  SP3-375 Winterhawk II

SGI
  Origin2000-250

Platform Comparisons



Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 09:59:33 EDT.