CCM/MP-2D Performance Studies


Serial Complexity of CCM/MP-2D

While the serial complexity of a run of CCM/MP-2D is affected by the evolving solution (for example, by the effect of cloud fraction on the radiation calculations and of wind velocities on the advection of moisture), much of the cost of a day of simulation is fixed from day to day. To estimate the complexity of a typical day of simulation, we ran CCM/MP-2D on an Origin 2000 at Los Alamos National Laboratory, using the SGI Speed Shop tools to count floating point operations. The "-ideal" option was used to count the requested number of floating point operations, and the optimization level was varied to determine the minimum number of operations.

The complexity was measured for two different problem sizes: T42L18, which uses a 128 longitude by 64 latitude grid with 18 vertical levels and a 20 minute timestep, and T170L18, which uses a 512 longitude by 256 latitude grid with 18 vertical levels and a 5 minute timestep.

For T42L18, operation counts were calculated for one day (72 timesteps) and two days (144 timesteps) on a single processor, then differenced to estimate the cost of a standard day (one that excludes problem initialization).
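The differencing step can be sketched as follows. This is a minimal illustration, not the procedure's actual tooling: the function name and the input counts are hypothetical placeholders, since the raw one-day and two-day totals are not reported here.

```python
def standard_day_cost(count_one_day: int, count_two_days: int) -> tuple[int, int]:
    """Split measured totals into a per-day cost and a one-time startup cost.

    Assumes the two-day run repeats the work of the one-day run and then
    adds exactly one more day that has no initialization.
    """
    per_day = count_two_days - count_one_day   # cost of the second (standard) day
    startup = count_one_day - per_day          # what the first day cost beyond that
    return per_day, startup


# Hypothetical counts in arbitrary units, chosen only to show the arithmetic.
per_day, startup = standard_day_cost(count_one_day=1_050, count_two_days=2_050)
print(per_day, startup)
```

The same subtraction removes any fixed cost that appears in both runs, which is why the result excludes initialization.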

For T170L18, the counters used to tally floating point operations could not handle the large numbers, even when running on 16 processors and using a separate counter for each processor. In a day of simulation, there are three types of timesteps:

Counts for these three types were measured directly, flushing the counters one timestep before the timestep to be examined, and running experiments that ended immediately before the timestep and immediately after, then differencing. The counts for these three types were weighted appropriately to construct an operation count for an entire day. This approach was also used on the T42L18 problem size and compared to the count computed with a direct measurement. The two approaches gave essentially equivalent results.
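A rough sketch of this weighting is given below. The labels "A", "B", and "C", the per-timestep counts, and the split of the 72 T42L18 timesteps among the types are all invented placeholders; only the method (difference two runs to isolate one timestep, then weight by how often each type occurs in a day) follows the description above.

```python
def per_step_count(count_ending_before: int, count_ending_after: int) -> int:
    """Isolate one timestep by differencing two runs that end just before and just after it."""
    return count_ending_after - count_ending_before


def day_count(step_counts: dict[str, int], steps_per_day: dict[str, int]) -> int:
    """Weight each timestep type by its number of occurrences in a day and sum."""
    if step_counts.keys() != steps_per_day.keys():
        raise ValueError("every timestep type needs both a count and a frequency")
    return sum(step_counts[kind] * steps_per_day[kind] for kind in step_counts)


# Placeholder values: three hypothetical timestep types, arbitrary units.
step_counts = {
    "A": per_step_count(100, 130),
    "B": per_step_count(100, 120),
    "C": per_step_count(100, 110),
}
steps_per_day = {"A": 1, "B": 23, "C": 48}   # hypothetical split of 72 timesteps
print(day_count(step_counts, steps_per_day))
```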

problem size  grid            timestep    steps per day  floating point operations per day  sqrt calls in flop count  fdiv calls in flop count
T42L18        128 x 64 x 18   20 minutes  72             59,554,603,237                     8.3%                      4.2%
T170L18       512 x 256 x 18  5 minutes   288            3,231,429,529,384                  7.0%                      3.1%
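To make the two rows easier to compare, the sketch below derives a few normalized quantities from the tabulated values: timesteps per day, grid points, and operations per grid point per timestep. The class and property names are illustrative choices, not part of CCM/MP-2D; the numbers are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemSize:
    name: str
    nlon: int
    nlat: int
    nlev: int
    timestep_minutes: int
    flops_per_day: int

    @property
    def steps_per_day(self) -> int:
        # 24 hours of simulation divided into fixed-length timesteps.
        return 24 * 60 // self.timestep_minutes

    @property
    def grid_points(self) -> int:
        return self.nlon * self.nlat * self.nlev

    @property
    def flops_per_point_per_step(self) -> float:
        return self.flops_per_day / self.steps_per_day / self.grid_points


T42L18 = ProblemSize("T42L18", 128, 64, 18, 20, 59_554_603_237)
T170L18 = ProblemSize("T170L18", 512, 256, 18, 5, 3_231_429_529_384)

for p in (T42L18, T170L18):
    print(p.name, p.steps_per_day, p.grid_points, round(p.flops_per_point_per_step, 1))

# Ratio of daily operation counts between the two problem sizes.
print(T170L18.flops_per_day / T42L18.flops_per_day)
```

The last ratio can be compared with the factor of 64 that would follow from 16 times as many grid points and 4 times as many timesteps alone.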

The high percentage of sqrt and fdiv calls makes it more complicated to use these counts to compute meaningful flops per second rates. While the percentages are functions of the SGI compiler and, for example, how the "pow" function is implemented, they are still indicative of an interpretation problem that will occur on any platform for which sqrt and fdiv are significantly slower than a floating point multiply/add.
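One way to see the interpretation problem is to reweight the count by relative operation cost. The sketch below rests on two assumptions that are not established above: that the tabulated percentages are fractions of the total operation count, and that each sqrt or fdiv costs some fixed multiple of a multiply/add. The weights used are hypothetical and are not measurements of any platform.

```python
def weighted_operation_count(
    total_ops: float,
    sqrt_fraction: float,
    fdiv_fraction: float,
    sqrt_weight: float,
    fdiv_weight: float,
) -> float:
    """Re-express a flop count in multiply/add equivalents.

    sqrt_weight and fdiv_weight are assumed relative costs (hypothetical),
    with a multiply/add counted as 1.
    """
    simple_fraction = 1.0 - sqrt_fraction - fdiv_fraction
    weighted_fraction = (
        simple_fraction
        + sqrt_fraction * sqrt_weight
        + fdiv_fraction * fdiv_weight
    )
    return total_ops * weighted_fraction


# T42L18 row of the table, with a purely illustrative weight of 10 for both operations.
raw = 59_554_603_237
weighted = weighted_operation_count(raw, 0.083, 0.042, sqrt_weight=10.0, fdiv_weight=10.0)
print(weighted / raw)
```

With both weights set to 1 the function returns the raw count, so the gap between the raw and weighted values indicates how much a flops-per-second rate based on the raw count would understate the work done under the chosen weights.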


Patrick H. Worley (worleyph@ornl.gov)
Last Modified Monday, 15-Jul-2002 09:58:43 EDT.