The AORSA-3D code solves for the wave electric field and heating in a 3-D stellarator plasma heated by radio frequency waves using an all-orders spectral algorithm. It represents an important kernel in the "Numerical Computation of Wave-Plasma Interactions in Multi-dimensional Systems" SciDAC project. AORSA3D is an MPI code that uses SCALAPACK to solve linear systems arising from the spectral discretization. AORSA3D has three major computational phases:
- matrix generation
- complex linear system solution
- current calculation
In porting AORSA3D, the vendor-installed version of SCALAPACK was used, which, as shown later, achieved excellent performance. The code was compiled using "-C vopt" optimization. An attempt to run with "-C hopt" optimization failed.
AORSA3D is typically run in a scale-up mode, where the number of modes retained by the model is increased as the number of processors is increased, keeping the memory size per processor approximately constant. The following experimental results therefore describe performance in terms of the ratio of the number of modes to the execution time; the scaling behavior as a function of the number of processors is not the quantity of interest. If N is the number of modes, then the total memory requirement is O(N**2), while the computational complexity contains both O(N**2) and O(N**3) terms. Thus the ratio necessarily decreases for increasing N once the O(N**3) term becomes dominant.
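The trend described above can be illustrated with a toy cost model. This is only a sketch: the coefficients `a` and `b`, the baseline of 1000 modes, and the function names are hypothetical and are not measured AORSA3D values. It assumes an execution time of the form a*N**2 + b*N**3 and a total memory of O(N**2) divided evenly among the processors.

```python
import math

def run_time(n_modes, a=1.0e-6, b=1.0e-9):
    """Toy execution time: a quadratic term plus a cubic term.

    a and b are hypothetical coefficients chosen for illustration only.
    """
    return a * n_modes**2 + b * n_modes**3

def modes_per_second(n_modes, a=1.0e-6, b=1.0e-9):
    """Ratio of the number of modes to the (modelled) execution time."""
    return n_modes / run_time(n_modes, a, b)

def modes_for_processors(n_procs, base_modes=1000.0, base_procs=1):
    """Scale-up rule: memory per processor ~ N**2 / P is held constant,
    so N grows like sqrt(P). base_modes is a hypothetical starting size."""
    return base_modes * math.sqrt(n_procs / base_procs)

for p in (1, 4, 16, 64):
    n = modes_for_processors(p)
    print(f"P={p:3d}  N={n:8.0f}  modes/s={modes_per_second(n):10.3f}")
```

In this model the ratio equals 1/(a*N + b*N**2), so it falls roughly like 1/N while the quadratic term dominates and like 1/N**2 once the cubic term takes over (around N = a/b).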
![]()
These results show that AORSA3D is not performing well on the SX-6. The following three graphs compare the runtime of each computational phase on the SX-6 with the performance on the IBM p690. In these plots, larger values signify worse performance.
![]()
![]()
![]()
These results show that the two phases that have not been modified for vectorization and that do not call math library routines (matrix generation and current calculation) perform poorly on the SX-6, running 3.5X to 4X slower than on the p690. In contrast, the linear system solve achieves very high performance on the SX-6, approximately 2.5X faster than on the p690. The following graph describes the performance of the linear system solution in terms of computational rate (so larger values signify better performance).
![]()
Thus the linear system solution is achieving 75% of peak on the SX-6/8.
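As a rough illustration of how a computational rate and a fraction of peak can be derived for a dense complex solve, the sketch below uses the standard leading-order operation count for LU factorization of a complex matrix of order n, about (8/3) n**3 real floating-point operations. The matrix order, solve time, and peak rate in the usage lines are hypothetical placeholders, not values from these experiments, and n is the matrix order, which is not necessarily the same as the number of modes.

```python
def lu_real_flops(n):
    """Leading-order real flop count for LU factorization of a dense
    complex n x n matrix: (8/3) * n**3."""
    return 8.0 * n**3 / 3.0

def solve_rate_gflops(n, seconds):
    """Computational rate of the factorization in GFlop/s."""
    return lu_real_flops(n) / seconds / 1.0e9

def fraction_of_peak(rate_gflops, peak_gflops):
    """Achieved rate as a fraction of the machine's peak rate."""
    return rate_gflops / peak_gflops

# Hypothetical inputs for illustration only.
n = 10000            # assumed matrix order
seconds = 60.0       # assumed solve time
peak_gflops = 64.0   # assumed peak of the processors used

rate = solve_rate_gflops(n, seconds)
print(f"rate = {rate:.1f} GFlop/s, fraction of peak = {fraction_of_peak(rate, peak_gflops):.2f}")
```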