PSTSWM on the Cray X1

SSP and SMP Experiments

These experiments look at performance when using a single SSP instead of an MSP, and when running multiple instances of the job on the same SMP node. On the X1, this means solving 4 instances of the same problem on all 4 MSP processors or 16 instances on all 16 SSP processors in an SMP node. We also include data from running 4 instances of the same problem on all 4 SSP processors that make up a single MSP. The intent is to determine whether memory contention between the different processes can degrade performance. 16 MByte pages are used in all experiments described here.

The first set of graphs compares performance when using a single MSP processor, a single SSP processor, 4 MSP processors, 4 SSP processors, and 16 SSP processors when the number of vertical levels is specified at runtime.

From these data, there is no significant per-processor performance difference between running with just one MSP and with four MSPs, indicating no performance degradation due to memory contention. In contrast, there is a small performance degradation from using four SSPs instead of one, and an even greater degradation from using sixteen SSPs instead of one. The performance degradation for the SSP experiments is likely due to contention for access to both Ecache and main memory.

The next graph is the per-processor performance ratio between experiments using a single MSP and a single SSP, between a single MSP and 4 SSP processors, and between 4 MSP processors and 16 SSP processors for problems with 18 vertical levels. This is meant to be a visual aid in the following discussion comparing the performance of the MSP and SSP experiments. Similar results hold for the other vertical levels, and the earlier graphs can also be used to estimate the ratios.

The single MSP performance is not four times greater than the single SSP performance, typically ranging from 2-3 times greater, so there is some inefficiency in the compiler's attempt to exploit all of the MSP hardware. As the single SSP experiment does not have to share access to the Ecache or the main memory, it is perhaps more reasonable to compare data for the 4 SSP experiments with the 1 MSP experiments. In this comparison, the 1 MSP performance is typically 2-3.5 times better, so using an MSP is still not as effective as using its four SSPs directly in this throughput metric. In contrast, when comparing using 4 MSPs with 16 SSPs, the MSP performance can be as high as 4.7 times as great as the SSP performance, in which case the memory contention caused by assigning separate processes to SSPs overcomes the lack of efficiency in assigning a single process to an MSP.

Note that none of these experiments are definitive. Using SSPs directly in a parallel code requires more explicit parallelism, which often comes with its own overhead. Additionally, parallelizing a fixed-size problem across SSPs or MSPs will assign smaller subproblems per SSP than per MSP. While this decreases memory contention for the SSP runs, it also decreases loop lengths, which likely offsets any gains in memory performance. Overall, our current feeling is that assigning a process to an MSP is likely to be the more efficient approach.

The second set of graphs compares performance when using a single MSP processor, a single SSP processor, 4 MSP processors, 4 SSP processors, and 16 SSP processors when the number of vertical levels is specified at compile time. The fourth graph, as above, is the per-processor performance ratio between experiments using a single MSP and a single SSP, between a single MSP and 4 SSP processors, and between 4 MSP processors and 16 SSP processors for problems with 18 vertical levels.
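
The break-even argument in the MSP versus SSP comparison can be checked with a short calculation. Since one MSP is made up of four SSPs, an MSP delivers more throughput than its four SSPs run as independent processes only when the per-processor MSP/SSP performance ratio exceeds 4. The following Python sketch applies that rule to the ratio values quoted in the text (2, 3, 3.5 and 4.7). The function names and the constant `SSPS_PER_MSP` are illustrative assumptions, not part of PSTSWM or any Cray tool, and the inputs are the approximate bounds stated above rather than measured data points.

```python
# Illustrative sketch only: names are hypothetical, not from PSTSWM.
SSPS_PER_MSP = 4  # one MSP comprises four SSPs, as stated in the text


def msp_relative_throughput(ratio):
    """Throughput of one MSP relative to its four SSPs run independently.

    `ratio` is the per-processor performance ratio (MSP performance divided
    by SSP performance). A result above 1.0 means the MSP delivers more
    total work than four separate SSP processes would.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio / SSPS_PER_MSP


def msp_wins(ratio):
    """True when the MSP out-performs four independent SSPs in throughput."""
    return msp_relative_throughput(ratio) > 1.0


if __name__ == "__main__":
    # Approximate ratio bounds quoted in the discussion, not measurements.
    quoted = [
        ("1 MSP vs 1 SSP, low end", 2.0),
        ("1 MSP vs 1 SSP, high end", 3.0),
        ("1 MSP vs 4 SSPs, high end", 3.5),
        ("4 MSPs vs 16 SSPs, best case", 4.7),
    ]
    for label, ratio in quoted:
        rel = msp_relative_throughput(ratio)
        verdict = "MSP ahead" if msp_wins(ratio) else "SSPs ahead"
        print(f"{label}: ratio {ratio:.1f} -> relative throughput {rel:.3f} ({verdict})")
```

Under these assumptions, only the 4 MSP versus 16 SSP case crosses the break-even ratio of 4, which matches the observation that memory contention among sixteen SSP processes can outweigh the inefficiency of assigning a single process to an MSP.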
URL: http://www.csm.ornl.gov/evaluation/PHOENIX/PSTSWM-ssp-smp.CRAYX1.html
Updated: Thursday, 19-Jun-2003 13:08:27 EDT