
Preliminary Results Using BioInformatics Benchmark Suite

Evaluation of Early Systems

MPP-BLAST2

The following performance data were collected by Philip F. LoCascio on the AlphaServer SC and IBM SP systems at Oak Ridge National Laboratory in September 2000. A sample of 50 randomly selected fragments of the Escherichia coli (E. coli) genome was compared against the GRAILEXP and NT databases using MPP-BLAST2 (Basic Local Alignment Search Tool). From these data, an average query time was determined and a "queries-per-day" metric extrapolated.
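The extrapolation from an average query time to a "queries-per-day" figure can be sketched as follows. This is a minimal illustration that assumes the metric is simply the number of seconds in a day divided by the mean wall-clock time per query; the exact formula used for the reported results is not stated here, and the function name and sample timings are hypothetical.

```python
from statistics import mean

SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def queries_per_day(query_times_s):
    """Extrapolate throughput from per-query wall-clock times in seconds.

    Assumption: queries run back to back at the observed average rate.
    """
    if not query_times_s:
        raise ValueError("need at least one timing")
    avg = mean(query_times_s)
    if avg <= 0:
        raise ValueError("average query time must be positive")
    return SECONDS_PER_DAY / avg


# Placeholder timings, not measured values.
example_times = [2.0, 4.0]
example_rate = queries_per_day(example_times)
```

With a 50-fragment sample, `query_times_s` would hold the 50 individual query times for one system, database, and processor configuration.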

The data are presented as a function of the number of SMP nodes. For these experiments, it was not always optimal to use all of the processors within an SMP node. Timings were collected when using 1, 2, 3, and 4 processors within a node, and the configuration that achieved the best performance is reported. Note that the "0" node value corresponds to the serial performance, i.e., the performance when using one processor in one SMP node.
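Selecting the best-performing configuration for each node count amounts to taking a minimum over the processors-per-node settings. The sketch below assumes that timings are keyed by `(nodes, procs_per_node)` and that a lower average query time is better; the data layout, the function name, and the sample values are hypothetical and are not the measured results.

```python
def best_per_node_count(timings):
    """Pick the fastest processors-per-node setting for each node count.

    timings: dict mapping (nodes, procs_per_node) -> average query time (s).
    Returns: dict mapping nodes -> (procs_per_node, average query time).
    """
    best = {}
    for (nodes, ppn), seconds in timings.items():
        # Keep the entry with the smallest average query time per node count.
        if nodes not in best or seconds < best[nodes][1]:
            best[nodes] = (ppn, seconds)
    return best


# Placeholder timings for illustration only.
example_timings = {
    (4, 1): 30.0,
    (4, 2): 18.0,
    (4, 3): 15.0,
    (4, 4): 16.0,
    (8, 1): 16.0,
    (8, 2): 11.0,
}
example_best = best_per_node_count(example_timings)
```

In this placeholder data set, 3 processors per node would be reported for 4 nodes and 2 processors per node for 8 nodes.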

We are currently investigating the sources of the performance differences between the IBM and the Compaq systems for these benchmarks. On the IBM system, the databases were stored on a GPFS filesystem. To determine the role of GPFS performance in the IBM results, data were also collected using memory-mapping on the GPFS filesystem and using a UFS filesystem with memory-mapping.

 

The data indicate that the GPFS filesystem does not perform as well as UFS for this application. However, the qualitative difference between the Compaq and IBM results is unchanged when using the UFS performance data, so GPFS filesystem performance is not the primary cause of the relatively poor IBM performance.

As mentioned previously, it is not always most efficient to use all of the processors in a node. The following data describe the performance when using 1, 2, 3, or 4 processors per node for each of the systems and databases. From this, we can determine the optimal configuration for a given number of nodes.

 

For the Compaq, 3 processors per node is a good choice up until 8 nodes, after which 2 processors per node is optimal. Beyond 16 nodes, 1 processor per node may be preferred. The reason behind this is not clear, as interprocessor communication should not be significant and the granularity of the tasks is not obviously a limiting factor. The issue is still under investigation.

 

In contrast, on the IBM, 4 processors per node is optimal up to 8 nodes. For more than 8 nodes, the performance shows no improvement for the GRAILEXP experiment and only a small improvement for the NT experiment. Beyond 16 nodes, there may be something to be gained by using 2 processors per node.

For a different perspective on the performance differences between the IBM and Compaq systems, the above data are replotted on a processor (instead of node) basis. The algorithm is aware only of the number of processors being utilized, so the performance differences when using different numbers of processors per node reflect performance characteristics of the system.
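Converting node-based measurements to a processor basis can be sketched as below. The sketch assumes each measurement is a `(nodes, procs_per_node, value)` tuple and that the total processor count is the product of the first two, with the "0" node entry treated as the serial case of one processor, as described earlier. The function name and sample values are hypothetical.

```python
from collections import defaultdict


def to_processor_basis(points):
    """Group measurements by processors per node, indexed by total processors.

    points: iterable of (nodes, procs_per_node, value) tuples.
    Returns: dict mapping procs_per_node -> sorted list of
             (total_processors, value) pairs.
    """
    series = defaultdict(list)
    for nodes, ppn, value in points:
        # The "0" node entry denotes the serial run: one processor in one node.
        total = 1 if nodes == 0 else nodes * ppn
        series[ppn].append((total, value))
    return {ppn: sorted(pts) for ppn, pts in series.items()}


# Placeholder measurements for illustration only.
example_points = [(2, 2, 10.0), (0, 1, 40.0), (1, 2, 20.0), (4, 1, 12.0)]
example_series = to_processor_basis(example_points)
```

Plotting each series against total processors lets curves for different processors-per-node settings be compared at equal processor counts, so any gap between them reflects the system rather than the algorithm.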

 

 

The Compaq system shows significant performance degradation when using multiple processors per node for this application. In contrast, while the IBM also shows a degradation, it is much less severe.

In summary, the IBM and Compaq have somewhat different performance characteristics for these benchmarks. The Compaq performs better when not using all processors in a node. The IBM performance is best when using all of the processors, up to a point. The Compaq also demonstrates superior performance and good scalability up to 16 nodes, while the IBM performance lags behind and scales poorly beyond 8 nodes.


URL http://www.csm.ornl.gov/evaluation/BIO/index.html
Updated: Monday, 20-Aug-2001 12:43:18 EDT