Originally appeared in the November 2002 issue of Dell Power Solutions
By Reza Rooholamini
High-Performance Computing Clusters: Toward Industry Standardization
Reza Rooholamini, director of operating systems and cluster engineering in the Dell™ Enterprise Systems Group, discusses the further evolution of high-performance computing clusters (HPCC) and the integration of new technologies in cluster environments.
The practice of clustering servers and other standard components to create a high-performance computing (HPC) environment continues to gain wider acceptance. High-performance computing cluster (HPCC) development has followed the product stages of other technology components: what began as proprietary, big-iron systems has evolved to components that use standards-based technologies. Now, the Dell™ High Performance Computing Cluster bundled solution moves entire configurations of standards-based clusters ever closer to becoming standardized items.
Reza Rooholamini, director of operating systems and cluster engineering in the Dell Enterprise Systems Group, recently updated Power Solutions editors on Dell's progress in standardizing HPCC. Reza also shared his insights on future developments in cluster technology and deployments.
On the product evolution path from proprietary to standard to industry standard, where are Dell supercomputer clusters?
Dell clusters are in the trailing part of the standardization stage. Our goal is to continue pushing toward the industry standardization of HPCC so that it becomes fully understood and widely available, with a very attractive total cost of ownership (TCO).
How does Dell facilitate industry standardization?
To standardize a product or service, a company needs four prime strategies. First, you must continuously improve price/performance. Second, you must capture and grow your market share. Third, you evangelize to push the technology. And finally, you must have the ability to integrate new technology quickly. These four strategies drive the standardization of technology.
Let's begin with the first strategy. How has price/performance of HPCC evolved?
Price is going down, performance is going up, and I believe these trends will continue. Theoretically, the Intel® Xeon™ processor technology lets us easily deliver close to 9 gigaflops per node with a two-processor server. In the prior generation, performance was close to 2.8 gigaflops at the same price point. The Intel Xeon processor offers advanced features such as Streaming SIMD (single-instruction, multiple-data) Extensions 2 (SSE2), which provides the ability to apply a single instruction to multiple data elements at once. That capability is very welcome in HPCC. As the term high-performance computing implies, applications are still hungry for processing power. We can now offer servers that have four times more processing power per processor than we could with the Intel Pentium® III processor.
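As a rough check on the per-node figures above, theoretical peak performance is usually estimated as processors × clock rate × floating-point operations per cycle. The sketch below applies that formula. The clock speeds and flops-per-cycle values are illustrative assumptions chosen to land near the quoted numbers; they are not stated in the interview, and the function name `peak_gflops` is hypothetical.

```python
def peak_gflops(processors: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in gigaflops: processors x clock (GHz) x flops per cycle."""
    return processors * clock_ghz * flops_per_cycle


# ASSUMED values for illustration only (not given in the interview):
# a two-processor Xeon node at 2.2 GHz with 2 double-precision flops per cycle
# via SSE2, and a two-processor Pentium III node at 1.4 GHz with 1 flop per cycle.
xeon_node = peak_gflops(processors=2, clock_ghz=2.2, flops_per_cycle=2)
piii_node = peak_gflops(processors=2, clock_ghz=1.4, flops_per_cycle=1)

print(f"Xeon node peak:        {xeon_node:.1f} gigaflops")
print(f"Pentium III node peak: {piii_node:.1f} gigaflops")
print(f"Node-level ratio:      {xeon_node / piii_node:.2f}x")
```

These are peak numbers only; sustained application performance depends on memory bandwidth, the interconnect, and how well the code uses SSE2.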
In the HPC arena, we see a significant decrease in the cost of deploying a cluster configuration. In 1997 and 1998, the systems and services to achieve 1 gigaflop cost $3,000 to $3,500. Today the cost is roughly a third of that, approximately $1,000. So certainly we have shown dramatic progress in regard to price/performance, and I believe we'll reach a price point under $1,000 in the next year.
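The cost comparison above can be worked through in a few lines. This is a minimal sketch using only the dollar figures quoted in the answer; the helper names and the example budget are hypothetical and serve only to show the arithmetic.

```python
# Cost per gigaflop as quoted in the interview (US dollars).
COST_PER_GFLOP_1997_98 = (3000.0, 3500.0)  # low and high end of the quoted range
COST_PER_GFLOP_2002 = 1000.0


def cost_ratio(new_cost: float, old_cost: float) -> float:
    """Fraction of the old cost that the new cost represents."""
    return new_cost / old_cost


def gflops_for_budget(budget: float, cost_per_gflop: float) -> float:
    """Gigaflops a given budget buys at a given cost per gigaflop."""
    return budget / cost_per_gflop


ratios = [cost_ratio(COST_PER_GFLOP_2002, old) for old in COST_PER_GFLOP_1997_98]
print("2002 cost as a fraction of 1997-98 cost:",
      ", ".join(f"{r:.2f}" for r in ratios))

# Hypothetical budget, chosen only to illustrate the calculation.
example_budget = 100_000.0
print("Gigaflops for the example budget in 2002:",
      gflops_for_budget(example_budget, COST_PER_GFLOP_2002))
```

At the quoted prices the 2002 cost works out to between roughly 29 and 33 percent of the 1997-98 cost, which is consistent with the "about a third" description.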
What are the current trends in features and functionality of HPCC?
The interconnect, the mechanism that allows cooperation and collaboration between the cluster nodes, has evolved in price and performance. Functionality has improved through cluster management software that now can more intelligently monitor and manage the allocation of jobs to the cluster nodes and handle failure situations.
The storage side of the configuration also has evolved. Two years ago, shared storage was likely a node with a direct-attached SCSI device. Today, we can offer highly parallel, highly scalable, and highly available shared secondary storage, and we now see the integration of large storage farms within the HPCC. Finally, we see the creation of more software bundles that will run in cluster environments.
How do cluster architectures give organizations more flexibility?
Clustering allows organizations to refresh technology much more quickly. Previously, if an organization had a proprietary supercomputer or enterprise platform machine, any changes in technology would require an extensive overhaul or complete replacement of the system. In a cluster configuration originally based on Pentium III technology, for example, an organization could augment it with Intel Xeon processor-based nodes without any disruption. Nothing really needs to be thrown out. In addition, high availability and scalability are inherent in the cluster design.
What progress has Dell made on the second strategy: growing market share?
We currently have hundreds of customers and installations, and we see an ever-increasing demand for these clusters. I expect this to increase to a couple thousand installations in one to two years. Dell has become one of the two major players in the Intel architecture-based HPCC space and has helped to grow the market significantly. According to IDC, in the second quarter of 2002, Dell ranked as the leading supplier in the HPC Name Brand RISC and Intel-based clusters, with more than 40 percent market share of the $170 million market.1 This is a very significant achievement.
We see a wide spectrum of opportunity. The configurations can be as small as 8 servers or as large as 2,500 servers or more. Some customers such as academicians and researchers are very hands-on and just want the hardware with the operating system. Other customers such as oil companies want not only the hardware, but also all the elements that go around it: services, support, software version control, and so forth. And we can provide those services for them.
How is the customer base for clusters changing?
We see the boundary blurring between what has been known traditionally as technical computing and commercial computing, and by commercial computing I mean both the type of application and the function of the organization itself. Commercial applications such as data mining could run in an HPCC architecture. Commercial entities in addition to academic research institutions are asking for supercomputer clusters. For example, oil companies use these clusters to perform seismic evaluations, oil reservoir simulations, and other tasks. The financial services industry uses these machines to do portfolio forecasting. The application domain is increasing. Bioinformatics is another area that could use HPCC.
Today, IT organizations are searching for one architecture that will evolve and serve both their data processing and engineering needs. Technical computing environments now demand functionality that traditionally was the domain of commercial applications, and vice versa. For example, we see great demand for integrating our storage area network (SAN) and network attached storage (NAS) products into supercomputer clusters. Highly available, reliable, and scalable storage is not particularly new, but this example illustrates how the technical or the computational segment of the market is requesting technology traditionally used in data processing environments. Similarly, commercial entities are requesting the high availability and scalability that come with these clusters.
Enterprise requirements such as reliability, availability, and serviceability are becoming requirements for all customers. Cluster architectures provide these features at a reasonable cost, and in some cases, at no cost. If you have a cluster of 50 nodes configured in a standby or hot-shadow configuration, you basically have the high availability and scalability at no additional premium.
So, we see further blurring of the two segments, a further convergence of the two.
How is the academic market changing?
I'd like to coin the phrase "HPCC for every scientist, researcher, and faculty member." If university professors have $50,000 to $60,000 in funds, they can deploy a very powerful configuration right in their offices. They don't need to rely on a centralized supercomputing center, submit a job, and wait hours or days. And this is clearly what we see happening in that segment of the market.
How is Dell fulfilling the third strategy: evangelism?
We are working with several organizations to move the technology forward. To begin, our partnership with the Cornell Theory Center, Microsoft, and Intel helped us launch our HPCC offerings. Dell recently announced that it will invest more money, evangelize the technology, and make sure the intensity continues.
During the past three or four years, we have come a long way in standardizing cluster architectures. We actively participate with research organizations and a few other companies in the Open Source Cluster Application Resources (OSCAR) project. This project drives the injection of best utilities and best practices for deploying and managing cluster configurations.
We are funding four to five universities to work with us to further develop the technology and help us accelerate the movement of supercomputer clusters into different application domains, such as energy services, bioinformatics, automotive, finance, and environmental sciences.
At the World Multiconference on Systemics, Cybernetics and Informatics (SCI 2002), we hosted a one-day session where we gathered scientists, practitioners, and industry experts from commercial and educational institutions for a discussion of HPCC trends, best practices, and experiences in the field. The event was very successful, and we certainly plan to continue driving those kinds of activities.
We are determined to function as evangelists, and in the future I expect we will work with and fund more universities and national labs.
Please tell us about the final aspect of the standardization strategy.
The last component is the integration of new technologies into the HPCC solution and responding to customer demands rapidly. For every new technology—whether it is a server, a processor, or an interconnect—that we feel is appropriate and that is demanded by the customer, we can respond.
The Dell engineering organization can test and benchmark basic technologies and configurations, and integrate them very quickly. A typical HPCC configuration contains many variables. We believe the best way to serve our customers is to generate knowledge and information, so we can jointly build the optimal configuration for their applications. That's our engineering strategy. We don't create technology just for the sake of technology; we create technology solutions to help customers when customers want them and the price is right.
How is industry standardization of HPCC becoming reality?
In the last several months, Dell has put in place bundles of 8-, 16-, 32-, 64-, and 128-node clusters that include the interconnect, software stack, and professional services. We have evolved these bundles very rapidly from a Pentium III-based platform to an Intel Xeon processor-based platform. We have evolved the SCSI interface to SANs and have integrated these technologies into HPCC offerings. We have rapidly responded to customer needs and injected new technologies into the solutions. These are the common turnkey solutions, and our salespeople use half a dozen stock-keeping units (SKUs) to order the full cluster configuration.
These turnkey solutions can include not only the hardware and software, but also professional services. Dell has established partnerships with several companies so that if a customer wants a proof-of-concept or pre-sales consultation, we have a SKU to provide it. If customers want the equipment staged before we ship it to them, a service is available. If they want these service partners to port applications, tune the system, or optimize compilers, we can provide those services.
Our definition of industry standards is a little bit broader than just the hardware and the software; we also include services that go around a solution. Several companies now offer services in the HPCC market, and that's yet another aspect of the transition to industry standards.
Do you expect HPCC to incorporate the new modular computing architecture?
As more and more applications move from a proprietary supercomputer into HPCC, the applications need faster and greater processing capability and storage. The opportunity to gain processing density while having a manageable power and cooling requirement makes the modular blade architecture well suited for clusters and the needs of HPC applications.
The deployment strategy depends on the blade as well. In a SAN configuration, if a blade has storage, we can cluster the blade with the storage and distribute the storage. If blades do not have storage, we can cluster them with the centralized storage.
How would you summarize the current trends in high-performance computing clusters?
We believe that clustering is revolutionizing traditional HPC. We see a mushrooming of ever-growing supercomputer clusters for solving technical and commercial computing problems. And we see a blurring of the line between commercial and technical computing, as enterprise requirements expand to HPCC.
Dell continues its efforts to create bundled solutions that simplify the design, ordering, and deployment of cluster architectures. There is no need to put anything proprietary in these machines. You can use industry-standard components to build a manageable and scalable high-performance computing cluster.