Originally appeared in August 2005
Sending out an SOS: HPCC Rescue Coming
By Christopher Lazou, HiPerCom Consultants Ltd.
"I don't know where we are going, but we'll get there quicker if we
get started." -- David Bernholdt, ORNL (SOS9, March 2005).
SOS is the internationally recognized distress call for help, and it was quite
apt for "capability" computing in the 1990s, especially in the
U.S. Come the new century, and thanks to some help from new R&D funds
for high productivity systems, IBM, Sun Microsystems Inc. and Cray Inc.
are working hard to offer a rescue pathway.
The SOS Forum series was founded in 1997 on the initiative of people
interested in High Performance Cluster Computing (HPCC) at Sandia National
Lab and Oak Ridge National Lab, as well as EPFL in Switzerland. (EPFL is
the Swiss cradle for the successful design and implementation of Beowulf
systems.) SOS stands for "Sandia, Oak Ridge, Switzerland." In
1997, the major centers were starting to explore the capacity of communication
systems for building their own HPC clusters. (Note: At this time, Quadrics
and Myrinet did not have commercial products.)
The SOS Forums take place annually in the spring and are open to anyone
interested in discussing new and visionary ideas on HPCC, but the number
of participants is deliberately kept low (not more than 50). The ninth
SOS workshop took place in Davos, Switzerland, last March. For further
details, visit the SOS website: http://www.eif.ch/sos/
The thrust of the SOS Forum is to foster multi-laboratory, multi-national
collaboration to explore the use of new parallel supercomputer architectures,
such as clusters with commodity-based components, heterogeneous and web
supercomputing etc., and is not focused on any particular system.
The theme of the ninth SOS Forum was Science and Supercomputers. The received
wisdom is that "Today science is enabled by supercomputing, but tomorrow
science breakthroughs will be driven by supercomputers." The workshop
explored what is needed to prepare for an age when manipulating huge data
sets and simulating complex physical phenomena is used routinely to predict
and explain new scientific phenomena.
The questions addressed at SOS9 were:
What are the computational characteristics needed to facilitate this transition?
How can the existing and emerging supercomputer architectures be directed
to help science?
Is there a need for new facility models that cater to large science or
is the traditional supercomputer center with thousands of users sufficient
for the future?
What software and programming models are being explored to make it easier
for scientists to utilize the full potential of supercomputers?
The SOS9 Forum was a tour de force of personalities from the U.S. and
Europe discussing world-class activities at their sites and furnishing
some insights on how future HPC products can effectively serve their scientific
communities and the needs of science at the national level. These sites have
heterogeneous environments and use systems from several major vendors.
Sites such as CSCS in Switzerland, the HPCx facility in the UK and ORNL
in Oak Ridge are on a development path that will define capability scientific
computing for at least the next decade. The trend is for setting up partnerships
between centers and computer vendors as well as collaborations with centers
of excellence across national boundaries. A good example is the partnership
between Sandia and Cray developing the $90 million Red Storm system. Bill
Camp's motto is "Use high-volume commodity components almost everywhere,
but when necessary for scalability, performance and reliability use custom
components." The engineering task was how to deliver 40 Tflops peak performance
using 10,000 AMD Opteron chips and a specially designed high bandwidth,
low latency interconnect. Red Storm is already a great success; it has since
been made into a Cray product and is marketed as the Cray XT3.
According to Bill Camp, "Red Storm is achieving its promise of being
a highly-balanced and scalable HPC platform with a favorable cost of ownership.
It is setting new high water marks in running key national security and
science applications at Sandia and elsewhere."
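The 40 Tflops peak figure is easy to sanity-check from the part count. Assuming (our assumption, era-typical for the Opteron of 2005) a 2 GHz clock and two floating-point operations retired per cycle per chip, a minimal sketch:

```python
# Back-of-envelope check of Red Storm's quoted peak performance.
# Chip count is from the article; clock and flops/cycle are assumptions.
chips = 10_000
clock_hz = 2.0e9          # assumed 2 GHz Opteron clock
flops_per_cycle = 2       # assumed: one FP add + one FP multiply per cycle

peak_flops = chips * clock_hz * flops_per_cycle
print(f"peak = {peak_flops / 1e12:.0f} Tflops")  # prints "peak = 40 Tflops"
```

The same arithmetic applied to the later XT3 figures (25.1 Tflops over 5,212 processors) implies roughly 4.8 Gflops per chip, consistent with a slightly faster clock.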
In March, CSCS, the national Swiss leadership computer center, bought
a large Cray XT3 system, as the first phase of its procurement cycle. CSCS
has laid plans to team with leading U.S.-based supercomputing sites, the
Pittsburgh Supercomputing Center, Oak Ridge National Laboratory and Sandia
National Laboratories, to fine-tune the software environment and make the
Cray XT3 technology mature for a broad spectrum of scientific production
applications.
According to Dr. Marie-Christine Sawley, CSCS CEO, "The Cray XT3
was bought as a highly scalable capability system for very demanding, high-end
computational scientific and engineering research applications. The system
is designed to support a broad range of applications and positions CSCS,
as a leadership-class computing resource supplier for the research community
of Switzerland. It also positions the center to attract highly visible,
value-added international collaborations."
Sawley explained in her SOS presentation that CSCS's systems prior to the
Cray XT3 include an IBM SP4 system, of which over 60 percent was used for
chemistry codes, and an NEC SX-5 high memory bandwidth vector system, of which 44
percent is used for meteorology/climate applications. CSCS still offers
services on both the SX-5 and the SP4 systems; the XT3 represents an extension
of its computing capacities toward true MPP. The phase two procurement
this autumn is looking at providing suitably upgraded computing resources
for the SX-5 user community requiring high memory bandwidth capability.
CSCS is working to establish collaborations, including a visitor program,
with centers having Cray XT3 systems, on porting applications, system tuning
and tools. CSCS is offering applications in chemistry, molecular dynamics,
environment, materials science and physics, drawn from its core competences and
customer portfolio. Tools, such as performance monitoring, debuggers and
visualization, are also part of this focus.
The keynote by Dr. Thomas Zacharia, Associate Lab Director for Computing
and Computational Sciences at ORNL, titled "A new way to do science:
Leadership Class Computing at ORNL facilities," typifies what these
centers are likely to develop into. ORNL was awarded funding by the
DoE to address the opportunities and challenges of Leadership computing.
This involves in part developing and evaluating emerging -- but unproven
-- experimental computer systems. Their brief is to focus on Grand Challenge
scientific applications and computing infrastructure -- driven by applications.
The goal of Leadership systems is to deliver computational capability that
is at least 100 times greater than what is currently available. It is
acknowledged by funding bodies that Leadership systems are expensive, typically
costing about $100 million a year.
It is now recognized that a focused effort is critical in order to harness
the potential of experimental computing and translate it into breakthroughs
in science. The infrastructure needed consists of capability platforms
with ultra-scale hardware, software and libraries to exploit them
efficiently, teams of hardware and software engineers and, most importantly,
funding for seamless access by research teams of scientists investigating
Grand Challenge problems.
With DoE funding, ORNL recently set up the National Leadership Computing
Facility. In the computing platform area, NLCF is concentrating on developing
and proving several Cray purpose-built architectures, optimized for specific
classes of applications.
NLCF has recently installed a 1,024-processor Cray X1E with an aggregate
peak performance of 18.5 Tflops -- the largest Cray vector system in the
world. The Cray X1E has a vector architecture proven for high performance
and reliability, very powerful processors and a very fast interconnection
subsystem. It is scalable, has globally addressable memory with high bandwidth
and offers capability computing for key applications. This system has been
allocated to five high-priority Office of Science applications as follows:
3D studies of stationary accretion shock instabilities in core collapse
supernovae (415,000 processor hours).
Turbulent premix combustion in thin reaction zones (360,000 processor hours).
Full configuration interaction benchmarks for open shell systems (220,000
processor hours).
Computational design of the low-loss accelerating cavity for the ILC (200,000
processor hours).
Advanced simulations of plasma micro-turbulence (50,000 processor hours).
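To put the allocations above in wall-clock terms, a minimal sketch, assuming (our assumption) that a job occupies all 1,024 processors of the X1E:

```python
# Convert the X1E processor-hour allocations quoted above into days of
# full-machine time, assuming whole-machine jobs (an illustrative assumption).
procs = 1024
allocations = {
    "core collapse supernovae": 415_000,
    "premix combustion": 360_000,
    "open shell benchmarks": 220_000,
    "ILC cavity design": 200_000,
    "plasma micro-turbulence": 50_000,
}
for name, proc_hours in allocations.items():
    days = proc_hours / procs / 24
    print(f"{name}: {days:.1f} days of the full machine")
```

The largest award, 415,000 processor hours, works out to roughly 17 days of exclusive use of the whole system.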
Another platform just installed is the 5,212-processor AMD Opteron Cray
XT3 system, with an aggregate peak performance of 25.1 Tflops. It
has an extremely low latency, high bandwidth interconnect, efficient scalar
processors and a balanced interconnect between processors, providing capability
computing. Although the Cray XT3 is new, its architecture is proven, as it
is based on ASCI Red. It uses the Linux operating system on service processors
and a specially adapted microkernel, for optimal performance, on compute
processors. According to Zacharia, benchmarks show this system is No. 1
in the world on four of the HPC Challenge tests and No. 3 in the world
on the fifth.
To give a feel of the power of this system, in August 2005, just weeks
after the delivery of the final cabinets of the Cray XT3, researchers at
the National Center for Computational Sciences ran the largest ever simulation
of plasma behavior in a Tokamak, the core of the multinational fusion reactor
ITER. The code, AORSA, used for ITER, solves Maxwell's equations -- describing
the behavior of electric and magnetic fields and their interaction with matter --
for hot plasma in Tokamak geometry (i.e., the velocity distribution function
for ions heated by radio frequency waves in Tokamak plasma). The largest
run, by ORNL researcher Fred Jaeger, utilized 3,072 processors -- roughly
60 percent of the entire Cray XT3. The Cray XT3 run improved total wall
time by more than a factor of three over its IBM P3 system.
The importance of this improved performance cannot be overstated. For
decades, researchers have sought to reproduce the power of the sun, which
is generated by fusion of small atoms under extremely high temperatures
-- millions of degrees Celsius. The U.S., Europe and other nations have
joined forces to develop the multi-billion dollar International Thermonuclear
Experimental Reactor. ITER's donut-shaped reactor uses magnetic fields
to contain a rolling maelstrom of plasma, or gaseous particles, which comprise
the "fuel" for the fusion reaction.
Cost-effective and efficient development and operation of ITER depend
on the ability to understand and control the behavior of this plasma: its
physics and optimal conditions that foster fusion. Harnessing fusion for
future "clean" energy will have worldwide environmental ramifications.
NLCF expects to deploy a 100 Tflops Cray XT3 in 2006, followed
by a 250 Tflops Cray Rainier system in 2007 or 2008. Rainier is
a unified product incorporating vector, scalar and potentially re-configurable
and multi-threaded processors in a tightly connected system. This heterogeneous
architecture offers a single system solution for diverse applications workloads.
The NLCF is built as a world-class facility. It consists of a 40,000 square
foot computer room and an 8 MW power supply. It contains additional
classrooms and a training area for users, a high-ceiling area for visualization
(cave, power-wall, Access Grid, etc.) and separate laboratory areas for
computer science and network research.
Using high bandwidth connectivity via major science networks, NSF TeraGrid,
Ultranet and "Futurenet," NLCF aims to integrate core capabilities
and deliver computing for "frontiers" science. The program includes
joint work with computer vendors to develop and evaluate next-generation
computer architecture (e.g. Cray systems and IBM Blue Gene/L), create math
and computer science methods to enable use of resources (e.g., SciDAC,
ISIC), nurture scientific applications partnerships and fund modelling
and simulation expertise. The ultimate goal is to transform scientific
discovery in fields such as biology, climate, fusion and materials, and to
serve industry and other governmental agencies through advanced computing.
Instruments for international collaboration are also important for shortening
time to solution and enhancing the potential for scientific breakthroughs.
The ORNL program for Leadership computing includes collaborations with
other large-scale computing centres, e.g. Sandia, PSC and CSCS.
As Zacharia said: "ORNL has a long standing partnership with Sandia
and CSCS on many fronts; collaborations in applications areas, collaborations
in enabling technologies, sharing of best practices in managing and operating
our respective centers and, of course, our historical partnership in the
SOS series of Forums."
The NLCF is primed for active dialogue with academia, industry, laboratories
and other HPC centers. The Joint Institute for Computational Sciences is
to be a state-of-the-art distance-learning facility. It aims to provide
incubator suites, joint facility offices, conference facilities and strong
student and post-doctoral programs. It supports educational outreach through
research alliances in math and science programs and industrial outreach
through a computational center for industrial innovation. It also supports
international collaborations in computational sciences by hosting guest
scientists and visiting scholars.
Another speaker, Dr. Paul Durham, CCLRC Daresbury Laboratory, described
capability computing on HPCx, an IBM Power4-based system used as a national
resource for UK research. After giving many examples of scientific results,
he described the project to move user consortia onto capability computing --
defined as needing more than 1,000 processors -- as follows: "Research
done on HPCx is driven by specific scientific goals, set out in the peer
reviewed grant applications. Some users are obtaining excellent results
running on 128 to 256 processors. There may be no scientific case for moving
these into the capability regime. The intention for the HPCx facility was
that resources should only be granted to consortia with true capability
needs."
Durham concluded by asking a series of questions. The computational research
community has identified many fascinating and important Petascale problems,
but has it achieved enough capability usage at the Terascale? What are the
best capability metrics? Do they have to be hardware based? Can capability
science be defined? How many projects can be sustained before the capability
mission gets diluted? Are there enough "capability" users with
Petascale ambitions coming through? Are they in new fields or the usual
suspects? Can we expect new fields for "capability" computing to
arise spontaneously, or should we lead them to it?
Michele Parrinello, a professor of computational science at ETH Zurich,
gave a keynote presentation titled "The challenges of scientific computing." He
described many interesting scientific results in chemistry and molecular
dynamics. He asked the rhetorical question: Why do simulations? His reply:
to interpret experimental results, replace costly or impossible experiments,
gain insights and possibly predict new properties (e.g. virtual microscopy).
Another question was whether one can use molecular dynamics to explore
long time scale phenomena. The answer: not at present. Direct simulation
allows only very short runs of ~10 ps for ab initio MD and ~10 ns for classical
MD. Many relevant phenomena need longer time scales: chemical reactions,
diffusion, nucleation, phase transitions, protein folding and so on.
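The step counts behind these limits make the gap concrete. A minimal sketch, assuming (our assumption, typical for the field) an integration timestep of roughly 1 fs for classical MD and around 0.5 fs for ab initio MD:

```python
# Why direct MD cannot reach long timescales: the number of integration
# steps required. Timestep sizes are typical-value assumptions, not from
# the article.
fs = 1e-15  # one femtosecond, in seconds

def steps_needed(sim_time_s, dt_s):
    """Integration steps required to cover sim_time_s at timestep dt_s."""
    return sim_time_s / dt_s

print(f"10 ps ab initio MD at 0.5 fs/step: {steps_needed(10e-12, 0.5 * fs):.0e} steps")
print(f"10 ns classical MD at 1 fs/step:   {steps_needed(10e-9, 1 * fs):.0e} steps")
print(f"1 ms (protein folding) at 1 fs/step: {steps_needed(1e-3, 1 * fs):.0e} steps")
```

A millisecond-scale folding event would need on the order of a trillion timesteps, which is why longer timescales call for new algorithms, not just faster machines.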
Another presentation, from Professor Andreas Adelmann of the Swiss Paul
Scherrer Institut, described research and "HPC demands in computational
accelerator physics." He briefly presented particle accelerators and
how they are modelled, with working examples, and elaborated on the modelling
needs of next-generation particle accelerators: the high-energy LHC, the
high-intensity Spallation Neutron Source and the high-brilliance light source.
This was illustrated by several examples, such as Particle In Cell simulations
using a low-dimension Vlasov solver for relativistic electrodynamics, including
collisions and so on.
His conclusion was that HPC hardware needs to consist of a large number
of tightly coupled CPUs with access to low latency, high bandwidth memory,
especially for the large 3D n-body problem (in space and time) and for
the fine-grid 4(6)D Vlasov solver. Fast I/O is essential, as post-processing
is a parallel data mining activity. The software requirements are efficient
numerical implementations of FFT, MG and AMR, and load balancing, fault
tolerant systems and algorithms. The Paul Scherrer Institut and, in particular,
the particle accelerator project headed by Dr. Adelmann were pivotal for
the participation of PSI in the Horizon project, culminating in the
recent purchase of the Cray XT3 system by CSCS.
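To give a flavor of what a Particle In Cell step involves, here is a minimal sketch of cloud-in-cell charge deposition, the stage where particle positions become a charge density on a grid before the field solve. This is an illustrative toy in one dimension, not the PSI code; real solvers of the kind Adelmann described run in 3D across thousands of CPUs.

```python
# Cloud-in-cell (CIC) charge deposition: each particle shares its charge
# between its two nearest grid points, linearly weighted by distance.
# Illustrative 1D sketch only; function and parameter names are ours.
def deposit_charge(positions, grid_n, length, q=1.0):
    """Deposit particles of charge q onto a periodic 1D grid of grid_n cells."""
    dx = length / grid_n
    rho = [0.0] * grid_n
    for xp in positions:
        cell = int(xp / dx) % grid_n   # left-hand grid point
        frac = (xp / dx) % 1.0         # fractional distance into the cell
        rho[cell] += q * (1.0 - frac) / dx
        rho[(cell + 1) % grid_n] += q * frac / dx
    return rho

rho = deposit_charge([0.5, 2.25], grid_n=4, length=4.0)
# Particle at 0.5 sits midway between points 0 and 1 (0.5 each);
# particle at 2.25 gives 0.75 to point 2 and 0.25 to point 3 (dx = 1).
print(rho)  # [0.5, 0.5, 0.75, 0.25]
```

The linear weighting conserves total charge exactly, which is one reason CIC is the standard first-order scheme in PIC codes.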
In a panel titled "How can we as a community try to get a richer
and more uniform programming environment across the variety of high-end
platforms?" the participants were Thomas Sterling (CACR CALTECH), David
Bernholdt (ORNL), Pierre Kuonen (EIF) and Rolf Riesen (Sandia). Bernholdt
discussed a uniform environment for high user productivity and the rapid
creation of correct and efficient application programs.
He explained the different requirements for applications and algorithms,
namely, high-level specification and low-level control. There is a trade-off
in delivering generality, abstraction and scalability. There are also proposals
to develop "polyglot" programming as described in a talk by Gary
Kumfert (LLNL) at a workshop on high productivity languages and programming
models (May 2004).
The requirements for these endeavors to succeed are: "Legacy codes
must be supported; traditional and new programming languages, and traditional
and new programming models, must be able to interoperate." Some language
and model constructs are incommensurate, but for most, a useful specification
for interoperability can be established. It was suggested that BABEL should
be adopted as the language interoperability vehicle for HPC, as it provides
a unified approach in which all languages are considered as peers. It can
act as the bridge for C, C++, Java, F77, F90, F2003, Python, etc. It is
essential that language interoperability is built into standards. For example,
F2003 provides interoperability with C. When designing and implementing
new languages, it is advisable to assume they are to be used in a
mixed-language environment.
Interoperability of programming models presently needs a lot of work in
developing an abstract specification and in overcoming practical obstacles
to implementation.
According to Bernholdt, productivity on diverse architectures is achievable
using abstraction, vertical integration across the software stack and helpful
hardware. Interoperability is also achievable in programming languages
by using BABEL and standards, but this is much harder for programming models.
A uniform programming environment is undesirable, as users need choices.
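One concrete instance of the cross-language bridging discussed above is Python's standard ctypes module calling a compiled C library directly. BABEL generalizes this idea across many language pairs; this sketch shows only the C-to-Python direction, and assumes a Unix-like system where the C math library can be located.

```python
import ctypes
import ctypes.util
import math

# Locate and load the C math library (assumes a Unix-like platform).
libm_path = ctypes.util.find_library("m") or ctypes.util.find_library("c")
libm = ctypes.CDLL(libm_path)

# Declaring the C signature is the interoperability contract: without it,
# argument marshalling between the two languages would be wrong.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))      # calls the C library's cos, prints 1.0
print(libm.cos(math.pi))  # prints -1.0 (up to rounding)
```

The point Bernholdt's panel made is that such pairwise bridges do not scale to N languages, which is the motivation for a hub like BABEL treating all languages as peers.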
Computing has experienced exponential growth in the last 30 years and
this is expected to continue. Yet the HPC user community, long promised
Terascale computing by forecasters carried away with new technology, is,
as Durham pointed out, only just conquering Terascale problems, so scaling
up to Petascale is an enormous task. Now that the industry is building
heterogeneous computers, attempting to match hardware to application
needs (e.g., the Cascade approach described in an article by the High-End
Crusader, HPCwire, 8-12-05), the problems of Petascale computing look
more tractable. Only time will tell whether the user community will be
able to utilize these systems by 2010.
One of the greatest challenges in achieving the 2010 target is delivering
infrastructure for sustained performance. Technical challenges include
chip densities and heat dissipation, power consumption and footprint at
the component level as well as the memory wall (bandwidth, latency and
connectivity) for harnessing tens of thousands of CPUs to handle large-scale
simulations. The National Leadership Computing Facilities being set up
at ORNL, at PSC, at CSCS and in the UK, etc., are extending the large-scale
scientific computing frontiers.
Copyright: Christopher Lazou, HiPerCom Consultants, Ltd., UK. August 2005
Copyright 1993-2005, HPCwire. All Rights Reserved.
Mirrored with permission.