Originally appeared in November 2002 Dell Power Solutions
By Thomas Naughton; Stephen L. Scott, Ph.D.; Brian Barrett; Jeff Squyres; Andrew Lumsdaine, Ph.D.; Yung-Chin Fang; and Victor Mashayekhi, Ph.D. (November 2002)
Looking Inside the OSCAR Cluster Toolkit
Cluster computing is becoming increasingly practical for high-performance computing (HPC)
research and development. This article describes the 1.3 release of the Open Source Cluster
Application Resources (OSCAR) toolkit and explains how the tools facilitate installation,
configuration, cluster and workload management, and security. A look ahead shows how developers
are working to extend the flexibility and simplicity of future OSCAR releases.
As the number of nodes in cluster configurations grows, installation, configuration, and
administration become more challenging. The Open Source Cluster Application Resources (OSCAR)
project was founded to study the challenges of cluster management and to provide a solution.
The result, the OSCAR package, includes the best standards-based tools for cluster installation
and management in one software bundle. OSCAR works on commodity equipment and uses open-source
software.
The OSCAR project began in early 2001 and released the first public version, OSCAR 1.0, in April
of the same year. OSCAR 1.3, the latest version, was released in July 2002 and runs on the Red
Hat® Linux® 7.1 and 7.2 and MandrakeSoft Mandrake™ Linux 8.2 operating systems. Currently, the
only hardware configuration supported by OSCAR is multiple compute nodes networked to a single
master node. As the design of OSCAR progresses, more cluster configurations will be supported.
OSCAR was developed by the OSCAR working group, the first working group established by the Open
Cluster Group (OCG), a collaboration of major research centers and technology companies led by
IBM, Indiana University, Intel, MSC Software, the National Center for Supercomputing Applications
(NCSA) at the University of Illinois, and Oak Ridge National Laboratory (ORNL). Other collaborators
include Bald Guy Software, Dell, Ericsson, Lawrence Livermore National Laboratory (LLNL), Silicon
Graphics, and Veridian. The project is open to new collaborators.
Simplifying cluster administration using OSCAR
The software components, or packages, included in OSCAR provide all the tools required to build
and operate an HPC cluster. Four of these components are required
for installation and configuration: System Installation Suite (SIS), the ORNL Cluster Command
and Control (C3) tool suite, Env-Switcher, and the OSCAR Wizard. Other components that enhance
the functionality of the cluster include OSCAR Password Installer and User Management (OPIUM)
and Secure Shell (SSH) configuration tools. OSCAR also includes commonly used HPC packages for
message passing, workload management, and cluster monitoring.
Automating installation and configuration with SIS
SIS is an image-based installer package that helps administrators install and configure cluster
nodes over the network. SIS replaced Linux Utility for Cluster Installation (LUI) in OSCAR 1.2.
Teams from the SystemImager® and IBM® LUI projects collaborated to develop SIS, which comprises
the System Installer, SystemImager, and System Configurator tools.
System Installer creates an image¹ of a node's file system locally on the master node. Then
SystemImager propagates the generated image over a network to various nodes. Finally, System
Configurator modifies each propagated image with unique configuration information so that, upon
reboot, a node will appear on the network as a useful working machine.
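The last two SIS stages can be illustrated with a small sketch. The Python below copies a master image directory (an image in the sense of footnote 1) for one node, standing in for propagation, and then writes node-specific network settings into the copy, standing in for System Configurator. The Red Hat-style file names and the function name configure_node are assumptions for illustration; System Configurator's real logic is more extensive.

```python
import shutil
from pathlib import Path

def configure_node(image_dir: Path, target_dir: Path, hostname: str,
                   ip: str, netmask: str, device: str = "eth0") -> Path:
    """Copy a master image tree and stamp one node's identity into the copy.

    Hypothetical helper; the Red Hat-style file names are illustrative
    assumptions, not a description of System Configurator's behavior.
    """
    # Stand-in for SystemImager's propagation step: copy the whole tree.
    shutil.copytree(image_dir, target_dir)

    sysconfig = target_dir / "etc" / "sysconfig"
    scripts = sysconfig / "network-scripts"
    scripts.mkdir(parents=True, exist_ok=True)

    # Stand-in for System Configurator: write the per-node settings.
    (sysconfig / "network").write_text(
        f"NETWORKING=yes\nHOSTNAME={hostname}\n")
    (scripts / f"ifcfg-{device}").write_text(
        f"DEVICE={device}\nBOOTPROTO=static\nIPADDR={ip}\n"
        f"NETMASK={netmask}\nONBOOT=yes\n")
    return target_dir
```

The master image itself is never modified, so the same image can be stamped for every node in the cluster.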
Administrators use SIS to bootstrap node installations: kernel boot, disk partitioning, file system
formatting, and base operating system (OS) installation. They can also use the installation image
to maintain the cluster nodes. Modifying a previously deployed image is as straightforward as
modifying a local file system. An administrator can update the image and then use rsync² to update
the local file system on the cluster nodes. This method can be used to install and manage an entire
cluster.
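rsync keeps such updates cheap because it sends only the parts of files that changed. The sketch below illustrates the rolling checksum idea from Tridgell and Mackerras's rsync report: the old file is split into fixed-size blocks, and a window slides over the new file while a weak checksum is updated in constant time per byte, with candidate matches confirmed by a strong hash. It is a simplified illustration, not rsync's code; SHA-256 stands in for rsync's strong checksum, and the function names are hypothetical.

```python
import hashlib

# Simplified illustration of rsync's rolling checksum (not rsync's code).
MOD = 1 << 16  # the weak checksum keeps two 16-bit sums

def weak_checksum(block: bytes) -> tuple[int, int]:
    """Return (a, b): a sums the bytes, b weights each byte by its distance from the end."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b

def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int]:
    """Slide the window one byte (drop out_byte, append in_byte) in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b

def find_matches(old: bytes, new: bytes, block_len: int) -> list[tuple[int, int]]:
    """Return (offset in new, block index in old) for each old block found in new."""
    table: dict[tuple[int, int], list[int]] = {}
    strong = []
    for idx in range(len(old) // block_len):
        blk = old[idx * block_len:(idx + 1) * block_len]
        table.setdefault(weak_checksum(blk), []).append(idx)
        strong.append(hashlib.sha256(blk).digest())  # stand-in strong hash

    matches = []
    if len(new) < block_len:
        return matches
    pos = 0
    a, b = weak_checksum(new[:block_len])
    while pos + block_len <= len(new):
        window = new[pos:pos + block_len]
        hit = None
        for idx in table.get((a, b), []):
            # A weak match is only a candidate; confirm it with the strong hash.
            if hashlib.sha256(window).digest() == strong[idx]:
                hit = idx
                break
        if hit is not None:
            matches.append((pos, hit))
            pos += block_len  # jump past the matched block
            if pos + block_len <= len(new):
                a, b = weak_checksum(new[pos:pos + block_len])
        else:
            if pos + block_len < len(new):
                a, b = roll(a, b, new[pos], new[pos + block_len], block_len)
            pos += 1
    return matches
```

Bytes of the new file not covered by a match would be sent literally, so inserting a few bytes into a large file costs little more than those bytes plus the block checksums.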
Administering single and multiple clusters with C3
The C3 power tools, a product of the scalable systems research at ORNL, offer a command-line
interface (CLI) for cluster system administration. These tools facilitate parallel execution
and let users type a single command that can run on all cluster nodes concurrently, such as
cexec hostname. These tools also allow scatter/gather operations so that file distribution and
collection can occur across all cluster nodes; for example, cpush myfile.txt and cget mydat.log.
C3 version 3.1 includes 10 basic building blocks used by OSCAR for operations such as OPIUM's sync_users,
which synchronizes system files across the cluster. The C3 tools were completely rewritten for
version 3.x and offer many enhancements, including multicluster support—allowing command-line
operations to execute across multiple clusters simultaneously—and improved documentation.
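The parallel-execution idea behind a command such as cexec hostname can be sketched in a few lines of Python. The sketch assumes nodes are reachable with plain ssh and runs one remote command per node concurrently, collecting each node's exit status and output. The names cexec_like and ssh_argv are hypothetical; C3's own implementation, options, and output format differ.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def ssh_argv(node: str, command: str) -> list[str]:
    """Build the remote-execution command line (plain ssh is an assumption)."""
    return ["ssh", node, command]

def cexec_like(nodes, command, build_argv=ssh_argv, max_workers=32):
    """Run command on every node concurrently; return {node: (exit_code, stdout)}."""
    def run_one(node):
        proc = subprocess.run(build_argv(node, command),
                              capture_output=True, text=True)
        return node, (proc.returncode, proc.stdout.strip())

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so they line up with nodes.
        return dict(pool.map(run_one, nodes))
```

For example, cexec_like(["node1", "node2"], "hostname") would return each node's name keyed by node, provided passwordless ssh is already configured.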
Managing the environment with Env-Switcher
Created for OSCAR by developers at Indiana University, the Env-Switcher package provides a simple
CLI through which users can safely manipulate their environments. For example, when a user wants
to add a package, Env-Switcher, not the user, modifies the environment variables, such as PATH
and MANPATH, as necessary. Using Env-Switcher, administrators can set up persistent system- and
user-level attributes for various packages without needing to manually edit user dot files, such
as .bashrc.
The OSCAR installation uses Env-Switcher to set system defaults, which can later be overridden by
individual users. The canonical example is the selection of a default Message Passing Interface
(MPI) implementation. At install time, an administrator can select a default, such as Local Area
Multicomputer (LAM)/MPI or MPICH, and OSCAR then sets up the corresponding system and user
environments based on that choice. Additionally, Env-Switcher ensures that non-interactive shells
started through rsh or ssh receive appropriate environment settings, so administrators do not need
to edit user-level shell configuration files.
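The precedence rule described above, a system-wide default chosen at install time that a user-level setting can override, can be sketched as follows. The package table, install prefixes, and function names are hypothetical; Env-Switcher manages these settings through its own CLI and data files rather than this code.

```python
import os

# Hypothetical install prefixes for two MPI implementations.
PACKAGES = {
    "lam": {"PATH": ["/opt/lam/bin"], "MANPATH": ["/opt/lam/man"]},
    "mpich": {"PATH": ["/opt/mpich/bin"], "MANPATH": ["/opt/mpich/man"]},
}

def resolve(tag, system_defaults, user_choices):
    """A user-level choice for a tag (such as "mpi") overrides the system default."""
    return user_choices.get(tag, system_defaults.get(tag))

def build_env(base_env, system_defaults, user_choices):
    """Return a new environment with the selected packages' directories prepended."""
    env = dict(base_env)  # never mutate the caller's environment
    for tag in sorted(system_defaults.keys() | user_choices.keys()):
        package = resolve(tag, system_defaults, user_choices)
        for var, dirs in PACKAGES[package].items():
            current = env.get(var)
            env[var] = os.pathsep.join(dirs + ([current] if current else []))
    return env
```

Because the environment is computed rather than hand-edited into dot files, the same result applies to interactive logins and to commands started remotely.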
Installing a cluster using the OSCAR Wizard
The OSCAR Wizard provides a graphical user interface (GUI) to assist with cluster installation.
It comprises a set of screens that guides a user through the process of creating an image, defining
the number of nodes, configuring the network settings, and confirming the cluster setup was
successful. Another set of screens lets users add and remove nodes from the cluster.
Current development aims to create a CLI that future versions of the wizard will use to drive the
installation. The CLI will improve scripting capabilities and give system administrators who prefer
not to use the GUI-based wizard another option. It will also provide the access to the OSCAR Data
Repository (ODR) needed during cluster configuration and installation tasks.
Managing cluster accounts using OPIUM
The OPIUM package provides a mechanism to synchronize cluster account information. The standard
user management tools (user[add|del], group[add|del]) are made cluster-aware, meaning that an
administrator creates an account as usual, but upon command completion, the account exists on
every OPIUM-managed cluster. The OPIUM package also configures the system so that SSH keys are
generated for each user account upon first login. Having the keys in place lets users access cluster
nodes without a password prompt.
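Keeping accounts consistent comes down to comparing each node's account database against the master's. The sketch below computes which passwd entries a node must add, remove, or update to match the master node; pushing the changes out (for example, with a C3 file-distribution command) is left out. The function names are hypothetical and do not describe the internals of OPIUM's sync_users.

```python
# Illustrative only; not OPIUM's sync_users implementation.

def parse_passwd(text: str) -> dict[str, str]:
    """Map each user name to its full /etc/passwd line, skipping blanks and comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        entries[line.split(":", 1)[0]] = line
    return entries

def plan_sync(master_text: str, node_text: str) -> dict[str, list[str]]:
    """List the accounts a node must add, remove, or update to match the master."""
    master = parse_passwd(master_text)
    node = parse_passwd(node_text)
    return {
        "add": sorted(u for u in master if u not in node),
        "remove": sorted(u for u in node if u not in master),
        "update": sorted(u for u in master if u in node and master[u] != node[u]),
    }
```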
Securing the cluster with ssh
Secure Shell (ssh) provides a secure replacement for rsh. However, the added security brings
additional configuration requirements that burden system administrators and cluster users who wish
to use ssh on their clusters. OSCAR installs OpenSSH and automatically sets up all the required
configuration files for ssh, transparently replacing rsh with ssh.
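One of the files behind password-free ssh access is each user's ~/.ssh/authorized_keys. The sketch below adds a public key to that file only if it is not already present and restricts the file's permissions, because OpenSSH's StrictModes check rejects key files that other users can write. It illustrates the idea under that assumption and is not the script OSCAR runs; authorize_key is a hypothetical name.

```python
from pathlib import Path

def authorize_key(home: Path, public_key: str) -> bool:
    """Append public_key to home/.ssh/authorized_keys unless already present.

    Hypothetical helper. Returns True if the key was added, False otherwise.
    """
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    auth_file = ssh_dir / "authorized_keys"
    text = auth_file.read_text() if auth_file.exists() else ""
    key = public_key.strip()
    if key in (line.strip() for line in text.splitlines()):
        return False
    with auth_file.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # keep the previous last key on its own line
        f.write(key + "\n")
    # Owner read/write only; sshd's StrictModes rejects group/world-writable files.
    auth_file.chmod(0o600)
    return True
```

Making the operation idempotent means it can be rerun safely whenever nodes or users are added.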
Enabling message passing and monitoring
OSCAR includes several packages that are commonly used on HPC clusters to enable message passing
and other cluster operations, such as the following:
- Parallel message passing libraries: LAM/MPI, MPICH, and Parallel Virtual Machine (PVM)
- Batch queuing system: Open Source Portable Batch System (OpenPBS)
- Scheduler: Maui
- Monitoring system: Ganglia (CLI and Web-based view of cluster statistics)
- Security: pfilter (packet filter configured with reasonable settings)
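To show how the batch queuing and message-passing pieces fit together, the sketch below generates a minimal OpenPBS job script that requests nodes and launches an MPI program. The #PBS directives shown are standard PBS resource requests, but the mpirun line is an assumption because launch options depend on the MPI implementation selected, and make_pbs_script is a hypothetical helper.

```python
def make_pbs_script(job_name: str, nodes: int, ppn: int,
                    walltime: str, program: str) -> str:
    """Return the text of a minimal PBS batch script for an MPI program (hypothetical helper)."""
    if nodes < 1 or ppn < 1:
        raise ValueError("nodes and ppn must be positive")
    total = nodes * ppn
    lines = [
        "#!/bin/sh",
        f"#PBS -N {job_name}",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -l walltime={walltime}",
        # PBS starts jobs in the home directory; return to the submit directory.
        "cd $PBS_O_WORKDIR",
        # Assumed launch line: flags vary by MPI implementation, and
        # LAM/MPI also needs its runtime booted (lamboot) first.
        f"mpirun -np {total} {program}",
    ]
    return "\n".join(lines) + "\n"
```

The resulting text would be saved to a file and submitted with qsub, after which Maui decides when and where the job runs.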
Installing OSCAR and running OSCAR Wizard
The OSCAR Wizard lets both novice users and seasoned system administrators build and configure an
HPC cluster in a few easy steps. Before using OSCAR, the user must complete the following on the
master node (the server from which the image will be deployed to the client nodes):
- Install an OS using standard methods for an OSCAR-supported distribution (for example, with the Red Hat CD-ROM).
- Build the server with X Window System support.
- Make sure the networking is configured and working for the server.
- Provide the internal cluster interface to the installation script (such as ./install_cluster eth1).
- Copy the RPM™ (Red Hat Package Manager) files from the distribution CD-ROM(s) into a directory on the server; the default location is /tftpboot/rpm.
After setting up the server, the user downloads OSCAR from the project Web page and extracts it.
Then the user runs the installation script. This script copies necessary files to the server, sets
up requisite services, and starts the OSCAR Wizard.
Figure 1 outlines the steps that the OSCAR Wizard performs. The steps change slightly among
versions, but the basic process remains the same:
Figure 1. Installing a cluster with the OSCAR Wizard
- Prepare the server, possibly selecting defaults (such as the default MPI implementation).
- Create an image for the client nodes from a user-specified description, based on a package list and disk partitioning scheme.
- Define a set of clients: number of nodes in cluster, naming scheme, networking settings (IP, netmask, and so forth), and image name to be installed on clients.
- Collect Media Access Control (MAC) addresses (the identifier used to associate a node with its IP address) for the client nodes.
- Install the compute nodes. First, configure the server to respond to Dynamic Host Configuration Protocol (DHCP) requests from the client nodes. Then, boot the nodes using either a bootable floppy disk or a network-enabled BIOS boot mechanism such as Preboot Execution Environment (PXE). The nodes will contact the server, which will convey their identities. They will then perform the basic operations to complete the installation, such as partitioning the hard drives, formatting the file systems, and copying the files from the server.
- Configure remaining packages and synchronize time.
- Run the test suite to verify that key cluster components and services, such as OpenPBS, PVM, LAM/MPI, MPICH, and Network File System (NFS), are operating properly.
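Defining clients, collecting MAC addresses, and answering DHCP requests amount to building a table that maps each client's name and IP address to its MAC address, which the server's DHCP service then uses to recognize booting nodes. The sketch below generates names and consecutive addresses with Python's ipaddress module and renders host declarations in ISC dhcpd syntax. The naming prefix, the starting address, and the assumption that the server runs ISC dhcpd are illustrative; the wizard's own output may differ.

```python
import ipaddress

# Naming prefix and ISC dhcpd output format are illustrative assumptions.

def define_clients(prefix: str, count: int, first_ip: str, netmask: str,
                   macs: list[str]) -> list[dict]:
    """Pair generated names and consecutive IPs with collected MAC addresses."""
    if len(macs) != count:
        raise ValueError("need exactly one MAC address per node")
    network = ipaddress.IPv4Network(f"{first_ip}/{netmask}", strict=False)
    start = ipaddress.IPv4Address(first_ip)
    clients = []
    for i in range(count):
        ip = start + i
        if ip not in network or ip == network.broadcast_address:
            raise ValueError(f"{ip} falls outside the usable range of {network}")
        clients.append({"name": f"{prefix}{i + 1}", "ip": str(ip),
                        "mac": macs[i].lower()})
    return clients

def dhcpd_hosts(clients) -> str:
    """Render ISC dhcpd host declarations that pin each MAC to its IP."""
    return "".join(
        f"host {c['name']} {{\n"
        f"  hardware ethernet {c['mac']};\n"
        f"  fixed-address {c['ip']};\n"
        f"}}\n"
        for c in clients
    )
```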
Constructing a cluster is typically an ongoing task. As of OSCAR 1.3, the GUI lets users add and
delete nodes after the initial installation. Adding nodes follows the same process as the first
install: users define a set of nodes and associate an image with it, using the MAC-to-IP address
mapping to identify each node. This identity is then transferred to the client nodes as they boot
from either a floppy or network boot mechanism. The process uses the same screens, and users can
rerun the tests to verify the installation updates.
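When nodes are added later, their names and addresses must not collide with those already in use. A minimal sketch, assuming names of the form prefix-plus-number and consecutively assigned addresses; next_clients is a hypothetical helper, not part of the wizard.

```python
import ipaddress
import re

def next_clients(existing: dict[str, str], prefix: str,
                 count: int) -> list[tuple[str, str]]:
    """Continue the name and IP numbering of existing nodes (hypothetical helper).

    existing maps node name -> IP address, for example {"node1": "10.0.0.1"}.
    """
    if not existing:
        raise ValueError("no existing nodes to continue from")
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)$")
    numbers = []
    for name in existing:
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1)))
    next_number = max(numbers, default=0) + 1
    # Assumes addresses were handed out consecutively, so start after the highest.
    next_ip = max(ipaddress.IPv4Address(ip) for ip in existing.values()) + 1
    return [(f"{prefix}{next_number + i}", str(next_ip + i)) for i in range(count)]
```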
Removing a node is slightly simpler and requires notifying all relevant packages that their
configurations must reflect the node deletion. The wizard modifies configuration files for tools
such as C3 or OpenPBS, which maintain state information about the cluster nodes and therefore must
be updated when nodes no longer exist.
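Because every package that keeps per-node state must learn about a deletion, one way to structure removal is a registry in which each package subscribes a callback that runs when a node is removed. The sketch below assumes a simple in-memory registry; the class name and the stand-in C3 and OpenPBS node lists are hypothetical.

```python
from typing import Callable

class NodeRegistry:
    """Track cluster nodes and notify subscribed packages when one is removed (hypothetical)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._on_remove: list[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._on_remove.append(callback)

    def remove(self, node: str) -> None:
        if node not in self.nodes:
            raise KeyError(node)
        self.nodes.remove(node)
        # Every package that caches node state updates its configuration.
        for callback in self._on_remove:
            callback(node)

# Usage with hypothetical stand-ins for C3's and OpenPBS's node lists:
c3_hosts = ["node1", "node2", "node3"]
pbs_nodes = {"node1": 2, "node2": 2, "node3": 2}  # node -> processors
registry = NodeRegistry(c3_hosts)
registry.subscribe(c3_hosts.remove)
registry.subscribe(pbs_nodes.pop)
registry.remove("node2")
```

New packages can join the removal process by subscribing, without changes to the wizard's own code.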
In general, the OSCAR Wizard reduces the time required to build a functional cluster, increases
consistency among cluster builds, and reduces the expertise necessary to build an HPC cluster.
Improving OSCAR through current development efforts
OSCAR offers a solid means for building and configuring a cluster. Current development efforts focus
on increasing installation flexibility and extending cluster management capabilities after deployment.
The following discussion highlights some of these features and goals.
Adding a standard interface for cluster management
Although SIS and C3 are powerful tools, administrators often need better high-level management tools.
The OSCAR group seeks to address this need with a standard interface to a set of tools for
node addition, deletion, and package management. The interface will mask the underlying mechanism,
such as SIS or SIS plus C3, thus allowing others to extend or replace the management system.
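Such an interface can be expressed as an abstract base class: callers program against add_node and delete_node, and a backend decides whether the work is done with SIS alone or with SIS plus C3. The class and backend names are hypothetical, and the backends only record the steps they would take rather than invoking the real tools.

```python
from abc import ABC, abstractmethod

class ClusterManager(ABC):
    """Hypothetical interface that hides which tools perform node operations."""

    @abstractmethod
    def add_node(self, name: str, image: str) -> list[str]: ...

    @abstractmethod
    def delete_node(self, name: str) -> list[str]: ...

class SisBackend(ClusterManager):
    # Records intended steps instead of calling SIS.
    def add_node(self, name, image):
        return [f"sis: define {name} with image {image}"]

    def delete_node(self, name):
        return [f"sis: remove {name}"]

class SisC3Backend(SisBackend):
    """Reuses the SIS steps, then uses C3 to push updated files cluster-wide."""

    def add_node(self, name, image):
        return super().add_node(name, image) + [f"c3: push updated host files for {name}"]

    def delete_node(self, name):
        return super().delete_node(name) + [f"c3: push updated host files without {name}"]

def grow_cluster(manager: ClusterManager, names, image) -> list[str]:
    # Caller code is identical no matter which backend is plugged in.
    return [step for n in names for step in manager.add_node(n, image)]
```

Replacing the management system then means writing a new backend class, not changing the callers.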
Making the OSCAR architecture modular
As OSCAR has evolved, a clear burden has been the integration process required for a major release.
Removing the tight coupling of all packages contained in an OSCAR release has eased integration;
OSCAR 1.3 features a modular packaging system prototype that removes much of this coupling. Developers
plan to extend this decoupling to the OS installation as well. A modular architecture will allow
administrators to install a set of nodes using other means (such as CD-ROM or Red Hat Kickstart
Configurator) and then use OSCAR for the remaining installation and configuration.
Not only does the modular packaging system remove the tight coupling between OSCAR packages, it
also, and possibly more importantly, enables developers outside of the OSCAR team to contribute packages,
extending the base components of OSCAR. The interface to the modular architecture requires a standard
package format, currently the widely used RPM system from Red Hat. In addition to using RPM, the
package creator may provide a set of scripts to perform configuration steps not possible within the
RPM framework. As work progresses, the architecture document on the OSCAR Web site will include an
application programming interface (API) for package maintainers.
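The package-plus-scripts model might look like this in outline: each package ships its RPMs plus optional scripts keyed by installation phase, and the installer runs whichever scripts exist, phase by phase, across all selected packages. The phase names and data layout are hypothetical placeholders; the actual script names and calling conventions are whatever the OSCAR package API defines.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical phase names, in the order an installer would run them.
PHASES = ["pre_install", "post_server_install", "post_client_install", "post_cluster_install"]

@dataclass
class Package:
    name: str
    rpms: list[str]
    # Optional configuration hooks for steps the RPM itself cannot perform.
    scripts: dict[str, Callable[[], str]] = field(default_factory=dict)

def run_phases(packages: list[Package]) -> list[str]:
    """Run each phase across all packages before moving on to the next phase."""
    log = []
    for phase in PHASES:
        for pkg in packages:
            script = pkg.scripts.get(phase)
            if script is not None:
                log.append(f"{phase}:{pkg.name}:{script()}")
    return log
```

Running phases across all packages, rather than package by package, lets a late configuration step rely on every package's earlier steps having finished.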
Accessing reliable information through the ODR
Current OSCAR designs offer very limited data about the cluster to packages or to system administrators.
The OSCAR Data Repository (ODR) is a generic interface to such data, which will be especially important
as the flexibility of OSCAR grows. The ODR API will likely resemble a SQL interface. Access to the data
will be available from any node in the cluster.
The API to the repository will be coupled with the improvements to the standard OSCAR Wizard. The
modular packaging system will also allow scripts to query the ODR at specified times to obtain
information such as number of nodes and master node IP address. This functionality will enable
many cluster-aware packages to configure themselves, reducing the load on cluster system administrators.
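Because the ODR API is expected to resemble an SQL interface, the kind of query a self-configuring package might make can be sketched with Python's built-in sqlite3 module. The table layout, column names, and sample addresses are hypothetical, not the ODR's actual schema.

```python
import sqlite3

def make_repository() -> sqlite3.Connection:
    """Create an in-memory stand-in for the cluster data repository (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE nodes (name TEXT PRIMARY KEY, ip TEXT NOT NULL, role TEXT NOT NULL)")
    db.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [("master", "10.0.0.250", "server"),
         ("node1", "10.0.0.1", "client"),
         ("node2", "10.0.0.2", "client")],
    )
    db.commit()
    return db

def cluster_facts(db: sqlite3.Connection) -> dict:
    """Answer the questions a package might ask while configuring itself."""
    (count,) = db.execute("SELECT COUNT(*) FROM nodes WHERE role = 'client'").fetchone()
    (master_ip,) = db.execute("SELECT ip FROM nodes WHERE role = 'server'").fetchone()
    return {"compute_nodes": count, "master_ip": master_ip}
```

A package's configuration script could run such queries instead of asking the administrator for the node count or the master node's address.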
Improving the OSCAR Wizard
The OSCAR GUI and companion CLI facilitate better usability through an intuitive, extensible interface.
Administrators usually prefer to use CLIs for expediting complex commands and for creating new
functionality, whereas the GUI offers greater ease of use. With both the GUI and the CLI available,
administrators can work without regard for the underlying implementation, and offering both kinds
of interface accommodates different administration styles.
Adding an underlying CLI lets developers contribute different GUIs without overhauling the entire
system. GUIs considered for OSCAR include Perl/Tk, Webmin, and Python/Tkinter. In the future, other
developers might contribute an ncurses-based interface that also uses the CLI; the underlying CLI,
and therefore access to the ODR, would remain the same.
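The layering described here, in which every front end drives one shared command layer, can be sketched with argparse: the command-line parser and any GUI event handler end up calling the same core functions. The command names (add-nodes, delete-node), the program name, and the functions are hypothetical, not OSCAR's actual CLI.

```python
import argparse
import itertools

NODES: list[str] = []        # stands in for state kept in the data repository
_serial = itertools.count(1)  # node numbers are never reused after deletion

def add_nodes(prefix: str, count: int) -> list[str]:
    new = [f"{prefix}{next(_serial)}" for _ in range(count)]
    NODES.extend(new)
    return new

def delete_node(name: str) -> list[str]:
    NODES.remove(name)
    return [name]

def main(argv=None) -> list[str]:
    """Hypothetical command-line front end; a GUI would reach the same core functions."""
    parser = argparse.ArgumentParser(prog="cluster-cli")
    sub = parser.add_subparsers(dest="command", required=True)
    add = sub.add_parser("add-nodes")
    add.add_argument("--prefix", default="node")
    add.add_argument("--count", type=int, required=True)
    delete = sub.add_parser("delete-node")
    delete.add_argument("name")
    args = parser.parse_args(argv)
    if args.command == "add-nodes":
        return add_nodes(args.prefix, args.count)
    return delete_node(args.name)
```

A Perl/Tk, Tkinter, or Webmin front end could either invoke this program or call add_nodes and delete_node directly; either way, the cluster logic lives in one place.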
Looking ahead to OSCAR 2.0
The OSCAR project has emerged as a useful tool for cluster installation and administration. The
introduction of SIS into OSCAR at version 1.2 greatly simplified the steps necessary to build and
configure a cluster. Subsequent releases have brought other features such as a modular packaging
system, support for multiple distributions, and support for Intel® 32-bit and 64-bit architectures.
As the OSCAR project grows, developers seek to balance flexibility and simplicity. OSCAR 2.0 will
offer improved cluster management, a modular architecture, an enhanced ODR, and extended GUI and CLI
Wizard tools.
As development progresses, the project will begin to extend, and even define, "best cluster practices."
These extensions will lead to improved cluster management, at both the node and package levels. In the
meantime, the OSCAR project continues to be an effective cluster computing solution, providing powerful
tools for cluster installation and management.
References
Becker, Donald J., Thomas Sterling, Daniel Savarese, John E. Dorband, Udaya A. Ranawake, and Charles V.
Packer. "BEOWULF: A parallel workstation for scientific computation." Proceedings of the 24th
International Conference on Parallel Processing, Volume I. Boca Raton, Fla.: CRC Press, 1995.
Fang, Yung-Chin, Tau Leng, Victor Mashayekhi, and Reza Rooholamini. "OSCAR 1.1: A Cluster Computing
Update." Dell Power Solutions, Issue 4, 2001.
Hsieh, Jenwei, Tau Leng, and Yung-Chin Fang. "OSCAR: A Turnkey Solution for Cluster Computing." Dell
Power Solutions, Issue 1, 2001.
IBM. LUI Project: Summary. http://oss.software.ibm.com/developerworks/projects/lui.
Luethke, Brian, Thomas Naughton, and Stephen L. Scott. "C3 Power Tools: The Next Generation..." Paper
to be presented at the Austrian-Hungarian Workshop on Distributed and Parallel Systems (DAPSYS 2002),
Linz, Austria, September-October 2002.
Naughton, Thomas, Stephen L. Scott, Brian Barrett, Jeff Squyres, Andrew Lumsdaine, and Yung-Chin Fang.
"The Penguin in the Pail-OSCAR Cluster Installation Tool." Paper presented at the World Multiconference
on Systemics, Cybernetics and Informatics (SCI 2002), Orlando, Fla., July 2002.
Oak Ridge National Laboratory. Project C3: Cluster Command & Control (C3) home page. http://www.csm.ornl.gov/torc/C3.
"System Installation Suite Project." http://sisuite.sourceforge.net.
Tridgell, Andrew, and Paul Mackerras. "The rsync algorithm." Technical Report TR-CS-96-05. Canberra:
Australian National University, Department of Computer Science, June 1996. See also: http://rsync.samba.org.
Acknowledgments
Work by Thomas Naughton and Stephen L. Scott was supported by the U.S. Department of Energy. Work by
Brian Barrett was supported by a Department of Energy High Performance Computer Science Fellowship.
Work by Jeff Squyres and Andrew Lumsdaine was supported by a grant from the Lilly Endowment.
Thomas Naughton (naughtont@ornl.gov) is a research associate in the Computer Science and
Mathematics Division, Oak Ridge National Laboratory. Thomas has a B.S. in Computer Science and
a B.A. in Philosophy from the University of Tennessee-Martin, and an M.S. in Computer Science from
Middle Tennessee State University.
Stephen L. Scott, Ph.D. (scottsl@ornl.gov) is a research scientist in the Computer Science
and Mathematics Division, Oak Ridge National Laboratory. Stephen leads the cluster computing effort
at ORNL. He has a B.A. from Thiel College in Greenville, Penn., and an M.S. and Ph.D. from Kent State
University in Kent, Ohio.
Brian Barrett (brbarret@osl.iu.edu) is a graduate student at the Open Systems Laboratory,
Indiana University. Brian has a B.S. in Computer Science from the University of Notre Dame.
Jeff Squyres (jsquyres@osl.iu.edu) is a research associate at the Open Systems Laboratory,
Indiana University. Jeff has a B.A. in English, a B.S. in Computer Engineering, and an M.S. in Computer
Science and Engineering from the University of Notre Dame.
Andrew Lumsdaine, Ph.D. (lums@osl.iu.edu) is the associate director of the Open Systems
Laboratory at Indiana University. Andrew has a Ph.D. from the Massachusetts Institute of Technology.
Yung-Chin Fang (yung-chin_fang@dell.com) is a member of the Scalable Systems Group at Dell.
Yung-Chin has a bachelor's degree in Computer Science from Tamkang University and a master's degree
in Computer Science from Utah State University. He is currently working on his doctoral degree.
Victor Mashayekhi, Ph.D. (victor_mashayekhi@dell.com) is a senior technical member of the
Enterprise Computing Solutions Group at Dell. His product development responsibilities at Dell
have included all the cluster product offerings from Dell. Victor has a B.A., M.S., and Ph.D.
in Computer Science from the University of Minnesota.
For more information
The Open Cluster Group: http://www.openclustergroup.org
The OSCAR Project: http://oscar.sourceforge.net
Previous articles on OSCAR: http://www.dell.com/powersolutions
¹ Here, image is defined as a directory tree that comprises an entire file system for a machine.
² rsync is a tool to transfer files and is similar to rcp and scp.