This document contains a brief description of how we presently build, configure, and administer the CSMD clusters at ORNL using OSCAR and the C3 tools.
The ORNL / OSCAR environment for cluster computing is straightforward because it follows a “best practices” approach for commodity cluster computing. The target cluster for this environment is the “sweet spot” in cluster computing, which our group defines as the largest cluster that can sit behind one (1) commodity switch. In the initial phase (the early 2000 time frame), this was considered to be a 64-node compute cluster with one (1) additional node as the cluster head. For v2.0, the “sweet spot” is considered to be a 128-node compute cluster with one (1) head node. By definition the default installation scheme supports only this simple cluster architecture, but most tools included in OSCAR can go beyond this scope, since they are not limited to a specific node count or network structure. Furthermore, while OSCAR is used to install individual clusters, these clusters may be logically combined into federated clusters (groups of clusters within one administrative domain) using the included Cluster Command & Control (C3) cluster power tools. The design path for OSCAR v2.x includes the ability to have multiple application/tool server nodes with one OSCAR head node (server) per physical cluster.
Through OSCAR v1.1, the Linux Utility for cluster Installation (LUI) is used for the initial system installation. LUI is a “resource based” installation tool: it provides tools to manage installation resources on the server, which can then be allocated to and installed on clients. LUI initiates a remote installation on clients using the specified resources. Installation resources include the Linux kernel and associated system map, the disk partition table, RPMs, and local and remote (NFS) file systems. LUI supports both diskette-based client installation using the BOOTP protocol and true network installation using DHCP and PXE. One significant feature of LUI is that it directly supports hardware and software grouping. These logical groupings provide a layer of abstraction that simplifies the configuration process by permitting software groups to be assigned to hardware groups. Post-installation tasks, such as the individual configuration of local clients, are done following the installation process via post-install scripts executed in parallel using the C3 cluster power tools.
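The grouping idea described above can be illustrated with a small data model. This is only a sketch of the concept, not LUI's actual data format or interface: the class names, the `resources_for` helper, and all example resource and node names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Resource:
    kind: str  # e.g. "kernel", "partition_table", "rpm", "filesystem"
    name: str


@dataclass
class SoftwareGroup:
    name: str
    resources: List[Resource] = field(default_factory=list)


@dataclass
class HardwareGroup:
    name: str
    clients: List[str] = field(default_factory=list)
    software: List[SoftwareGroup] = field(default_factory=list)


def resources_for(client: str, groups: List[HardwareGroup]) -> List[Resource]:
    """Collect, in order and without duplicates, the resources a client
    receives through every hardware group it belongs to."""
    resolved: List[Resource] = []
    for hw in groups:
        if client not in hw.clients:
            continue
        for sw in hw.software:
            for res in sw.resources:
                if res not in resolved:
                    resolved.append(res)
    return resolved


# Hypothetical example data; none of these names come from LUI itself.
base = SoftwareGroup("base", [Resource("kernel", "example-kernel"),
                              Resource("rpm", "example-shell-tools")])
mpi = SoftwareGroup("mpi", [Resource("rpm", "example-mpi")])
compute = HardwareGroup("compute", ["node1", "node2"], [base, mpi])

names = [r.name for r in resources_for("node1", [compute])]
```

Assigning a software group to a hardware group once, rather than to each client, is what keeps the configuration small as the node count grows.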
As of OSCAR v1.2, LUI has been replaced and the initial system builds are done via the System Installation Suite (SIS), a combination of SystemInstaller, SystemImager and SystemConfigurator designed to work together to automate the installation and configuration of networked workstations. The process is to first build a default image (golden image) of the node configuration and then place this image on all target nodes. This differs from the LUI approach in that all build activities for SIS take place on the golden image node, whereas LUI distributes the build process to each of the target nodes. SIS then uses SystemImager to move the golden image to each of the target nodes. As in the earlier versions of OSCAR, post-installation tasks such as the individual configuration of local clients are done following the image distribution process via post-install scripts executed in parallel using the C3 cluster power tools. SIS further differs from LUI in that it does not presently support hardware and software resource grouping.
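The parallel post-install step mentioned above amounts to fanning one command out to every node and collecting the results. The sketch below is not C3 and does not use its interface; it only illustrates the fan-out pattern in Python. It assumes passwordless ssh to each node, and the function names, the script path, and the node names are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def ssh_run(node: str, command: str) -> int:
    """Run a command on one node over ssh and return its exit status.
    Assumes passwordless ssh is already configured (an assumption)."""
    return subprocess.run(["ssh", node, command]).returncode


def run_on_all(nodes: List[str], command: str,
               runner: Callable[[str, str], int] = ssh_run,
               max_workers: int = 16) -> Dict[str, int]:
    """Run the same command on every node concurrently.
    Returns a mapping of node name to exit status."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each node
        # with its own exit status.
        statuses = list(pool.map(lambda n: runner(n, command), nodes))
    return dict(zip(nodes, statuses))


def failed_nodes(results: Dict[str, int]) -> List[str]:
    """Nodes whose command returned a non-zero exit status."""
    return [node for node, status in results.items() if status != 0]


# Dry run with a stand-in runner so no ssh connection is made.
def fake_runner(node: str, command: str) -> int:
    return 1 if node == "node2" else 0


results = run_on_all(["node1", "node2", "node3"],
                     "/root/post_install.sh",  # hypothetical script path
                     runner=fake_runner)
```

Keeping the runner injectable lets the fan-out logic be checked without touching real nodes; `failed_nodes(results)` then gives the list that needs a retry or manual attention.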
At ORNL, the initial base installation for compute nodes is done via OSCAR, and the installed compute node environment image is saved as the base “golden client” in the image repository on the cluster head node. Other derivations from this base installation are then created as needed. As the ORNL base environment evolves, each new version is saved to the image server as a new “golden client” image. Older versions of the golden image are retained so that the administrator can “roll back” to an earlier golden version of a system’s image. Incremental updates are discussed in the administration and maintenance section below.
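The retain-and-roll-back behaviour described above can be sketched as a small bookkeeping class. This models only the version tracking, not SystemImager or the on-disk repository; the `ImageRepository` class and the image names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImageRepository:
    """Tracks retained golden images, oldest first, and which one is active."""
    versions: List[str] = field(default_factory=list)
    current: int = -1  # index of the active image; -1 means none saved yet

    def save(self, name: str) -> None:
        """Store a new golden image and make it the active one.
        Older images are kept so they remain available for roll back."""
        if name in self.versions:
            raise ValueError(f"image {name!r} already exists")
        self.versions.append(name)
        self.current = len(self.versions) - 1

    def active(self) -> str:
        if self.current < 0:
            raise LookupError("no golden image has been saved")
        return self.versions[self.current]

    def roll_back(self, steps: int = 1) -> str:
        """Make an earlier retained image the active one and return its name."""
        target = self.current - steps
        if steps < 1 or target < 0:
            raise IndexError("no such earlier golden image retained")
        self.current = target
        return self.versions[target]


# Hypothetical image names.
repo = ImageRepository()
repo.save("base-v1")
repo.save("base-v2")
newest = repo.active()
restored = repo.roll_back()
```

Rolling back changes only which image is active; the newer image stays in the repository, so the administrator can move forward again later.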
Administration and maintenance tasks:
There are potentially three modes of operation on an ORNL / OSCAR cluster. The first and second are opposite extremes in philosophy, while the third combines the “best practices” of the first two techniques.
1. Image only, via SIS. One can simply update the golden client as one would any cluster node and replicate this client as needed. However, this model does not directly address the issue of updating cluster-wide configuration files held on either a server or the compute nodes. This is particularly difficult when the configuration files differ between nodes. Consequently, distributed, remote configuration activities may still be necessary on individual cluster nodes. This method works best when the nodes within a class are identical and there are only a few different client node classes, which keeps the number of golden images to support small. Thus, it works best in a relatively static environment.
2. All distributed via C3. Once the initial operating system installation is complete, C3 may be used to orchestrate the machines to build themselves into a computing cluster. However, if a golden image is not saved once changes are made, the build process will have to be duplicated for new nodes coming online. This technique works best when there are numerous small changes to client nodes and these changes may be saved as C3 scripts that perform the changes. Thus, it works best in a relatively dynamic environment.
3. Hybrid, a combination of 1 and 2 above. This is the mode of operation used at ORNL. As mentioned earlier, OSCAR is used to perform the initial build and the golden image is retained as a backup. Intermediate changes are performed using C3 scripts. A repository of golden images is maintained on the cluster head node, along with any necessary incremental change scripts. As compute nodes evolve into a new class of operation or a new operating environment, the newer version of the golden image is extracted via SystemImager and stored in the image repository. This provides versioning of both the images and the scripts used for incremental changes. With this technique a cluster’s operating environment may be easily rolled forward and back to support a variety of operating environment configurations. This technique works well for both static and dynamic environments since it combines the best of the image-based and distributed techniques outlined above.
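One way to picture the hybrid mode is to treat a node's environment as a golden image plus an ordered list of incremental change scripts applied on top of it. The sketch below computes the steps needed to move a node from one such environment to another. It is a simplified model under two assumptions that are ours, not the document's: change scripts are not reversible, so rolling back means reinstalling the image and replaying scripts, and scripts are always applied in a fixed order. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Environment:
    image: str                    # golden image the node was installed from
    scripts: Tuple[str, ...] = ()  # change scripts applied since, in order


def plan(current: Environment, target: Environment) -> List[str]:
    """Return the steps that move a node from `current` to `target`."""
    applied = len(current.scripts)
    if (current.image == target.image
            and target.scripts[:applied] == current.scripts):
        # Roll forward: the node already matches a prefix of the target,
        # so only the missing scripts need to run.
        return [f"run {s}" for s in target.scripts[applied:]]
    # Roll back or switch image: scripts are assumed irreversible, so
    # reinstall the target image and replay its scripts from the start.
    steps = [f"install image {target.image}"]
    steps += [f"run {s}" for s in target.scripts]
    return steps


# Hypothetical image and script names.
old = Environment("base-v1", ("add-users.sh",))
new = Environment("base-v1", ("add-users.sh", "tune-nfs.sh"))

forward = plan(old, new)
backward = plan(new, old)
```

When the list of scripts on top of one image grows long, extracting a fresh golden image shortens every later plan, which is the point at which the hybrid mode saves a new image to the repository.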
Node add & delete tasks:
There is no free lunch yet. While the existing SIS and C3 tools make it easy to push the existing golden client and associated modifications to a new client node, they do not provide a mechanism for automatically updating all other clients, servers, or application-specific resources with knowledge of the newly added or removed node. While the C3 tools greatly simplify this task, these configuration changes must still be initiated for each application, as that application requires. Ideally, each application should come with configuration scripts that reconfigure the cluster as needed. While there are plans to incorporate this facility in later v1.x releases of OSCAR leading up to the 2.0 release, it will still require buy-in from individual package creators, as they will have to provide this functionality within their software’s installation package.
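The per-application reconfiguration described above could take the form of hooks that each package registers and that the cluster calls whenever a node is added or deleted. The sketch below is purely illustrative and does not describe an existing or planned OSCAR interface; the `Cluster` class, the hook signature, and the package and node names are all hypothetical.

```python
from typing import Callable, Dict, List

# A hook receives the event ("add" or "delete") and the node name.
NodeHook = Callable[[str, str], None]


class Cluster:
    def __init__(self, packages: List[str]) -> None:
        self.packages = list(packages)
        self.hooks: Dict[str, NodeHook] = {}

    def register(self, package: str, hook: NodeHook) -> None:
        """Record the reconfiguration hook supplied by an installed package."""
        if package not in self.packages:
            raise KeyError(f"package {package!r} is not installed")
        self.hooks[package] = hook

    def node_changed(self, event: str, node: str) -> List[str]:
        """Call every registered hook for this event and return the
        packages that supplied no hook and need manual reconfiguration."""
        if event not in ("add", "delete"):
            raise ValueError(f"unknown event {event!r}")
        manual: List[str] = []
        for package in self.packages:
            hook = self.hooks.get(package)
            if hook is None:
                manual.append(package)
            else:
                hook(event, node)
        return manual


# Hypothetical package that keeps a list of usable machines.
machines: List[str] = []


def scheduler_hook(event: str, node: str) -> None:
    if event == "add":
        machines.append(node)
    else:
        machines.remove(node)


cluster = Cluster(["scheduler", "monitoring"])
cluster.register("scheduler", scheduler_hook)
needs_manual = cluster.node_changed("add", "node65")
```

The returned list makes the buy-in problem visible: any package that ships without a hook still has to be reconfigured by hand after each node change.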
OSCAR – Open Source Cluster Application Resources: http://oscar.sourceforge.net/
C3 – Cluster Command & Control power tools: http://www.csm.ornl.gov/torc/C3/
LUI – Linux Utility for cluster Installation: http://oss.software.ibm.com/developerworks/projects/lui
SIS – System Installation Suite: http://sisuite.sourceforge.net
System Configurator: http://systemconfig.sourceforge.net/
System Installer: http://systeminstaller.sourceforge.net/
System Imager: http://systemimager.org