High Availability OSCAR
Summary
OSCAR (Open Source Cluster Application Resources) is
a snapshot of the best known methods for building, programming, and using
clusters. It consists of a fully integrated, easy-to-install cluster software
stack designed for high performance cluster computing. HA-OSCAR (High Availability Open Source Cluster
Application Resources) is also an open source project. It leverages OSCAR to
extend High Performance Computing (HPC) clustering toward non-stop services
by combining High Availability and High Performance Computing
solutions.
HA-OSCAR’s goal is to enhance a
Beowulf cluster so that it can support mission-critical and downtime-sensitive HPC
applications. To achieve high availability, component redundancy is used to
eliminate single points of failure, specifically at the head node. HA-OSCAR also
incorporates a self-healing mechanism, failure detection and recovery, and automatic
failover and fail-back.
HA-OSCAR is presently at release
version 1.1. This release provides
an Active/Hot-Standby mode of high availability for the head node. In this mode, the primary master head
node is active, and the secondary (standby) master continually checks that
the primary is up and active via a heartbeat mechanism. Should the secondary master detect that
the primary has become inactive, the secondary assumes the role of the primary
head node until the original primary again becomes available for processing.
This activity is fully automated and takes place without the need for human
interaction.
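
The Active/Hot-Standby behavior described above can be illustrated with a small sketch of the standby side's decision logic. This is not HA-OSCAR's actual code: the class name StandbyMonitor, the role labels, and the max_missed threshold (how many consecutive missed heartbeats count as "inactive") are all assumptions made for illustration, and heartbeats are modeled as a simple sequence of True/False observations rather than real network traffic.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyMonitor:
    """Hypothetical standby-side monitor (illustrative, not HA-OSCAR's code)."""

    max_missed: int = 3            # assumed: consecutive misses before failover
    missed: int = 0                # consecutive heartbeats missed so far
    role: str = "standby"          # "standby" or "acting-primary"
    events: list = field(default_factory=list)

    def observe(self, heartbeat_received: bool) -> str:
        """Process one heartbeat interval and return the resulting role."""
        if heartbeat_received:
            self.missed = 0
            # The original primary is back: hand the role back (fail-back).
            if self.role == "acting-primary":
                self.role = "standby"
                self.events.append("fail-back")
        else:
            self.missed += 1
            # The primary looks inactive: take over its role (failover).
            if self.role == "standby" and self.missed >= self.max_missed:
                self.role = "acting-primary"
                self.events.append("failover")
        return self.role


def run(monitor: StandbyMonitor, heartbeats) -> list:
    """Feed a sequence of heartbeat observations to the monitor."""
    return [monitor.observe(hb) for hb in heartbeats]


if __name__ == "__main__":
    monitor = StandbyMonitor(max_missed=2)
    # Primary is up, goes silent for three intervals, then returns.
    print(run(monitor, [True, False, False, False, True]))
    print(monitor.events)
```

Requiring several consecutive misses before failover is a common way to avoid taking over on a single dropped heartbeat. The threshold a real deployment uses is a tuning decision and is not specified in this summary.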