Resource Management and Accounting Notebook - page 90 of 150


Date and Author(s)

System Monitor replacement design

I am in the process of writing a replacement for the system monitor software that we currently have installed. The idea is that this new version will be much more scalable and will be able to monitor pieces of information that are not specific to compute nodes. The system is called "warehouse".

The basic concept of information centers around a "node". A node is fundamentally a grouping of information. For a computer, one node's worth of information will probably correspond to that computer. Information about a node is kept in a generalized format: each piece of information is one of a few general types (int, float, and string at the moment) and has a text name associated with it. The monitoring system will consist of a hierarchy of warehouse processes, each of which keeps the information on its nodes up to date.

Within the warehouse process, each node of information looks like this:

It contains basic information: the node has a name, a state (the state only answers "can I retrieve its information or not?"), a node type, and its lists of the three types of information.
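As a rough illustration of the node layout described above, here is a minimal Python sketch. The class name `Node`, the field names, and the `set` helper are all assumptions made for this example; the notes only say that a node has a name, a state, a type, and named values of type int, float, or string.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

# All names here are hypothetical; only the overall shape comes from the notes.
Value = Union[int, float, str]


@dataclass
class Node:
    name: str
    node_type: str
    reachable: bool = False  # the "state": can its information be retrieved?
    ints: Dict[str, int] = field(default_factory=dict)
    floats: Dict[str, float] = field(default_factory=dict)
    strings: Dict[str, str] = field(default_factory=dict)

    def set(self, key: str, value: Value) -> None:
        """File a named value under the list that matches its type."""
        if isinstance(value, bool):
            # bool is a subclass of int in Python, so reject it explicitly
            raise TypeError("bool is not one of the supported types")
        if isinstance(value, int):
            self.ints[key] = value
        elif isinstance(value, float):
            self.floats[key] = value
        elif isinstance(value, str):
            self.strings[key] = value
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")


node = Node(name="node042", node_type="compute")
node.set("cpu_count", 2)
node.set("load_avg", 0.75)
node.set("os", "Linux")
node.reachable = True
```

Keeping one dictionary per type mirrors the "three types of information" wording; a single dictionary of tagged values would work just as well.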

The warehouse system is designed to be implemented as a tree through which information is passed. Each warehouse in the tree knows what information it is responsible for (some number of nodes' worth of information). For each node it keeps track of, it has the hostname and port of another warehouse process that supplies that information. It is also responsible for servicing requests from other warehouses for the information that it carries.
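The tree arrangement can be sketched as follows. This is only an illustration under stated assumptions: the `Warehouse` class, its method names, and the hostnames are invented here, and a shared dictionary stands in for the real network connections between processes.

```python
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # (hostname, port)


class Warehouse:
    """Hypothetical in-memory stand-in for one warehouse process."""

    def __init__(self, address: Address, network: Dict[Address, "Warehouse"]):
        self.address = address
        self.network = network                   # replaces real sockets in this sketch
        self.upstream: Dict[str, Address] = {}   # node name -> warehouse that supplies it
        self.store: Dict[str, dict] = {}         # node name -> latest known values
        network[address] = self

    def track(self, node_name: str, supplier: Address) -> None:
        """Become responsible for a node and remember who supplies it."""
        self.upstream[node_name] = supplier
        self.store.setdefault(node_name, {})

    def handle_request(self, node_name: str) -> Optional[dict]:
        """Service a request from another warehouse out of local storage."""
        info = self.store.get(node_name)
        return dict(info) if info is not None else None

    def refresh(self) -> None:
        """Ask each supplier for the nodes this warehouse tracks."""
        for node_name, supplier in self.upstream.items():
            reply = self.network[supplier].handle_request(node_name)
            if reply is not None:
                self.store[node_name] = reply


network: Dict[Address, Warehouse] = {}
leaf = Warehouse(("leaf.example", 9000), network)
root = Warehouse(("root.example", 9000), network)

leaf.store["node042"] = {"load_avg": 0.75}  # pretend a local source filled this in
root.track("node042", leaf.address)
root.refresh()
```

After `refresh()`, the root warehouse can answer requests for `node042` itself, which is what lets further warehouses be stacked above it.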

The basic warehouse looks like this:

In the center is the list of nodes that this warehouse keeps track of. The nodes are classified into groups for organizational reasons.

Each warehouse has one or more information source threads, which put information into the node storage areas. Typically, their instructions are to contact another warehouse process and retrieve whatever information is required. An info source can be configured either to request all available information about a node or to request only the information that already exists in the node information block. There are also specialized information sources that talk to a local library; this is how information originally enters the system.
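The two request modes of an information source might look like the sketch below. The function names, the `fetch` callback signature, and the sample values are assumptions for illustration; threading and the actual wire protocol are left out.

```python
from typing import Callable, Dict, Iterable, Optional

Info = Dict[str, object]
# Hypothetical callback: fetch(node_name, keys) returns named values.
# keys=None means "send everything you have for this node".
Fetch = Callable[[str, Optional[Iterable[str]]], Info]


def run_source_once(node_name: str, block: Info, fetch: Fetch, request_all: bool) -> None:
    """Update one node information block from an upstream supplier."""
    # In selective mode, ask only for names already present in the block.
    keys = None if request_all else list(block)
    block.update(fetch(node_name, keys))


def fake_upstream(node_name: str, keys: Optional[Iterable[str]]) -> Info:
    """Stand-in for another warehouse process (values are made up)."""
    available = {"load_avg": 0.75, "cpu_count": 2, "os": "Linux"}
    if keys is None:
        return dict(available)
    return {k: available[k] for k in keys if k in available}


selective = {"load_avg": 0.0}
run_source_once("node042", selective, fake_upstream, request_all=False)

everything = {"load_avg": 0.0}
run_source_once("node042", everything, fake_upstream, request_all=True)
```

A specialized source that reads from a local library would fit the same shape by supplying a different `fetch` callback.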

The information sinks pull information out of the node storage and send it to other warehouses in the tree. Typically this will be done in response to a request, but a sink can also be designed to push information to the next warehouse in the tree on a periodic basis, without waiting for repeated requests.
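A sink that supports both behaviors, answering requests and pushing periodically, could be sketched like this. The `Sink` class, its method names, and the `send` callback are hypothetical; a real sink would write to a connection to the next warehouse rather than to a Python list.

```python
import time
from typing import Callable, Dict, List, Tuple

Info = Dict[str, object]
Send = Callable[[str, Info], None]  # hypothetical delivery callback


class Sink:
    def __init__(self, store: Dict[str, Info]):
        self.store = store  # the node storage this sink reads from

    def respond(self, node_name: str) -> Info:
        """Request-driven mode: return a copy of what is stored for one node."""
        return dict(self.store.get(node_name, {}))

    def push_once(self, send: Send) -> None:
        """Send every stored node to the next warehouse."""
        for name, info in self.store.items():
            send(name, dict(info))

    def push_every(self, send: Send, interval: float, rounds: int) -> None:
        """Periodic mode: push without waiting for requests."""
        for _ in range(rounds):
            self.push_once(send)
            time.sleep(interval)


store = {"node042": {"load_avg": 0.75}}
sink = Sink(store)

received: List[Tuple[str, Info]] = []
sink.push_every(lambda name, info: received.append((name, info)), interval=0.01, rounds=2)
```

The `rounds` limit exists only so the example terminates; a long-running sink would loop until told to stop.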

The XML interface to this system will exist (shortly) as a specialized information sink. Like any other sink, it will take its information from the node information storage, but it will do so in response to XML requests from elsewhere on the meatball diagram.
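Since the XML format is not defined in these notes, the following is only a guess at what an XML sink could look like. The element and attribute names (`get`, `node`, `info`, `name`, `type`) are invented for this sketch; only the `xml.etree.ElementTree` calls are standard Python.

```python
import xml.etree.ElementTree as ET
from typing import Dict

Info = Dict[str, object]

# Map Python types to the type names used in the notes.
TYPE_NAMES = {int: "int", float: "float", str: "string"}


def node_to_xml(name: str, info: Info) -> str:
    """Serialize one node's stored values (hypothetical schema)."""
    elem = ET.Element("node", {"name": name})
    for key, value in sorted(info.items()):
        item = ET.SubElement(elem, "info", {"name": key, "type": TYPE_NAMES[type(value)]})
        item.text = str(value)
    return ET.tostring(elem, encoding="unicode")


def answer_xml_request(request_xml: str, store: Dict[str, Info]) -> str:
    """Read a request such as <get node="..."/> and reply from node storage."""
    request = ET.fromstring(request_xml)
    name = request.get("node", "")
    return node_to_xml(name, store.get(name, {}))


store = {"node042": {"load_avg": 0.75, "cpu_count": 2}}
reply = answer_xml_request('<get node="node042"/>', store)
```

The sink only reads from storage, so it can sit beside the ordinary sinks without changing how sources fill the node information blocks.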