Node Build and Configuration Notebook - page 21 of 55



Information Service Abstraction and Proposal

JP and folks have made a good start on a proposal for an Information Service API. The initial discussion led to an agreement that we would think in terms of records rather than key/value pairs. The first API proposal was oriented around a Table abstraction that stores things by row and column. The next cut was much better in that the API took a more data-oriented viewpoint. I would like to step back and propose that we take a top-down view of the Info Service and consider the following abstraction and API.

Agreed on facts for the Scalable Systems Suite:
1. Components communicate with each other by sending messages.
2. The content of these messages is XML.
3. The above two allow components to be written in any language.
4. The Info Service is just another component in the Suite.
-- this implies that there can be several different implementations. All they have to do is conform to the API.

Desirable features that have been discussed for Info Service:
1. Could be distributed or centralized -- transparent to components.
2. Could exist as an abstraction only -- one could ask the service for information and simply be redirected to where to find it, for example the job management component.
3. The Info Service is not required in order to use components in the Scalable Systems Suite. A computer center could wire components together directly if it so desired.

Top Down View

The Information Service is like a persistent storage vault. It doesn't generate information itself -- it just stores information that other components ask it to store. The advantages are that all components in the suite have some place to look for information if they haven't been wired to look in a particular place, and that the individual components don't have to store their information themselves.

Here is the key idea. The Info Service is a component and reacts to the world by receiving and sending messages. The information that comes in arrives in a message. So just have the duty of the Info Service be to store and return messages. This is a very powerful concept. The Info Service shouldn't have to understand what is inside a message. I would argue that it can't understand what is in the information it stores. At most it can only understand the schema of the information. It doesn't know whether "nodes" means CPUs, boxes, or processes. Given this, the data being stored doesn't have to be XML. For example, for efficiency, a component may want to stuff raw data onto the Info Service for its own personal use later, like an mpd daemon creating a backup of its state. But for the most part it is expected that the stored messages are XML blobs that are then just sent as a message to any requesting component.
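
The store-and-return idea can be sketched in a few lines. This is only an illustration under stated assumptions: the class name OpaqueStore, its method names, and the example keys are hypothetical and not part of any agreed API, and key collisions are ignored here. The point is that the service keeps the payload as bytes and never parses it.

```python
class OpaqueStore:
    """Hypothetical in-memory Info Service that never inspects payloads."""

    def __init__(self):
        self._messages = {}  # KEY -> raw bytes, exactly as received

    def store_data(self, key: str, payload: bytes) -> None:
        # Keep the message byte-for-byte; no parsing, no validation.
        self._messages[key] = bytes(payload)

    def get_data(self, key: str) -> bytes:
        # Return exactly what was stored; decoding is the requester's job.
        return self._messages[key]


store = OpaqueStore()
xml_blob = b'<nodes count="4"/>'        # the usual case: an XML message
raw_state = bytes([0x00, 0xFF, 0x10])   # e.g. a daemon's binary state backup
store.store_data("scheduler/nodes", xml_blob)
store.store_data("mpd/backup", raw_state)
```

Both payloads come back unchanged, whether or not they are XML.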

Just as in the present API proposal, the XML blob would need to carry a schema version in its header, which the requesting component can check before decoding the blob according to documentation published out-of-band.
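
A minimal sketch of the check a requesting component might do, assuming the version is carried as an attribute on the root element. The attribute name schemaVersion, the element names, and the version strings are all made up for illustration; the real header layout has not been defined.

```python
import xml.etree.ElementTree as ET

# Versions this (hypothetical) component knows how to decode.
SUPPORTED_SCHEMA_VERSIONS = {"1.0", "1.1"}


def decode_if_supported(blob: bytes) -> ET.Element:
    """Parse the blob only if its assumed schemaVersion attribute is known."""
    root = ET.fromstring(blob)
    version = root.get("schemaVersion")  # hypothetical attribute name
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"unsupported schema version: {version!r}")
    return root


blob = b'<nodeInfo schemaVersion="1.0"><node name="n01" cpus="2"/></nodeInfo>'
root = decode_if_supported(blob)
cpus = int(root.find("node").get("cpus"))  # meaning of "cpus" comes from the docs
```

The Info Service itself never runs this code; only the component that asked for the blob does.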

The basic interface functions required of the Info Service:

Information Services for multiple, heterogeneous components have a number of important issues that have to be taken into account: memory leaks (things that never get deleted), name space conflicts, and, when there are conflicts, the ability to specify precisely what one wants.

persistence - if a component puts data into the Information Service and then dies without removing it, what happens to the data? I propose that there be a property called persistence. If a message is stored with persistence, then the message is kept in the Information Service until it is explicitly deleted by a component. By default persistence is set to false, and the Information Service deletes messages when it is informed that the component that stored them has exited.
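
The persistence rule could look like the sketch below. It assumes the service records which component stored each message and is told about exits through a call named component_exited; both the owner bookkeeping and every name here are hypothetical.

```python
class PersistenceStore:
    """Hypothetical store illustrating the proposed persistence property."""

    def __init__(self):
        self._records = {}  # KEY -> (owner, persistent, payload)

    def store_data(self, owner, key, payload, persistent=False):
        # persistence defaults to False, as proposed.
        self._records[key] = (owner, persistent, payload)

    def component_exited(self, owner):
        """Delete every non-persistent message stored by the exited component."""
        doomed = [key for key, (who, persistent, _) in self._records.items()
                  if who == owner and not persistent]
        for key in doomed:
            del self._records[key]
        return doomed

    def delete(self, key):
        # Explicit deletion is the only way a persistent message goes away.
        del self._records[key]

    def keys(self):
        return sorted(self._records)


store = PersistenceStore()
store.store_data("mpd", "mpd/backup", b"state", persistent=True)
store.store_data("mpd", "mpd/scratch", b"tmp")
removed = store.component_exited("mpd")  # only the scratch message is dropped
```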

What if a component tries to store a message with a KEY that is already in use? I propose that there be two properties associated with a KEY, overwriteable and multi-instance, which are set by the StoreData request.

overwriteable - if a KEY is overwriteable then when a component goes to store a message with the same KEY, the new message overwrites the previous one. By default overwriteable is set to false.

multi-instance - if a KEY is multi-instance then when a component goes to store a message with the same KEY, the KEY is indexed and the message is stored associated with the next available KEY[index]. By default multi-instance is set to false.

direct index - if a KEY is both multi-instance and overwriteable, then components may want to store directly over a given instance of the KEY. The Information Service should make this possible.
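
Here is one way the three rules could fit together, as a sketch only. Several choices are assumptions rather than part of the proposal: the properties are fixed by the first StoreData for a KEY, a KEY that is both multi-instance and overwriteable appends unless an index is given, a direct index must name an existing instance, and "OK" is an invented success code. All class and parameter names are hypothetical.

```python
from dataclasses import dataclass, field

OK = "OK"                  # assumed success code
KEY_IN_USE = "Key in Use"  # return code named in the proposal


@dataclass
class KeyEntry:
    overwriteable: bool = False
    multi_instance: bool = False
    instances: list = field(default_factory=list)  # instances[i] is KEY[i]


class KeyedStore:
    def __init__(self):
        self._entries = {}

    def store_data(self, key, payload, overwriteable=False,
                   multi_instance=False, index=None):
        entry = self._entries.get(key)
        if entry is None:
            if index is not None:
                raise KeyError(f"no such KEY to index: {key!r}")
            # Assumption: the first store fixes the KEY's properties.
            self._entries[key] = KeyEntry(overwriteable, multi_instance, [payload])
            return OK
        if index is not None:
            # Direct index needs both properties and an existing instance.
            if (entry.multi_instance and entry.overwriteable
                    and 0 <= index < len(entry.instances)):
                entry.instances[index] = payload
                return OK
            return KEY_IN_USE
        if entry.multi_instance:
            entry.instances.append(payload)  # next available KEY[index]
            return OK
        if entry.overwriteable:
            entry.instances[0] = payload     # new message replaces the old one
            return OK
        return KEY_IN_USE

    def instances(self, key):
        return list(self._entries[key].instances)


s = KeyedStore()
s.store_data("a", "v1")
second_a = s.store_data("a", "v2")         # rejected: both defaults are False
s.store_data("b", "v1", overwriteable=True)
s.store_data("b", "v2")                    # overwrites
s.store_data("c", "x0", overwriteable=True, multi_instance=True)
s.store_data("c", "x1")                    # stored as c[1]
s.store_data("c", "y0", index=0)           # direct index over c[0]
```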

Return codes for StoreData
Here are the ones that I can think of off the top of my head:
Key in Use
your favorite message here.

Options for GetData

Options we need to consider including in a GetData request are:
Exact Match - otherwise return nearest match?
First Available - KEY index above specified value.
Read and Delete - atomic operation.
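
To make these options concrete, here is a rough sketch. It implements exact match only and leaves the nearest-match question open; it reads "First Available" as the lowest stored index strictly above the given value; and it makes Read and Delete atomic with a single lock. The class, the parameter names, and the "OK" status are assumptions for illustration.

```python
import threading

OK = "OK"                # assumed success status
NOT_FOUND = "Not Found"


class InstanceReader:
    """Hypothetical store of indexed instances with GetData options."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # KEY -> {index: payload}

    def put(self, key, index, payload):
        with self._lock:
            self._data.setdefault(key, {})[index] = payload

    def get_data(self, key, index=0, first_available=False,
                 read_and_delete=False):
        # One lock around lookup and removal makes read-and-delete atomic.
        with self._lock:
            instances = self._data.get(key, {})
            if first_available:
                above = sorted(i for i in instances if i > index)
                if not above:
                    return NOT_FOUND, None
                index = above[0]
            elif index not in instances:
                return NOT_FOUND, None  # exact match only; no nearest match
            payload = instances[index]
            if read_and_delete:
                del instances[index]
                if not instances:
                    del self._data[key]
            return OK, payload


r = InstanceReader()
r.put("job", 0, "a")
r.put("job", 2, "c")
exact = r.get_data("job", index=0)
next_up = r.get_data("job", index=0, first_available=True)
missing = r.get_data("job", index=1)
taken = r.get_data("job", index=0, read_and_delete=True)
gone = r.get_data("job", index=0)
```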

Return codes for GetData
Here are the ones that I can think of off the top of my head:
Not Found
Multiple Instance

Here is where to place proposed API for this Information Service Abstraction. Al or others?

Addendum:   JP Navarro   Date: Tue Feb 5 21:55:53 2002 (GMT)
As was discussed in the 2/1/2002 meeting, this Abstraction and Proposal might actually be a subset of the V2 API functionality. To help clarify that point I think the following would be useful:

1) A more detailed description of Info Service known fields/properties. This proposal suggests the following 6:
   key
   persistence property
   overwriteable property
   multi-instance property
   instance index (when multiple instances are supported)
   data value
Questions:
   Is key a simple or complex value (contains sub-fields)?
   Do we need another property to record which component owns a key? If not, how do components pick a private KEY valuespace?
   Do we need another property to record component versions?

2) What does the XML API for this proposal look like?

Addendum:   ERoman   Date: Thu Feb 7 00:23:29 2002 (GMT)
Here's a quickie. I think I know the answer, but I want to get a feel for the overall scope of the information service. Tell me why the answer is no... Suppose I just synchronized a job, and I'm ready to drop about 40,000 checkpoint files simultaneously. Would I use the information service? Why not? Why? How fast is the information service expected to be able to take incoming read/write requests? What type of sizes do you envision for each request? Just curious. - E