Node Build and Configuration Notebook - page 15 of 55


Date and Author(s)

Information Service Design V1

Below is a design derived from Information Service discussions in our January 4 and
11 meetings.  The agenda for today's meeting is to discuss the "**" decisions to
make sure everyone agrees, and to discuss the "??" issues that require        
additional attention (plus any other Information Service issues people would        
like to address).        
We would like to nail down this design so that we can start coding a prototype        
that we can demo at our next face to face meeting.  This is a component that        
could be of interest to many of the SSS teams, and having an early prototype        
that other components can use would be a great win.        
I'm also posting this to the notebook.        
** Information Service design **        
Key:  ** -> decisions        
      ?? -> issues or questions that require additional attention        
Information service abstract data model        
We discussed two abstract models for data in the information service:        
1) all the data is in the form "keyword_string=value"        
2) all the data is in the form of a spreadsheet which is conceptually        
   similar to a relational database table        
** We decided the information service should be based on model 2).        
Some of the considerations were:
To represent complex data, model 1) would require the keyword_string
to include multiple keywords separated by delimiters. Some of these
keywords would be data values while others might be value labels.
For example, in:        
   <hostname1>.networkadaptor1.ip_address =        
Fields are:        
   <hostname1> is a data value        
   networkadaptor1 is a value label or a data value?
   ip_address is a value label
   the right-hand side of "=" is a data value
This approach doesn't appear to provide a way to distinguish values
from labels or to enforce any kind of data schema for values or
labels. Other issues with model 1) include: how to encode data        
selection criteria, and having to parse keyword_string values.        
Model 2), due to its similarity to the relational table model, lends
itself well to data schema definition and clearly distinguishes        
between labels (column names in the schema) and values (actual data        
in the rows).        
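As a rough illustration of the difference between the two models, here is a
minimal Python sketch. The host name, IP address, column names, and the
functions parse_model1 and select are all hypothetical and chosen only for
this example; they are not part of the design.

```python
# Model 1: every datum is a flat "keyword_string=value" pair.
# The key has to be split on a delimiter, and nothing in the data says
# which parts are labels and which are data values.
model1 = {"node01.networkadaptor1.ip_address": "10.0.0.1"}  # hypothetical data


def parse_model1(key):
    """Split a model 1 keyword_string; the role of each part is only a convention."""
    return key.split(".")


# Model 2: a table with named columns (labels) and rows (values).
columns = ("hostname", "adaptor", "ip_address")  # hypothetical column names
rows = [("node01", "networkadaptor1", "10.0.0.1")]


def select(columns, rows, **criteria):
    """Return matching rows as dicts; all criteria must match (implied AND)."""
    out = []
    for row in rows:
        record = dict(zip(columns, row))
        if all(record.get(k) == v for k, v in criteria.items()):
            out.append(record)
    return out
```

In the model 2 form the selection criteria name a column directly, so no
keyword_string parsing is needed.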
Information service data schemas        
** We decided the information service must have data schemas.        
A data schema has a name, similar to a table name, and a set of named        
and ordered columns.  One or more of these columns should be declared to        
uniquely identify a row (a primary key in relational database terms).
** Each data schema must have a version and the information service must        
be capable of storing data for multiple versions of a data schema.        
This feature makes it possible for an application to change its data
schema in some incompatible way and to publish both the new and old         
versions so applications using either can coexist.        
?? Does this mean that the data schema version needs to be part of the
data schema name (table name)?
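One possible way to picture a versioned data schema is sketched below in
Python. The class names DataSchema and SchemaRegistry, the integer version
numbers, and the choice to key the registry on (name, version) rather than
folding the version into the schema name are all assumptions made for this
sketch; the question above remains open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSchema:
    name: str
    version: int
    columns: tuple       # ordered column names
    primary_key: tuple   # columns that together uniquely identify a row

    def __post_init__(self):
        # Every primary key column has to be one of the schema's columns.
        missing = [c for c in self.primary_key if c not in self.columns]
        if missing:
            raise ValueError(f"primary key columns not in schema: {missing}")


class SchemaRegistry:
    """Holds several versions of the same schema side by side."""

    def __init__(self):
        self._schemas = {}  # (name, version) -> DataSchema

    def define(self, schema):
        key = (schema.name, schema.version)
        if key in self._schemas:
            raise ValueError(f"schema already defined: {key}")
        self._schemas[key] = schema

    def get(self, name, version):
        return self._schemas[(name, version)]

    def versions(self, name):
        return sorted(v for (n, v) in self._schemas if n == name)
```

With this layout an application could publish version 2 of a schema while
older clients keep reading version 1.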
** Before an application can store data in the information service it        
must define the data schema to the information service. Defining the        
data schema will be done through a documented information service interface.        
** We will support the ability to: dynamically add columns to a data        
schema, and to delete a data schema. These operations should use a
documented information service interface.        
** A particular information service data schema must have a core set        
of required common fields, and may contain additional optional fields.        
The required common fields would be the ones that are defined to be the        
lowest common denominator for a SciDAC SSS compatible component.        
Additional fields could exist if a component implements expanded        
features with additional data.        
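The required vs optional field rule could be checked along these lines. This
is only a sketch: the field names in REQUIRED_FIELDS are invented
placeholders, since the actual core set has not been defined, and
check_schema_columns is a hypothetical helper.

```python
# Hypothetical core set; the real required fields are not yet defined.
REQUIRED_FIELDS = ("component_name", "hostname")


def check_schema_columns(columns, required=REQUIRED_FIELDS):
    """Return the optional columns, or raise if a required field is missing."""
    missing = [f for f in required if f not in columns]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return [c for c in columns if c not in required]
```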
Manipulating data in the information service        
This section discusses the Information Service API, which will translate        
into API schemas (to be distinguished from _data_ schemas).        
** We will borrow the relational terms:        
    insert to add data to a schema/table,        
    update to modify data in a schema/table,        
    delete to remove data from a schema/table,        
    select to query the data in a schema/table.        
** We will also support the term query as a more common synonym        
for select. These terms will probably translate into API functions.
?? How about write and read?  Write may be very useful.
** The API schemas used to insert, update, delete, and query will be        
semantically similar to the corresponding ANSI standard SQL statements.        
** We will not support advanced features common in SQL through the        
standard API: internal functions that perform operations on data
values, joining multiple tables, etc.        
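To make the four operations concrete, here is a minimal in-memory sketch in
Python. The Table class and its method signatures are assumptions for
illustration only, not the planned API; selection criteria are equality tests
combined with an implied AND, and there are no joins or internal functions.
The sketch does not re-check primary key uniqueness on update.

```python
class Table:
    """In-memory stand-in for one schema/table; rows are dicts keyed by column."""

    def __init__(self, columns, primary_key):
        self.columns = tuple(columns)
        self.primary_key = tuple(primary_key)
        self.rows = []

    def _check(self, names):
        unknown = set(names) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")

    def _matches(self, row, where):
        # Selection criteria are combined with an implied AND.
        return all(row[f] == v for f, v in where.items())

    def insert(self, **values):
        self._check(values)
        row = {c: values.get(c) for c in self.columns}
        key = tuple(row[c] for c in self.primary_key)
        if any(tuple(r[c] for c in self.primary_key) == key for r in self.rows):
            raise ValueError(f"duplicate primary key: {key}")
        self.rows.append(row)

    def select(self, **where):
        self._check(where)
        return [dict(r) for r in self.rows if self._matches(r, where)]

    query = select  # "query" as a synonym for "select"

    def update(self, values, **where):
        self._check(values)
        self._check(where)
        count = 0
        for r in self.rows:
            if self._matches(r, where):
                r.update(values)
                count += 1
        return count  # number of rows modified

    def delete(self, **where):
        self._check(where)
        before = len(self.rows)
        self.rows = [r for r in self.rows if not self._matches(r, where)]
        return before - len(self.rows)  # number of rows removed
```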
** A particular information service implementation that chooses to store        
data in a relational database may support additional API functions that        
perform arbitrary SQL queries which are passed directly to the backend        
relational database.         
** The terms records and rows are synonymous.
** We will not provide transactional services or provide transaction        
consistent views of data. To minimize the likelihood of encountering        
inconsistent data in the repository, we will establish some guidelines        
that applications using the information service may follow.        
?? We might be able to get around this problem by providing atomic        
multi-function calling capability.        
?? We didn't decide if we would enforce any data typing, or required        
value constraints.        
Information service data storage        
** We do not want to require that information service implementations        
use a backend relational database. For this reason we're sticking to
a very simple data schema design and to simple data manipulation.
** If a particular information service uses a relational database back        
end, it may expose additional API functions that support advanced SQL
queries. It must also support the standard API functions required to        
be a SciDAC SSS compliant component.        
We discussed the concepts of internal vs external schemas. We believe
there is value in supporting them so that dependencies are maintained        
between related schemas/tables when primary key values change.        
** We will write up a guideline on using internal vs external schemas
and encourage applications using the information service to take
advantage of this approach.        
?? We need to support internal vs external schema definitions.        
Information service API schema        
Schema for the calls made to the information service:
create schema <schema_name>, column list, primary key list        
expand schema <schema_name>, columns to add        
reduce schema <schema_name>, columns to remove        
delete schema <schema_name>        
query schema_names        
query schema_name_columns <schema_name>        
insert into <schema_name>, field list, value list        
update      <schema_name>, field list, value list, selection field list,        
                                                   selection value list        
                                                   (implied and)        
delete      <schema_name>, selection field list, selection value list        
                                                   (implied and)        
query       <schema_name>, selection field list        
To-do: convert the above to XML API schemas.
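
Until the XML API schemas exist, the schema management calls listed above
(create, expand, reduce, and delete schema, plus the two schema queries) could
be prototyped roughly as follows. This is a sketch under assumptions: the
SchemaCatalog class, the Python method signatures, and the rule that a primary
key column cannot be removed are all hypothetical and not decided in these
notes.

```python
class SchemaCatalog:
    """Hypothetical in-memory catalog of data schema definitions."""

    def __init__(self):
        # schema_name -> {"columns": [...], "primary_key": [...]}
        self._schemas = {}

    def create_schema(self, name, columns, primary_key):
        if name in self._schemas:
            raise ValueError(f"schema exists: {name}")
        if not set(primary_key) <= set(columns):
            raise ValueError("primary key must be a subset of the columns")
        self._schemas[name] = {
            "columns": list(columns),
            "primary_key": list(primary_key),
        }

    def expand_schema(self, name, columns_to_add):
        cols = self._schemas[name]["columns"]
        duplicates = [c for c in columns_to_add if c in cols]
        if duplicates:
            raise ValueError(f"columns already exist: {duplicates}")
        cols.extend(columns_to_add)

    def reduce_schema(self, name, columns_to_remove):
        schema = self._schemas[name]
        # Assumption: primary key columns may not be removed.
        if set(columns_to_remove) & set(schema["primary_key"]):
            raise ValueError("cannot remove a primary key column")
        schema["columns"] = [
            c for c in schema["columns"] if c not in columns_to_remove
        ]

    def delete_schema(self, name):
        del self._schemas[name]

    def query_schema_names(self):
        return sorted(self._schemas)

    def query_schema_name_columns(self, name):
        return list(self._schemas[name]["columns"])
```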