Below is a design derived from the Information Service discussions in our January 4 and 11 meetings. The agenda for today's meeting is to discuss the "**" decisions to make sure everyone agrees, and to discuss the "??" issues that require additional attention (plus any other Information Service issues people would like to address).

We would like to nail down this design so that we can start coding a prototype that we can demo at our next face-to-face meeting. This is a component that could be of interest to many of the SSS teams, and having an early prototype that other components can use would be a great win.

I'm also posting this to the notebook.

JP

********************************
** Information Service design **
********************************

Key:
  ** -> decisions
  ?? -> issues or questions that require additional attention

Information service abstract data model
---------------------------------------

We discussed two abstract models for data in the information service:

  1) all the data is in the form "keyword_string=value"
  2) all the data is in the form of a spreadsheet, which is conceptually similar to a relational database table

** We decided the information service should be based on model 2).

Some of the considerations were:

To represent complex data, model 1) would require the keyword_string to include multiple keywords separated by delimiters. Some of these keywords would be data values while others might be value labels. For example, in:

  <hostname1>.networkadaptor1.ip_address = 126.96.36.199

the fields are:

  <hostname1>       is a data value
  networkadaptor1   is a value label or a data value?
  ip_address        is a value label
  126.96.36.199     is a data value

This approach doesn't appear to provide a way to distinguish values from labels, or to enforce any kind of data schema for values or labels.

Other issues with model 1) include: how to encode data selection criteria, and having to parse keyword_string values.

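
The ambiguity in model 1) can be seen in a minimal Python sketch. It parses the example entry into tokens and then shows the same information as a single row with named columns. The function name parse_model1 and the column names are assumptions made for illustration only; "hostname1" stands in for the <hostname1> placeholder.

```python
# Model 1: a flat "keyword_string=value" entry, as in the example above.
entry = "hostname1.networkadaptor1.ip_address = 126.96.36.199"


def parse_model1(entry):
    """Split a model 1 entry into its keyword tokens and its value."""
    # Split on the first "=" only, because the value itself may contain dots.
    keyword_string, value = (part.strip() for part in entry.split("=", 1))
    return keyword_string.split("."), value


keywords, value = parse_model1(entry)
# The tokens come back as an undifferentiated list: nothing in the data
# says which tokens are labels and which are data values.
print(keywords, value)

# Model 2: the same information as one row of a table with named columns.
columns = ("hostname", "adaptor", "ip_address")  # labels (hypothetical names)
row = ("hostname1", "networkadaptor1", "126.96.36.199")  # data values
record = dict(zip(columns, row))
print(record)
```

In the second form the labels live in the column list and the values live in the row, so no parsing convention is needed to tell them apart.
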
Model 2, due to its similarity to the relational table model, lends itself well to data schema definition and clearly distinguishes between labels (column names in the schema) and values (actual data in the rows).

Information service data schemas
--------------------------------

** We decided the information service must have data schemas. A data schema has a name, similar to a table name, and a set of named and ordered columns. One or more of these columns should be declared to uniquely identify a row (a primary key in relational database terms).

** Each data schema must have a version, and the information service must be capable of storing data for multiple versions of a data schema. This feature makes it possible for an application to change its data schema in some incompatible way and to publish both the new and old versions so that applications using either can coexist.

?? Does this mean that the data schema version needs to be part of the data schema name (table name)?

** Before an application can store data in the information service, it must define the data schema to the information service. Defining the data schema will be done through a documented information service interface.

** We will support the ability to dynamically add columns to a data schema, and to delete a data schema. These operations should use a documented information service interface.

** A particular information service data schema must have a core set of required common fields, and may contain additional optional fields. The required common fields would be the ones that are defined to be the lowest common denominator for a SciDAC SSS compatible component. Additional fields could exist if a component implements expanded features with additional data.

Manipulating data in the information service
--------------------------------------------

This section discusses the Information Service API, which will translate into API schemas (to be distinguished from _data_ schemas).

** We will borrow the relational terms:
     insert  to add data to a schema/table,
     update  to modify data in a schema/table,
     delete  to remove data from a schema/table,
     select  to query the data in a schema/table.

** We will also support the term query as a more common synonym for select. These terms will probably translate into API function names.

?? How about write and read? Write may be very useful.

** The API schemas used to insert, update, delete, and query will be semantically similar to the corresponding ANSI standard SQL statements.

** We will not support, through the standard API, advanced features common in SQL: internal functions that perform operations on data values, joining multiple tables, etc.

** A particular information service implementation that chooses to store data in a relational database may support additional API functions that perform arbitrary SQL queries, which are passed directly to the backend relational database.

** The terms records and rows are synonymous.

** We will not provide transactional services or transaction-consistent views of data. To minimize the likelihood of encountering inconsistent data in the repository, we will establish some guidelines that applications using the information service may follow.

?? We might be able to get around this problem by providing an atomic multi-function calling capability.

?? We didn't decide if we would enforce any data typing or required value constraints.

Information service data storage
--------------------------------

** We do not want to require that information service implementations use a backend relational database. For this reason we're sticking to a very simple data schema design and to simple data manipulation operations.

** If a particular information service uses a relational database back end, it may expose additional API functions that support advanced SQL queries. It must also support the standard API functions required to be a SciDAC SSS compliant component.

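
One way to picture the split between the standard API and an optional SQL extension is the minimal Python sketch below. Every class and method name here is hypothetical, the signatures are guesses rather than agreed interfaces, and sqlite3 is used only as a convenient stand-in for "a relational database back end".

```python
import sqlite3
from abc import ABC, abstractmethod


class InformationService(ABC):
    """Standard API that every implementation provides (hypothetical names)."""

    @abstractmethod
    def create_schema(self, schema, columns): ...

    @abstractmethod
    def insert(self, schema, fields, values): ...

    @abstractmethod
    def query(self, schema, sel_fields=(), sel_values=()): ...


class DictInformationService(InformationService):
    """Backend with no relational database: rows are kept in plain lists."""

    def __init__(self):
        self.tables = {}

    def create_schema(self, schema, columns):
        self.tables[schema] = []

    def insert(self, schema, fields, values):
        self.tables[schema].append(dict(zip(fields, values)))

    def query(self, schema, sel_fields=(), sel_values=()):
        wanted = dict(zip(sel_fields, sel_values))
        # Selection criteria are combined with an implied AND.
        return [row for row in self.tables[schema]
                if all(row.get(f) == v for f, v in wanted.items())]


def _ident(name):
    """Only let plain identifiers be spliced into SQL text (sketch-level check)."""
    if not name.isidentifier():
        raise ValueError(f"invalid identifier: {name!r}")
    return name


class SqliteInformationService(InformationService):
    """Relational backend: the same standard API plus an optional extension."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")

    def create_schema(self, schema, columns):
        cols = ", ".join(_ident(c) for c in columns)
        self.db.execute(f"CREATE TABLE {_ident(schema)} ({cols})")

    def insert(self, schema, fields, values):
        cols = ", ".join(_ident(f) for f in fields)
        marks = ", ".join("?" for _ in values)
        self.db.execute(
            f"INSERT INTO {_ident(schema)} ({cols}) VALUES ({marks})",
            tuple(values))

    def query(self, schema, sel_fields=(), sel_values=()):
        sql = f"SELECT * FROM {_ident(schema)}"
        if sel_fields:
            sql += " WHERE " + " AND ".join(f"{_ident(f)} = ?" for f in sel_fields)
        cur = self.db.execute(sql, tuple(sel_values))
        names = [d[0] for d in cur.description]
        return [dict(zip(names, row)) for row in cur.fetchall()]

    def raw_sql(self, statement, params=()):
        """Optional extension: pass arbitrary SQL straight to the backend."""
        return self.db.execute(statement, params).fetchall()


# The same standard calls work against either backend.
services = [DictInformationService(), SqliteInformationService()]
for svc in services:
    svc.create_schema("nodes", ["hostname", "state"])
    svc.insert("nodes", ["hostname", "state"], ["n1", "up"])
    svc.insert("nodes", ["hostname", "state"], ["n2", "down"])

# Only the relational backend offers the advanced query.
sql_service = services[1]
counts = sql_service.raw_sql(
    "SELECT state, COUNT(*) FROM nodes GROUP BY state ORDER BY state")
```

A client that sticks to create_schema, insert, and query does not need to know which backend it is talking to; only a client that calls raw_sql ties itself to a relational implementation.
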
We discussed the concepts of internal vs external schemas. We believe there is value in supporting them so that dependencies are maintained between related schemas/tables when primary key values change.

** We will write up a guideline on using internal vs external schemas and encourage applications using the information service to take advantage of this approach.

?? We need to support internal vs external schema definitions.

Information service API schema
------------------------------

Schema for the calls made to the information service:

  create schema <schema_name>, column list, primary key list
  expand schema <schema_name>, columns to add
  reduce schema <schema_name>, columns to remove
  delete schema <schema_name>
  query schema_names
  query schema_name_columns <schema_name>
  insert into <schema_name>, field list, value list
  update <schema_name>, field list, value list,
         selection field list, selection value list (implied and)
  delete <schema_name>, selection field list, selection value list (implied and)
  query <schema_name>, selection field list

To-do: convert the above to XML API schemas.
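
Until the XML API schemas exist, the call list above can be exercised with a small in-memory Python sketch. This is only an illustration under stated assumptions: the class name InfoService and all method names are hypothetical, schema versions are left out because that question is still open, and query is assumed to take selection values as well as selection fields, as update and delete do.

```python
class InfoService:
    """In-memory sketch of the call list above (all names hypothetical)."""

    def __init__(self):
        # schema name -> {"columns": [...], "key": [...], "rows": [...]}
        self.schemas = {}

    def create_schema(self, name, columns, primary_key):
        if name in self.schemas:
            raise ValueError(f"schema {name!r} already exists")
        if not set(primary_key) <= set(columns):
            raise ValueError("primary key columns must be in the column list")
        self.schemas[name] = {"columns": list(columns),
                              "key": list(primary_key), "rows": []}

    def expand_schema(self, name, columns):
        schema = self.schemas[name]
        for col in columns:
            if col not in schema["columns"]:
                schema["columns"].append(col)
                for row in schema["rows"]:
                    row[col] = None  # existing rows get an empty value

    def reduce_schema(self, name, columns):
        schema = self.schemas[name]
        if set(columns) & set(schema["key"]):
            raise ValueError("cannot remove a primary key column")
        schema["columns"] = [c for c in schema["columns"] if c not in columns]
        for row in schema["rows"]:
            for col in columns:
                row.pop(col, None)

    def delete_schema(self, name):
        del self.schemas[name]

    def query_schema_names(self):
        return sorted(self.schemas)

    def query_schema_columns(self, name):
        return list(self.schemas[name]["columns"])

    @staticmethod
    def _matches(row, sel_fields, sel_values):
        # Selection criteria are combined with an implied AND.
        return all(row.get(f) == v for f, v in zip(sel_fields, sel_values))

    def insert(self, name, fields, values):
        schema = self.schemas[name]
        unknown = set(fields) - set(schema["columns"])
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        row = dict.fromkeys(schema["columns"])
        row.update(zip(fields, values))
        key = [row[k] for k in schema["key"]]
        if any([r[k] for k in schema["key"]] == key for r in schema["rows"]):
            raise ValueError(f"duplicate primary key {key}")
        schema["rows"].append(row)

    def update(self, name, fields, values, sel_fields=(), sel_values=()):
        # Note: this sketch does not re-check key uniqueness on update.
        count = 0
        for row in self.schemas[name]["rows"]:
            if self._matches(row, sel_fields, sel_values):
                row.update(zip(fields, values))
                count += 1
        return count

    def delete(self, name, sel_fields=(), sel_values=()):
        schema = self.schemas[name]
        kept = [r for r in schema["rows"]
                if not self._matches(r, sel_fields, sel_values)]
        removed = len(schema["rows"]) - len(kept)
        schema["rows"] = kept
        return removed

    def query(self, name, sel_fields=(), sel_values=()):
        return [dict(r) for r in self.schemas[name]["rows"]
                if self._matches(r, sel_fields, sel_values)]


svc = InfoService()
svc.create_schema("nodes", ["hostname", "state"], ["hostname"])
svc.insert("nodes", ["hostname", "state"], ["n1", "up"])
svc.insert("nodes", ["hostname", "state"], ["n2", "up"])
svc.expand_schema("nodes", ["cpus"])
svc.update("nodes", ["state", "cpus"], ["down", 4], ["hostname"], ["n2"])
up_nodes = svc.query("nodes", ["state"], ["up"])
```

Each method maps onto one line of the call list, so converting the list to XML API schemas would mostly be a matter of choosing element names for the same arguments.
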