LifeDeskPublishing  
Workflow of LifeDesk publishing for multiple uses.
Updated Sep 22, 2009 by dprem...@gmail.com

Introduction

This page presents the components of a workflow that utilises the Global Names Architecture (GNA) infrastructure to publish and access taxonomic datasets. The following actors and objectives are discussed.

Data custodians of the EOL LifeDesks seek to publish taxonomic and species information for use by multiple potential end users. One requirement is that the taxonomic data meet certain levels of quality and completeness set by the Integrated Taxonomic Information System (ITIS). Once these criteria are met, the data may be accessed and used by an end user such as the Catalogue of Life.


Registration of the LifeDesk Resource(s)

  1. LifeDesk (LD) developers provide interfaces for data custodians to select LifeDesk resources for publication.
    1. LD developers map LD modules to the GNA data formats as well as other published extensions that EOL supports (including the EOL extension).
  2. One or more DarwinCore Archive files are produced as a result of this process.
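
As a minimal sketch of what producing such an archive could involve, the following packs a small taxon table and a meta.xml descriptor into a zip file. The sample rows, the choice of columns and the file name taxa.csv are assumptions made for illustration; the actual LifeDesk export and the GNA data formats are not specified on this page.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"
COLUMNS = ["taxonID", "scientificName", "taxonRank"]  # assumed column choice

# Hypothetical rows standing in for data exported from a LifeDesk.
TAXA = [
    {"taxonID": "ld:1", "scientificName": "Puma concolor (Linnaeus, 1771)", "taxonRank": "species"},
    {"taxonID": "ld:2", "scientificName": "Puma", "taxonRank": "genus"},
]


def build_meta_xml(columns):
    """Return a minimal meta.xml descriptor for a Taxon core file."""
    archive = ET.Element("archive", {"xmlns": "http://rs.tdwg.org/dwc/text/"})
    core = ET.SubElement(archive, "core", {
        "rowType": DWC + "Taxon",
        "encoding": "UTF-8",
        "fieldsTerminatedBy": ",",
        "linesTerminatedBy": "\\n",   # written literally as \n in the descriptor
        "fieldsEnclosedBy": '"',
        "ignoreHeaderLines": "1",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = "taxa.csv"
    ET.SubElement(core, "id", {"index": "0"})  # first column is the record id
    for i, name in enumerate(columns[1:], start=1):
        ET.SubElement(core, "field", {"index": str(i), "term": DWC + name})
    return ET.tostring(archive, encoding="unicode")


def write_archive(rows, columns):
    """Return the bytes of a zip file holding taxa.csv and meta.xml."""
    table = io.StringIO()
    writer = csv.DictWriter(table, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("taxa.csv", table.getvalue())
        archive.writestr("meta.xml", build_meta_xml(columns))
    return buffer.getvalue()


archive_bytes = write_archive(TAXA, COLUMNS)
```

Extension files, such as the EOL extension mentioned above, would be added as further data files with their own entries in the descriptor; they are left out here.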

The LD data is ready to be published. The data custodian may

  1. Proceed to publish the data immediately, or
  2. Perform a variety of data validation checks to ensure the published data is of suitable scope and quality for inclusion in other projects prior to publishing.

In this scenario, the custodian publishes the data by selecting a publish option in the LD site.

  • Step 1 The dataset is registered in the Registry and is assigned a GUID. The registration information includes the location of the published data (a URI).
  • Institutional information, configured in the LD, ties the dataset to institutional entities that have already been registered. Additional linkages or registration of institutions that are tied to the dataset can be configured through a registry web interface or via additional LD interfaces.
  • Step 2 Pre-configured meta-tags are linked to the dataset. For example, the EOL namespace stores a tag that identifies the dataset as originating in a LifeDesk instance. Other interfaces would allow the custodian to add other tags as well, both private and public.
  • Step 3 Resource metadata is accessed from the published source and stored in the metadata catalogue. This extends the discoverability of the published dataset.
    • The metadata catalogue accesses the registry and discovers a new published resource
    • The metadata catalogue obtains the registered data access point
    • The data is retrieved and the metadata profile is indexed in the metadata catalogue. Additional metadata may be derived through summary evaluation of the data records.
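
The three steps above can be sketched with an in-memory model. This is an illustration only: the class names, method names, tag format ("namespace:tag") and example URI are all hypothetical, and the real Registry and metadata catalogue interfaces are not described on this page.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Dataset:
    title: str
    access_uri: str     # location of the published data
    institution: str    # a previously registered institutional entity
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    tags: set = field(default_factory=set)  # assumed "namespace:tag" strings


class Registry:
    """Hypothetical stand-in for the Registry."""

    def __init__(self):
        self.datasets = {}

    def register(self, dataset):
        """Step 1: store the dataset under its GUID and return the GUID."""
        self.datasets[dataset.guid] = dataset
        return dataset.guid

    def tag(self, guid, namespace, tag):
        """Step 2: link a namespaced tag to a registered dataset."""
        self.datasets[guid].tags.add(f"{namespace}:{tag}")


class MetadataCatalogue:
    """Hypothetical stand-in for the metadata catalogue."""

    def __init__(self, registry):
        self.registry = registry
        self.index = {}

    def sync(self):
        """Step 3: index datasets not seen before and return their GUIDs."""
        new = [guid for guid in self.registry.datasets if guid not in self.index]
        for guid in new:
            dataset = self.registry.datasets[guid]
            # A real catalogue would retrieve dataset.access_uri here and read
            # the metadata profile from the published source.
            self.index[guid] = {"title": dataset.title, "access_uri": dataset.access_uri}
        return new


registry = Registry()
guid = registry.register(
    Dataset("Example LifeDesk taxa", "http://example.org/lifedesk/dwca.zip", "Example Institution")
)
registry.tag(guid, "eol", "lifedesk")

catalogue = MetadataCatalogue(registry)
first_sync = catalogue.sync()   # discovers the newly published resource
second_sync = catalogue.sync()  # nothing new on the second pass
```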


Registration and Use of an ITIS Validation Service

ITIS offers a validation service that evaluates whether a published taxonomic resource meets particular requirements in order to serve particular needs. These requirements might include ensuring that a scientific name is properly formed and cited, ensuring that a particular set of data fields is completed and contains the proper type of data, and many other criteria. This service is configured to accept a data file in the GNA format as input.
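
To make the kind of check concrete, here is a rough sketch of a validator that returns per-record annotations and an overall dataset score. The required fields, the name pattern and the annotation layout are assumptions chosen for illustration; they are not the ITIS criteria and not the GBIF annotation schema.

```python
import re

REQUIRED_FIELDS = ("taxonID", "scientificName", "taxonRank")  # assumed field set

# Deliberately crude: a name must start with a capitalised word.
# Real nomenclatural rules are far richer than this.
NAME_PATTERN = re.compile(r"^[A-Z][a-z]+( |$)")


def validate(records):
    """Return (annotations, score); score is the fraction of records with no issue."""
    annotations = []
    flagged = set()
    for position, record in enumerate(records):
        for name in REQUIRED_FIELDS:
            if not (record.get(name) or "").strip():
                annotations.append({"record": position, "field": name, "issue": "missing value"})
                flagged.add(position)
        scientific_name = record.get("scientificName") or ""
        if scientific_name and not NAME_PATTERN.match(scientific_name):
            annotations.append(
                {"record": position, "field": "scientificName", "issue": "name is not well formed"}
            )
            flagged.add(position)
    score = 1 - len(flagged) / len(records) if records else 0.0
    return annotations, score


records = [
    {"taxonID": "ld:1", "scientificName": "Puma concolor (Linnaeus, 1771)", "taxonRank": "species"},
    {"taxonID": "ld:2", "scientificName": "puma concolor", "taxonRank": ""},
]
annotations, score = validate(records)
```

In this example the second record is flagged twice (empty rank, lowercase genus), so one of the two records is clean.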

Step 1 - ITIS registers the validation service within the Registry. Registration is likely through a web interface to the registry. Basic information includes a title and brief description of the service, as well as the class of content it acts upon. It might also specify whether it requires specific registered extensions. For example, a service that evaluates distribution information might require the GNA Distribution extension.
Step 2 - The LD publication interface calls the Registry to access registered service information. Based on the data publication profile provided, the registry returns a service list that may relate to the published data types & extensions. These services appear in a list within the LifeDesk.
Step 3 - The ITIS validation service is selected.
Step 4 - The service is run by one of two methods.
  1. The service can be called directly via a RESTful interface that accepts a URI to the published data file as input. The service performs the evaluation. Annotations for specific records, as well as an overall dataset score, are returned formatted to a GBIF annotation schema.
  2. An annotations broker (not yet implemented) can be utilised. In this case, the broker accepts the data file URI and additional contact information and returns a token. The broker calls the service. When the service is complete, the annotations are transferred to the broker. The user uses the token to check the status of the service output and is optionally contacted when the service is complete. The annotations can then be retrieved from the broker.
Step 5 - The annotations are returned to the LifeDesk system where developers can provide interfaces for the data custodian to review the results of the service. This may include unparsed responses or suggested replacement data. The interface could include methods that allow the custodian to confirm the suggestions and have updates made to the source data.
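
The service lookup and the broker route described in these steps can be sketched as follows. Since the broker is not yet implemented, everything here is hypothetical: the class and method names, the status values, the extension identifier "gna:distribution" and the stand-in validation function are assumptions, and no network calls are made.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    title: str
    content_class: str                            # class of content the service acts upon
    required_extensions: frozenset = frozenset()  # registered extensions it needs


def matching_services(services, content_class, extensions):
    """Keep services whose content class matches and whose required extensions are all published."""
    published = set(extensions)
    return [
        s for s in services
        if s.content_class == content_class and s.required_extensions <= published
    ]


class AnnotationsBroker:
    """Hypothetical broker: takes a data file URI, hands back a token, runs the service later."""

    def __init__(self, run_service):
        self.run_service = run_service  # callable: data URI -> annotations
        self.jobs = {}

    def submit(self, data_uri, contact=None):
        token = str(uuid.uuid4())
        self.jobs[token] = {"uri": data_uri, "contact": contact, "status": "pending", "result": None}
        return token

    def run_pending(self):
        """Call the service for every pending job and keep the returned annotations."""
        for job in self.jobs.values():
            if job["status"] == "pending":
                job["result"] = self.run_service(job["uri"])
                job["status"] = "complete"

    def status(self, token):
        return self.jobs[token]["status"]

    def result(self, token):
        job = self.jobs[token]
        if job["status"] != "complete":
            raise RuntimeError("the service has not finished yet")
        return job["result"]


def fake_validation(data_uri):
    """Stand-in for the remote validation service."""
    return {"score": 0.95, "annotations": []}


services = [
    Service("ITIS validation", "taxonomy"),
    Service("Distribution check", "taxonomy", frozenset({"gna:distribution"})),
]
offered = matching_services(services, "taxonomy", extensions=[])

broker = AnnotationsBroker(fake_validation)
token = broker.submit("http://example.org/lifedesk/dwca.zip", contact="custodian@example.org")
status_before = broker.status(token)
broker.run_pending()
result = broker.result(token)
```

Returning a token rather than the annotations themselves lets the custodian leave the LifeDesk while a long evaluation runs and come back for the result later.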

Lastly, if the service is performed and a particular threshold score is achieved, the service can send a message to the registry that tags the record with a private tag assigned to the ITIS namespace that identifies the dataset as having met the criteria set by the service. This can be used for subsequent filtering of datasets for uses that require those thresholds be met.
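
A minimal sketch of this threshold tagging, assuming tags are stored as "namespace:tag" strings per dataset GUID. The threshold value, the tag name "validated" and the GUIDs are invented for the example.

```python
THRESHOLD = 0.9  # assumed pass mark; the real criteria are set by the service


def tag_if_passed(tags, score, threshold=THRESHOLD, namespace="itis", tag="validated"):
    """Return a copy of the tag set, with the namespaced tag added when the score meets the threshold."""
    updated = set(tags)
    if score >= threshold:
        updated.add(f"{namespace}:{tag}")
    return updated


def datasets_with_tag(registry_tags, tag):
    """Filter dataset GUIDs by tag, as a later consumer might do."""
    return sorted(guid for guid, tags in registry_tags.items() if tag in tags)


# Hypothetical registry state: GUID -> tags
registry_tags = {"guid-a": {"eol:lifedesk"}, "guid-b": {"eol:lifedesk"}}
registry_tags["guid-a"] = tag_if_passed(registry_tags["guid-a"], score=0.95)
registry_tags["guid-b"] = tag_if_passed(registry_tags["guid-b"], score=0.40)

passed = datasets_with_tag(registry_tags, "itis:validated")
```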


EOL Access of LifeDesk Data

EOL utilises the GNA infrastructure to access the data published by LifeDesk data custodians. The simplest method is with an EOL version of the Harvesting and Indexing Toolkit (HIT).

Step 1 - The HIT communicates with the registry based on a schedule configured by EOL or via messages accessible to the HIT from the registry (RSS feeds related to updated or new datasets).
Step 2 - The HIT obtains the dataset access point information (a URI) and fetches the file.
Step 3 - The HIT unpacks the data and a HIT "adapter" is configured to transform these data into the local EOL format and insert them into a local EOL database. The data appear on the EOL site.
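
The fetch, unpack and transform steps can be sketched as below. This is not the HIT itself: the fetch function is a stand-in that builds a sample archive so the example runs offline, and the file name taxa.csv, the adapter and the local table layout are assumptions.

```python
import csv
import io
import sqlite3
import zipfile


def make_sample_archive():
    """Build a tiny archive in memory, standing in for a published data file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "taxa.csv",
            "taxonID,scientificName\nld:1,Puma concolor\nld:2,Lynx lynx\n",
        )
    return buffer.getvalue()


def fetch(access_uri):
    """Hypothetical fetch; a real harvester would request access_uri over HTTP."""
    return make_sample_archive()


def eol_adapter(row):
    """Map a source row to the assumed local (source_id, name) layout."""
    return (row["taxonID"], row["scientificName"])


def harvest(access_uri, db, adapter):
    """Fetch the archive, unpack the core file, transform each row and insert it."""
    with zipfile.ZipFile(io.BytesIO(fetch(access_uri))) as archive:
        text = archive.read("taxa.csv").decode("utf-8")
    rows = [adapter(row) for row in csv.DictReader(io.StringIO(text))]
    db.executemany("INSERT INTO taxa (source_id, name) VALUES (?, ?)", rows)
    db.commit()
    return len(rows)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE taxa (source_id TEXT PRIMARY KEY, name TEXT)")
inserted = harvest("http://example.org/lifedesk/dwca.zip", db, eol_adapter)
```

Keeping the adapter as a separate function means a different consumer could reuse the same harvest routine with its own mapping to a local format.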


Catalogue of Life Access of LifeDesk Data

The Catalogue of Life (COL) utilises the GNA infrastructure to access the data published by LifeDesk data custodians. The simplest method is with a COL version of the Harvesting and Indexing Toolkit (HIT). This version could be configured to conform to the "look and feel" of a COL interface.

Step 1 - The HIT communicates with the registry based on a schedule configured by COL or via messages accessible to the HIT from the registry (RSS feeds related to updated or new datasets). The COL may utilise the metatags tied to the published resources to identify resources of relevance to it. For example, one set of tags might indicate the data passes a particular threshold of completeness and quality based on the output of an evaluation service. Another tag might identify the resource as a Catalogue of Life Global Species Data (set). This tagging, in combination with the configuration of interfaces (skinning), allows the Catalogue of Life to essentially create a sub-network of the larger GNA focused on its community of custodians and data resources.
Step 2 - The HIT obtains the dataset access point information (a URI) and fetches the file.
Step 3 - The HIT unpacks the data and a HIT "adapter" is configured to transform these data into the local COL format and insert them into a local COL data cache (Dynamic Checklist). The data appear on the Dynamic Checklist site.
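
The tag-based selection in Step 1 could look something like the following sketch. The tag names ("itis:validated", "col:gsd"), the GUIDs and the record layout are hypothetical; the page does not define a tag vocabulary.

```python
def select_for_col(datasets, required_tags):
    """Return the GUIDs of datasets that carry every required tag."""
    required = set(required_tags)
    return [d["guid"] for d in datasets if required <= set(d["tags"])]


# Hypothetical registry listing
datasets = [
    {"guid": "guid-a", "tags": ["eol:lifedesk", "itis:validated", "col:gsd"]},
    {"guid": "guid-b", "tags": ["eol:lifedesk", "itis:validated"]},
    {"guid": "guid-c", "tags": ["eol:lifedesk"]},
]

# Only datasets that passed validation and are marked as COL resources are harvested.
to_harvest = select_for_col(datasets, ["itis:validated", "col:gsd"])
```

Requiring all tags at once is one possible policy; a consumer could equally accept any one of several tags by testing for a non-empty intersection instead of a subset.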


