LifeDeskPublishing
Workflow of LifeDesk publishing for multiple uses.
Introduction
This page presents the components of a workflow that utilise the Global Names Architecture (GNA) infrastructure to publish and access taxonomic datasets. The workflow involves the following actors and objectives. Data custodians of the EOL LifeDesks seek to publish taxonomic and species information for use by multiple potential end users. One requirement is that the taxonomic data meet certain levels of quality and completeness set by the Integrated Taxonomic Information System (ITIS). Once these criteria are met, the data may be accessed and used by an end user such as the Catalogue of Life.
Registration of the LifeDesk Resource(s)
The LD data is ready to be published. The data custodian may
In this scenario, the custodian publishes the data by selecting a publish option in the LD site.
Registration and Use of an ITIS Validation Service
ITIS offers a validation service that evaluates whether a published taxonomic resource meets particular requirements in order to serve particular needs. These requirements might include ensuring that a scientific name is properly formed and cited, ensuring that a particular set of data fields is completed and contains the proper type of data, and many other criteria. The service is configured to accept a data file in the GNA format as input.
Step 1 - ITIS registers the validation service within the Registry. Registration is likely through a web interface to the Registry. Basic information includes a title and brief description of the service as well as the class of content it acts upon. It might also specify whether it requires specific registered extensions. For example, a service that evaluates distribution information might require the GNA Distribution extension.
Step 2 - The LD publication interface calls the Registry to access registered service information. Based on the data publication profile provided, the Registry returns a list of services that may relate to the published data types and extensions. These services appear in a list within the LifeDesk.
Step 3 - The ITIS validation service is selected. There are two methods by which the service can be run.
Step 5 - The annotations are returned to the LifeDesk system, where developers can provide interfaces for the data custodian to review the results of the service. These results may include unparsed responses or suggested replacement data. The interface could include methods that allow the custodian to confirm the suggestions and have updates made to the source data.
Lastly, if the service is performed and a particular threshold score is achieved, the service can send a message to the Registry that tags the record with a private tag assigned to the ITIS namespace, identifying the dataset as having met the criteria set by the service. This tag can be used for subsequent filtering of datasets for uses that require those thresholds be met.
EOL Access of LifeDesk Data
EOL utilises the GNA infrastructure to access the data published by LifeDesk data custodians. The simplest method is with an EOL version of the Harvesting and Indexing Toolkit (HIT).
Step 1 - The HIT communicates with the Registry based on a schedule configured by EOL or via messages accessible to the HIT from the Registry (RSS feeds related to updated or new datasets).
Step 2 - The HIT obtains the dataset access point information (a URI) and fetches the file.
Step 3 - The HIT unpacks the data, and a HIT "adapter" is configured to transform these data into the local EOL format and insert them into a local EOL database. The data appear on the EOL site.
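The three HIT steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the HIT implementation: the names RegistryEntry, harvest and eol_adapter are hypothetical, the Registry response is modelled as a plain list of entries, fetching and unpacking are collapsed into a single fetch callable supplied by the caller, and the record field names (scientificName, taxonRank) are placeholders because this page does not specify the GNA format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


@dataclass
class RegistryEntry:
    """Hypothetical view of one dataset as listed by the Registry."""
    dataset_id: str
    access_point: str  # URI of the published file
    updated: str       # ISO 8601 timestamp of the last publication


def harvest(
    entries: Iterable[RegistryEntry],
    last_seen: Dict[str, str],
    fetch: Callable[[str], List[Record]],
    adapt: Callable[[Record], Record],
    store: Dict[str, List[Record]],
) -> List[str]:
    """Harvest new or updated datasets and return their identifiers."""
    harvested = []
    for entry in entries:
        # Step 1: skip datasets that have not changed since the last run.
        # Timestamps in one fixed ISO 8601 layout compare correctly as strings.
        if last_seen.get(entry.dataset_id, "") >= entry.updated:
            continue
        # Step 2: fetch (and unpack) the file behind the access point URI.
        records = fetch(entry.access_point)
        # Step 3: the adapter maps each record into the local format.
        store[entry.dataset_id] = [adapt(r) for r in records]
        last_seen[entry.dataset_id] = entry.updated
        harvested.append(entry.dataset_id)
    return harvested


def eol_adapter(record: Record) -> Record:
    """Hypothetical adapter; both the source and target field names are assumed."""
    return {"name": record["scientificName"], "rank": record.get("taxonRank", "")}
```

Passing the adapter in as a parameter mirrors the idea that each consumer configures its own transformation while the polling and fetching logic stays the same.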
Catalogue of Life Access of LifeDesk Data
The Catalogue of Life (COL) utilises the GNA infrastructure to access the data published by LifeDesk data custodians. The simplest method is with a COL version of the Harvesting and Indexing Toolkit (HIT). This version could be configured to conform to the "look and feel" of a COL interface.
Step 1 - The HIT communicates with the Registry based on a schedule configured by COL or via messages accessible to the HIT from the Registry (RSS feeds related to updated or new datasets). The COL may utilise the metatags tied to the published resources to identify resources of relevance to it. For example, one set of tags might indicate that the data pass a particular threshold of completeness and quality based on the output of an evaluation service. Another tag might identify the resource as a Catalogue of Life Global Species Data (set). This tagging, in combination with the configuration of interfaces (skinning), allows the Catalogue of Life to essentially create a sub-network of the larger GNA focused on its community of custodians and data resources.
Step 2 - The HIT obtains the dataset access point information (a URI) and fetches the file.
Step 3 - The HIT unpacks the data, and a HIT "adapter" is configured to transform these data into the local COL format and insert them into a local COL data cache (Dynamic Checklist). The data appear on the Dynamic Checklist site.
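The tag-based filtering described in Step 1 can be illustrated with a small Python sketch. It assumes that tags are stored as "namespace:tag" strings on each registered dataset; that representation, the tag names "itis:validated" and "col:gsd", and the names RegisteredDataset, tag_if_passed and select_by_tags are all hypothetical and do not come from the Registry itself.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class RegisteredDataset:
    """Hypothetical registry record carrying namespaced tags."""
    dataset_id: str
    tags: Set[str] = field(default_factory=set)


def tag_if_passed(dataset: RegisteredDataset, score: float,
                  threshold: float, tag: str) -> bool:
    """Attach the tag when the evaluation score meets the threshold."""
    if score >= threshold:
        dataset.tags.add(tag)
        return True
    return False


def select_by_tags(datasets: Iterable[RegisteredDataset],
                   required: Iterable[str]) -> List[RegisteredDataset]:
    """Return the datasets that carry every required tag."""
    wanted = set(required)
    return [d for d in datasets if wanted <= d.tags]
```

A consumer such as the COL could then restrict its harvest to, for example, select_by_tags(datasets, {"itis:validated", "col:gsd"}), which is one way to picture the sub-network idea: the Registry is shared, and the selection of tags defines each community's view of it.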