ECAT_EDIT_Developer  
WorkPlan for ECAT EDIT Developer
Updated Sep 3, 2009 by dprem...@gmail.com

This page has been published but is subject to revision.

These terms of reference define an 11-month programme of work for the period from 1 April 2009 until 31 March 2010. During that period the contractor will focus on 3 main areas of work:

  1. Refine and extend the GBIF vocabulary service.
  2. Integrate the taxonomic scratchpads into the GBIF data publication system.
  3. Work with GBIF and GBIF contractors to implement and evaluate the GBIF network data validation services.

These 3 work areas have been documented in a schematic (Annex D), where they have been divided into 7 phases tied to a calendar of events that will be refined during the first month of the contract. The source file of this schematic will be kept in a version-controlled repository shared between GBIFS Informatics and the contractor. It will contain updated schedules, identified outcomes and dependencies.

Phases 1 and 2 - (1) Refine and (2) extend the GBIF vocabulary service

The vocabulary service developed by the contractor for GBIF during the first phase of the contract is ready for more general deployment to the GBIF community to serve as a multi-lingual vocabulary system. This contract will refine and extend the functionality of the service. Specific areas of work include:

  1. Complete documentation of the available services that explain the scope, intent and use of the vocabularies to non-technical users.
  2. Continue to modify the existing vocabulary service specifications according to requirements specified by the GBIF Secretariat Informatics programme.
  3. Extend the scope of the server to enable the drafting, editing and publishing of Darwin Core Extensions according to the extensions schema provided by the GBIF Secretariat.
  4. Integrate the creation and publication of extensions and vocabularies to enable efficient workflows for GBIF data publishers to define their published output and provide multi-lingual support to all elements.
  5. Work with the GBIF Informatics team to ensure integration of the service with the GBIF Registry.

This phase of development focuses on refining and extending the GBIF multilingual vocabulary server to better serve the needs of the GBIF network, by enabling it to describe and publish GBIF extensions to the Darwin Core. GBIF supports the creation of extensions to the Darwin Core terms (currently in public review), which, when combined with the text guidelines for publishing simple delimited text files, greatly enhance the capacity of the GBIF publishing framework while also simplifying the overall publishing process.

Extensions used within the GBIF network are registered in the GBIF registry, and their creation is governed by best-practice rules and a structured schema to ensure they are well defined and use recognised terms and vocabularies whenever possible. Adding the means to create and publish extensions within the GBIF vocabulary server will address technical bottlenecks and ensure that the discovery of existing extensions, and the creation of new ones, can be undertaken in a multi-national and public forum.

Coupling the extensions with the existing vocabulary server is a natural fit, as best practice in the creation of extension terms is to use controlled vocabularies when possible. Integrating the two will simplify the data publishing workflow for creating and using extensions: a community seeking to create an extension can use the same system to initialize, and link to, supporting vocabularies. We will also evaluate the benefits of additional capacity within the server, including the means to annotate GBIF-registered services and other components of the GBIF network that might have a multi-lingual component.
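As an illustration only, the coupling between extension terms and controlled, multi-lingual vocabularies could be modelled along the following lines. This is a minimal Python sketch: the class names, the `invalid_values` helper and the example term names are assumptions made for illustration and are not taken from the GBIF extensions schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One vocabulary entry, with display labels keyed by language code."""
    identifier: str
    labels: dict[str, str] = field(default_factory=dict)

    def label(self, lang: str, fallback: str = "en") -> str:
        # Prefer the requested language, then the fallback, then the raw identifier.
        return self.labels.get(lang) or self.labels.get(fallback) or self.identifier


@dataclass
class Vocabulary:
    name: str
    concepts: dict[str, Concept] = field(default_factory=dict)

    def add(self, concept: Concept) -> None:
        self.concepts[concept.identifier] = concept


@dataclass
class ExtensionTerm:
    name: str
    vocabulary: Vocabulary | None = None  # None means the term is free text

    def accepts(self, value: str) -> bool:
        return self.vocabulary is None or value in self.vocabulary.concepts


@dataclass
class Extension:
    name: str
    terms: list[ExtensionTerm] = field(default_factory=list)

    def invalid_values(self, record: dict[str, str]) -> dict[str, str]:
        """Return the term/value pairs that are not in the linked vocabulary."""
        by_name = {term.name: term for term in self.terms}
        return {
            key: value
            for key, value in record.items()
            if key in by_name and not by_name[key].accepts(value)
        }


# Hypothetical example: a vernacular names extension with a language vocabulary.
language = Vocabulary("language")
language.add(Concept("en", {"en": "English", "es": "inglés"}))
language.add(Concept("es", {"en": "Spanish", "es": "español"}))

vernacular = Extension(
    "VernacularName",
    [ExtensionTerm("vernacularName"), ExtensionTerm("language", language)],
)
```

In this sketch a term without a linked vocabulary accepts any value, which mirrors the recommendation to use controlled vocabularies "whenever possible" rather than always.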

Phase 3 – Enable publication of selected data resource types to the GBIF network.

The second major phase of the contract involves the application of the GBIF registry and components of the GBIF network in the publication, evaluation and annotation of datasets, with a focus on taxonomic checklists. The goal of this phase is to develop the capacity of curators of the EDIT taxonomic scratchpads and the EoL LifeDesks to publish data via the GBIF network, to use registered data validation and value-added services, and to incorporate annotations derived from those services into the Scratchpad/LifeDesk workflow. As part of the contract we will identify at least one exemplar dataset to use as a test case for each content or resource type that we seek to mobilize for publishing (checklist, image store, specimen store, etc.).

As part of the contract, GBIF will consider updating the existing World Catalogue of Common Names scratchpad site to serve as a repository for common names data in support of the GBIF network. This would entail updating the content model to be congruent with the GBIF vernacular names extension and populating the site with some datasets currently held in proxy at GBIFS. If the site is added to the exemplars, it would be one of the content types to be exported to DwC Archive.

Phase 3, Registration – In this phase of the work the contractor will create a data export module that communicates with the GBIF registry. The contractor will coordinate with the ScratchPad development team, GBIF Informatics, and scratchpad curators to identify data resources for publishing to the GBIF network. The goal of this phase is to enable a GBIF resource to be registered as a metadata catalogue entry, using a combination of metadata for the resource that already exists in the scratchpad and further recommended data elements. This requires write access to the GBIF Registry through the API. API specifications will be provided to the contractor by GBIFS Informatics.
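The registration step combines metadata already held in the scratchpad with additional recommended elements. The Python sketch below illustrates that merge under stated assumptions: the list of recommended elements, the field names and the example values are hypothetical, since the registry API specification is to be provided by GBIFS Informatics. No call to the registry is made here.

```python
import json

# Hypothetical list; the real set of recommended elements would come from
# the registry API specification.
RECOMMENDED_ELEMENTS = ("title", "description", "contact_email", "license")


def build_registry_entry(site_metadata, additions):
    """Merge existing scratchpad metadata with curator-supplied additions.

    Empty values are dropped. Returns the merged entry together with the
    recommended elements that are still missing, so that the curator can be
    prompted to supply them before the entry is submitted.
    """
    entry = {key: value for key, value in site_metadata.items() if value}
    entry.update({key: value for key, value in additions.items() if value})
    missing = [key for key in RECOMMENDED_ELEMENTS if key not in entry]
    return entry, missing


# Example values are placeholders for illustration.
site = {
    "title": "Global Bee Checklist",
    "description": "",
    "contact_email": "curator@example.org",
}
entry, missing = build_registry_entry(site, {"description": "Checklist of bee species"})
payload = json.dumps(entry, sort_keys=True)  # body that a registry client might send
```

Reporting the missing elements, rather than rejecting the entry, keeps the recommended elements optional while still making the gaps visible to the curator.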

Phases 4-5, Export data to a DarwinCore Archive file

In this phase the contractor will coordinate with the ScratchPad development team, GBIF Informatics, and scratchpad curators to identify datasets for full data publication to the GBIF network. Selected datasets will be identified, ideally representing multiple classes of checklist, occurrence, or species data.

The contractor will refine a scratchpad export module that will interact with the GBIF registry. The module will include read access to extension and vocabulary definitions to enable a data publisher to select the resource definitions that match the data to be published. The focus of this phase will be to enable a simple direct transformation of scratchpad content, with the assumption that only minor or no transformation steps are required (i.e., a straight 1:1 mapping of data elements to corresponding DwC and defined extensions is enabled). The export module will retain the mapping profile once established to provide similar capacity as an export option to any scratchpad maintainer. It will include mechanisms to allow for manual or activated publishing events. The output format will be a zipped DarwinCore Archive file and the location will be automatically added to the metadata record for the resource in the GBIF registry.

In Phase 4 the data focus will be on a taxonomic checklist. Options include an existing taxonomic checklist or the reconstitution of the ITIS Global Bee Checklist with the Global Bee Checklist Scratchpad. If the latter, additional scheduling will be incorporated to enable the contractor to update the site for curation by at least one external curator identified by GBIF in association with ITIS.
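The 1:1 mapping and zipped output described above can be sketched as follows. This is a minimal Python illustration, not the export module itself: the function name, the scratchpad field names (`nid`, `title`, `rank`) and the file name `taxa.csv` are assumptions, and the `meta.xml` layout follows the Darwin Core text guidelines as the editor understands them, which were still in draft at the time of this work plan.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"
DWC_TEXT = "http://rs.tdwg.org/dwc/text/"


def export_archive(records, mapping, id_field, row_type=DWC_TERMS + "Taxon"):
    """Build a zipped archive (as bytes) from a straight 1:1 field mapping.

    records  -- list of dicts holding scratchpad content
    mapping  -- scratchpad field name -> Darwin Core term name (the profile)
    id_field -- scratchpad field used as the core record identifier
    """
    columns = [id_field] + [name for name in mapping if name != id_field]

    # Data file: one header line, then one row per record.
    data = io.StringIO()
    writer = csv.writer(data, lineterminator="\n")
    writer.writerow(["id"] + [mapping[name] for name in columns[1:]])
    for record in records:
        writer.writerow([record.get(name, "") for name in columns])

    # Descriptor file: tells a consumer which column holds which term.
    archive = ET.Element("archive", xmlns=DWC_TEXT)
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": ",",
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": '"',
        "ignoreHeaderLines": "1",
        "rowType": row_type,
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = "taxa.csv"
    ET.SubElement(core, "id", index="0")
    for index, name in enumerate(columns[1:], start=1):
        ET.SubElement(core, "field", index=str(index), term=DWC_TERMS + mapping[name])
    meta = ET.tostring(archive, encoding="unicode")

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("taxa.csv", data.getvalue())
        zf.writestr("meta.xml", meta)
    return buffer.getvalue()


# A retained mapping profile could be as simple as this dictionary.
profile = {"title": "scientificName", "rank": "taxonRank"}
rows = [{"nid": "1", "title": "Apis mellifera", "rank": "species"}]
blob = export_archive(rows, profile, id_field="nid")
```

Because the mapping is plain data, it can be stored once per site and reused for every later publishing event, which is the behaviour the work plan asks of the mapping profile.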

Phase 5

In Phase 5 the data focus is on specimen or other species occurrence data (images, observations, etc.). These data may be exported as extensions to the core taxon file or with a core record based on an occurrence. The specific export format will be determined in consultation with the Scratchpad group and GBIFS.

Phase 6 - Work with external contractor to access registered service.

In this phase the contractor will develop a module for interacting with a registered service to provide validation or value-added services to the scratchpad content. The specific service will be identified by the ScratchPad group and GBIFS.

Examples include:

  • a georeferencing service for providing latitude, longitude and precision estimates for locality information.
  • a syntax checker for scientific names, author names, and other components of taxon names found in specimen or image data.
  • others to be determined.

The module will provide a scratchpad interface that accesses the GBIF Registry for a list of available services. It will provide a UI for selecting a service and for instantiating the service via the service API. The specific workflow between the scratchpads and registered services will be defined by GBIFS Informatics and the contractor. The service module will be the component that sends the request to the service, or possibly to a third-party service broker (a web app that directs a service to a data file and holds the annotations for pickup). The module will also retrieve the annotations output by the service (again, either directly from the service or via a broker application).

GBIFS Informatics will provide specifications for the GBIF services annotation response schema.
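The flow described for this phase (list the registered services, select one, send it the data, collect the annotations) can be sketched as below. This is an illustrative Python outline only: the class names, the in-memory service list standing in for a registry lookup, the toy name checker and the annotation fields are all assumptions, because the actual workflow and response schema are still to be defined.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class RegisteredService:
    key: str
    title: str
    # Stand-in for a call to the service API (or to a broker application).
    invoke: Callable[[list[dict]], list[dict]]


def check_name_syntax(records):
    """Toy stand-in for a remote syntax checker for scientific names."""
    annotations = []
    for record in records:
        name = record.get("scientificName", "")
        fixed = " ".join(name.split())         # collapse repeated whitespace
        fixed = fixed[:1].upper() + fixed[1:]  # capitalise the first letter
        if fixed != name:
            annotations.append(
                {"record_id": record["id"], "field": "scientificName", "suggested": fixed}
            )
    return annotations


class ServiceModule:
    """Lists the available services and runs the one the curator selects."""

    def __init__(self, services):
        # In practice this list would be read from the GBIF Registry.
        self._services = {service.key: service for service in services}

    def available(self):
        return sorted((s.key, s.title) for s in self._services.values())

    def run(self, key, records):
        return self._services[key].invoke(records)


module = ServiceModule(
    [RegisteredService("name-syntax", "Name syntax checker", check_name_syntax)]
)
annotations = module.run(
    "name-syntax",
    [
        {"id": "1", "scientificName": "apis  mellifera"},
        {"id": "2", "scientificName": "Bombus terrestris"},
    ],
)
```

Hiding the service call behind a single `invoke` callable means the same module could talk either directly to a service or to a broker, which is the choice the work plan leaves open.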

Phase 7 - Process the service output

In this phase the contractor will develop an interface for processing annotations returned from the selected service. A general processing mechanism will be developed that will enable a data curator to review the annotations returned by the service. The interface will allow some or all of the annotations to be selected and then accepted or rejected. Accepted annotations would be used to insert or update the affected data records within the scratchpad.
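The accept/reject step can be pictured with the following Python sketch. It is a simplified illustration under assumptions: the `Annotation` fields and the `apply_annotations` function are hypothetical, and the real annotation structure will follow the response schema supplied by GBIFS Informatics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    record_id: str
    field: str
    value: str


def apply_annotations(records, annotations, accepted):
    """Return a copy of the records with the accepted annotations applied.

    records     -- dict of record id -> dict of field values
    annotations -- list of Annotation objects returned by the service
    accepted    -- set of positions in `annotations` that the curator accepted
    """
    updated = {record_id: dict(fields) for record_id, fields in records.items()}
    for position, annotation in enumerate(annotations):
        if position not in accepted:
            continue  # rejected or not yet reviewed: leave the data untouched
        # setdefault inserts a new record when the id is unknown, otherwise updates.
        updated.setdefault(annotation.record_id, {})[annotation.field] = annotation.value
    return updated


records = {"1": {"locality": "Copenhagen", "decimalLatitude": ""}}
returned = [
    Annotation("1", "decimalLatitude", "55.68"),
    Annotation("1", "decimalLongitude", "12.57"),
    Annotation("1", "locality", "Kobenhavn"),
]
# The curator accepts the first two suggestions and rejects the third.
result = apply_annotations(records, returned, accepted={0, 1})
```

Working on a copy keeps the original records intact until the curator confirms the review, so a rejected batch leaves the scratchpad data unchanged.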

Diagram of workplan

