ChecklistBank  
Checklist Bank is a Global Name Server Brokerage
Updated Jul 22, 2010 by wixner@gmail.com

This page is under construction and subject to significant revision.

Introduction

Checklist Bank extends the indexing capacity of the GBIF data portal to include taxonomic data sources. It serves to inventory checklist resources, enabling discovery and access to a wide range of taxonomic and nomenclatural databases, datasets, and initiatives that organise and serve these data.

Checklist Bank serves as a dynamic archive of "checklists," summarized lists of taxa or taxon names. Checklist Bank stores checklist data as it was provided by the data publisher. In addition, Checklist Bank attempts to collate different published checklist resources by tying the atomic elements of checklists, taxon names, to a common names dictionary.

  • A single point of discovery for multiple, stable taxonomic data providers
  • Provides pointers to names treated by other name services, with details about that treatment in a standard format
  • Unified access to multiple taxonomic sources

The scope of this checklist index was effectively captured in a poster presented by Dean Pentcheff and Regina Wetzer at the EBiosphere conference in London, June 2009.

Benefits

Diverse taxonomic resources can work together

  • Access to taxonomic and nomenclatural data in a collective and consistent format will enable users with a wide range of needs to reference existing taxonomic and nomenclatural data records, eliminating the need for duplication and increasing the visibility and relevance of existing work. For examples, please refer to the Thematic and Regional Checklist Building use cases.
  • Overlaps can be assessed and compared.
  • Gaps can be identified and remediated
    • Indexes of names occurring in the Global Names Index provide the full scope of nomenclature as it exists within key domains of biological data, from museum specimens to genomic repositories to the full historic and contemporary corpus of scientific publications.
    • The NUB merged taxonomy provides a global organisational framework for all of these taxon references that supports their assessment by a global community of experts.
    • Gaps in relative nomenclatural scope between global species datasets and the mobilised and provisionally sorted taxon references within the NUB and among indexed checklists can be evaluated.
  • Duplicated and inconsistent data entry can be reduced
    • Select an indexed taxonomic reference and enter a name in a data entry form. Auto-complete the name and the entire taxonomic hierarchy, and retain a unique and persistent identifier to the source data.
  • Build a new thematic, global, or regional list of species that is tied to indexed taxon concepts.
  • Updates can be automatic and seamless

Requirements

The ChecklistBank Data Model has been developed to accommodate three major requirements.

  1. Capturing the core properties of taxonomic checklists with sufficient detail to be useful to a wide range of potential users. These data elements are based on the Darwin Core terminology. Additional properties are defined as extensions to the Darwin Core and include details regarding type specimens, vernacular names, species distribution, threat status, etc.
  2. Accurately representing each checklist resource as it was published. A single and universally accepted dictionary of correctly-spelled taxon names does not yet exist. As a result, different published resources may refer to the same name in slightly different ways. In order to enable effective search and retrieval of indexed resources, these differences need to be reconciled.
  3. Enabling the development of web services that promote the discovery, re-direction, and utilisation of datasets catalogued and indexed in the Checklist Bank database.

Main Entities

The main entities of the Checklist Bank model are described below, following the basic distinction between the core components of the data model.

Checklist

A checklist is the basic unit of Checklist Bank. Checklists:

- Provide definitions of taxa
- Provide nomenclatural details regarding taxon names
- Organize and summarize references to taxa that may be defined by combinations of taxonomic, regional, or thematic contexts.
- Link taxon names to vernacular (or common) names.

The Checklist Entity provides basic metadata about the checklist and provides the link to the entry for the checklist within the GBIF Registry (the GBRDS).

Name Usage

A checklist is composed of one or more "Name Usages". This term refers to an instance of a taxon name within a single Checklist resource. In some checklists, each row or entry may refer to an individual taxon. In other checklists, a taxon may be represented by multiple entries: one for the accepted name and one or more name usages representing synonyms or publications asserted to refer to the same taxon. In addition to a taxon name, a name usage may include taxonomic and/or nomenclatural details regarding the use of the name according to the publisher. Each Checklist is composed of uniquely identified Name Usages.

Name String

A "name string" refers to the literal orthography of a taxon name as it is provided by a data publisher. The same name may occur within multiple Name Usages. A name string may include the taxon name, rank information, authorship, and other annotations.

Examples of namestrings:

  • Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
  • Aotus J.E. Smith 1805
  • Manta birostris

Checklist Bank stores the exact orthography of a name as provided by the data publisher, with the following exceptions:

  • Leading, trailing, and repeated whitespace is removed.
  • Commas in front of the year in the cited authorship are removed (e.g. "Aotus Illiger, 1811" -> "Aotus Illiger 1811").
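
The two exceptions above can be sketched in a few lines of Python. This is only an illustration, not the Checklist Bank implementation: the function name `normalise_name_string` is hypothetical, repeated whitespace is assumed to collapse to a single space, and a "year" is assumed to be any four-digit number.

```python
import re


def normalise_name_string(raw: str) -> str:
    """Apply the two documented clean-ups to a published name string.

    Hypothetical helper: the name and the regular expressions are
    assumptions made for illustration only.
    """
    # Collapse runs of whitespace to a single space and trim both ends.
    cleaned = re.sub(r"\s+", " ", raw).strip()
    # Drop a comma that directly precedes a four-digit year
    # (assumption: years in cited authorship are always four digits).
    cleaned = re.sub(r"\s*,\s*(?=\d{4}\b)", " ", cleaned)
    return cleaned


print(normalise_name_string("  Aotus   Illiger, 1811 "))
```

Everything else in the string, including authorship abbreviations and rank markers, is left exactly as published.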

When a multi-word name is published in atomised form, with no literal combination supplied, a namestring value is generated based on:

dwc:genus (dwc:subgenus)? dwc:species dwc:infraspecificRank dwc:infraspecies dwc:authorship
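
As a rough sketch of how the pattern above could be applied, the following Python function joins the atomised parts in the stated order and skips any part that is missing. The function and argument names are hypothetical. Writing the subgenus in parentheses is also an assumption: the pattern's `(dwc:subgenus)?` may simply mark the part as optional.

```python
from typing import Optional


def build_name_string(genus: str,
                      species: str,
                      subgenus: Optional[str] = None,
                      infraspecific_rank: Optional[str] = None,
                      infraspecies: Optional[str] = None,
                      authorship: Optional[str] = None) -> str:
    """Assemble a name string from atomised parts (illustrative only)."""
    ordered = [
        genus,
        # Assumption: the subgenus is rendered in parentheses.
        f"({subgenus.strip()})" if subgenus and subgenus.strip() else None,
        species,
        infraspecific_rank,
        infraspecies,
        authorship,
    ]
    # Skip missing or blank parts and join the rest with single spaces.
    return " ".join(part.strip() for part in ordered if part and part.strip())


print(build_name_string("Gerardia", "paupercula",
                        infraspecific_rank="var.",
                        infraspecies="borealis",
                        authorship="(Pennell) Deam"))
```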

Term

Many data elements in ChecklistBank benefit from the use of controlled or reconciled vocabularies of terms that may be represented in a particular property. Controlled vocabularies refer to strict lists and reconciled vocabularies occur when terms identified in sources are mapped to a controlled list. The Terms Entity in Checklist Bank manages all controlled lists and associated terms for any of the data elements in the schema. Examples of data elements tied to such controlled terms are taxonomic ranks, nomenclatural status, language and country names, etc. Vocabularies and associated terms are maintained and developed on the GBIF vocabulary server.
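
The difference between a controlled and a reconciled vocabulary can be illustrated with a small Python sketch for taxonomic ranks. The term lists below are invented for the example and are not taken from the GBIF vocabulary server. The function name `reconcile_rank` is likewise hypothetical.

```python
from typing import Optional

# Illustrative controlled list (assumption, not the GBIF vocabulary).
CONTROLLED_RANKS = {"kingdom", "family", "genus", "species", "subspecies", "variety"}

# Illustrative mapping of source spellings onto the controlled list.
RANK_SYNONYMS = {
    "sp.": "species",
    "subsp.": "subspecies",
    "ssp.": "subspecies",
    "var.": "variety",
    "gen.": "genus",
    "fam.": "family",
}


def reconcile_rank(source_term: str) -> Optional[str]:
    """Map a rank as found in a source onto the controlled list.

    Returns None when the term cannot be reconciled.
    """
    key = source_term.strip().lower()
    if key in CONTROLLED_RANKS:
        return key
    return RANK_SYNONYMS.get(key)


print(reconcile_rank(" Var. "))
```

Terms that cannot be reconciled return `None` here, so they can be flagged for review rather than silently guessed.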

Lexical Group

Checklist Bank stores the exact orthography of a name as provided by the data publisher with the result that the same name may have slight (or significant) variation. This can present many problems in evaluating and comparing checklists, utilising checklists as organisational schema for biodiversity data, and in providing clear search and browse data interfaces. Checklist Bank addresses this issue by clustering lexically similar names into Lexical groups to enable "fuzzy matching" of names.

A Lexical group may contain both correct and incorrect spellings of a name as well as correctly-spelled variations in a name.

Examples of lexical grouping of Name Strings:

  • Gerardia paupercula var. borealis (Pennell) Deam
  • Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
  • Gerardia paupercula (A.Gray) Britton var. borealis (Pennell) Deam
  • Gerardia paupercula borealis
  • Gerardia paupercula borealis (Pennell) Deam

  • Rubus silvaticus
  • Rubus sylvaticus
  • Rubus silvaticum
  • Rubus silvaticus Weihe & Nees

Lexical groups are assembled by a combination of algorithmic and manually-mediated methods.
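
The algorithmic part of such grouping could look roughly like the Python sketch below. It is not the method Checklist Bank uses, which is not described on this page. The sketch reduces each name string to a crude key (genus plus lowercase epithets), then groups keys whose similarity, as measured by the standard library's `difflib.SequenceMatcher`, reaches a threshold. The function names, the key heuristic, and the 0.9 threshold are all assumptions.

```python
import re
from difflib import SequenceMatcher


def canonical_key(name_string: str) -> str:
    """Reduce a name string to genus plus lowercase epithets.

    Crude heuristic (assumption): parenthesised text, tokens containing
    uppercase letters, and tokens containing punctuation are treated as
    authorship or rank markers. Lowercase author particles such as "ex"
    would be wrongly kept.
    """
    tokens = re.sub(r"\([^)]*\)", " ", name_string).split()
    if not tokens:
        return ""
    epithets = [t for t in tokens[1:] if t.isalpha() and t.islower()]
    return " ".join([tokens[0].lower()] + epithets)


def group_names(name_strings, threshold=0.9):
    """Greedily cluster name strings whose keys are lexically similar."""
    groups = []  # list of (representative key, member name strings)
    for name in name_strings:
        key = canonical_key(name)
        for representative, members in groups:
            if SequenceMatcher(None, representative, key).ratio() >= threshold:
                members.append(name)
                break
        else:
            # No existing group is close enough, so start a new one.
            groups.append((key, [name]))
    return [members for _, members in groups]


for group in group_names(["Rubus silvaticus", "Rubus sylvaticus",
                          "Gerardia paupercula borealis"]):
    print(group)
```

Each name is compared only against the first member of a group, so results depend on input order. The manually-mediated step mentioned above is not modelled at all.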

Nomenclatural Group

Checklist Bank organises lexical groups into larger sets of groups based on nomenclatural relationships. This allows, for example, expanding a search for information using one name, to include other names that are derived from the original type (homotypic names). This also allows cross-linking among different classifications that may reference the same name placed in different genera.

Nomenclatural groups are derived from data sources that publish nomenclatural information linking a name to an original name.
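
A minimal Python sketch of this idea follows, assuming each source supplies pairs of (name, original name). The names "Aus bus", "Cus bus", and "Dus bus" are placeholders rather than real taxa, and the function names are hypothetical.

```python
from collections import defaultdict


def nomenclatural_groups(original_name_links):
    """Group names by the original name they are linked to.

    original_name_links: iterable of (name, original_name) pairs.
    """
    groups = defaultdict(set)
    for name, original in original_name_links:
        groups[original].add(original)
        groups[original].add(name)
    return dict(groups)


def expand_search(name, groups):
    """Return every name sharing a nomenclatural group with `name`."""
    for members in groups.values():
        if name in members:
            return sorted(members)
    # Unknown names fall back to a search on the name itself.
    return [name]


links = [("Cus bus", "Aus bus"), ("Dus bus", "Aus bus")]
print(expand_search("Cus bus", nomenclatural_groups(links)))
```

A search for the placeholder "Cus bus" is expanded to the other combinations derived from the same original name, which is the cross-linking behaviour described above.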

