ChecklistBankIdentifier  
Identifiers in Checklist Bank
Updated Oct 30, 2009 by wixner@gmail.com

Introduction

This page gives some insight into how identifiers are handled within ChecklistBank. If you haven't done so already, please first read the introduction and look at the ChecklistBankSchema to learn about the main entities used.

External Identifiers

ChecklistBank keeps track of source ids and can hold any number of additional alternative identifiers for a record. This makes it possible to attach ids from different technologies to the same name usage, for example LSIDs, RDF URIs, Linked Data or RDFa URLs, web pages, DOIs, UUIDs, etc. Additional identifiers can be submitted as part of the Darwin Core Archive via the alternative identifier extension.
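As an illustration only, a record with one source id and any number of alternative identifiers could be modelled as below. The class names, field names and identifier type labels are hypothetical and are not taken from the ChecklistBank schema or from the alternative identifier extension; the identifier values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class AlternativeIdentifier:
    id_type: str   # illustrative label such as "LSID", "DOI", "URL" or "UUID"
    value: str     # the identifier itself


@dataclass
class UsageRecord:
    source_id: str  # id of the record in its source dataset
    alternatives: List[AlternativeIdentifier] = field(default_factory=list)

    def add_identifier(self, id_type: str, value: str) -> None:
        """Attach another identifier; exact duplicates are ignored."""
        ident = AlternativeIdentifier(id_type, value)
        if ident not in self.alternatives:
            self.alternatives.append(ident)

    def identifiers_of_type(self, id_type: str) -> List[str]:
        """Return all identifier values of one type, in insertion order."""
        return [a.value for a in self.alternatives if a.id_type == id_type]


# Placeholder values, not real identifiers.
usage = UsageRecord(source_id="src-42")
usage.add_identifier("LSID", "urn:lsid:example.org:names:42")
usage.add_identifier("URL", "http://example.org/names/42")
```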

CLB Identifiers

ChecklistBank manages its own identifiers as integers. Not all of them are exposed, and they are not assigned at random. We try our best to keep them stable, especially when it comes to the NUB.

Name ID

A name id is assigned to each distinct name string. Name ids never change, because name strings are never deleted, only deprecated.
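The behaviour described here can be sketched with a small in-memory registry. This is a minimal sketch under stated assumptions: the NameRegistry class, its method names and the sequential numbering from 1 are hypothetical and say nothing about how ChecklistBank actually stores names.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class NameRegistry:
    """Hypothetical registry: one id per distinct name string, no deletes."""
    _ids: Dict[str, int] = field(default_factory=dict)   # name string -> name id
    _deprecated: Set[int] = field(default_factory=set)   # ids flagged as deprecated
    _next_id: int = 1                                     # assumed starting value

    def get_or_create(self, name: str) -> int:
        """Return the existing id for a name string, or assign the next one."""
        if name not in self._ids:
            self._ids[name] = self._next_id
            self._next_id += 1
        return self._ids[name]

    def deprecate(self, name: str) -> None:
        """Flag a known name as deprecated; the string and its id are kept."""
        self._deprecated.add(self._ids[name])

    def is_deprecated(self, name: str) -> bool:
        return self._ids[name] in self._deprecated
```

Because deprecation only sets a flag, looking up a deprecated name string still returns the id it was first given.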

Lexical Group ID

Lexical groups have to be regenerated often as algorithms change and manual interactions happen. This makes it quite challenging to keep their ids stable: what is it that ultimately defines a group?

We base lexical group ids on name ids as much as possible, making use of the stable nature of name ids. In order to do so we distinguish between two kinds of lexical groups.

Non Canonical Groups

A lexical group that contains at least one non canonical name, i.e. a qualified name with an authorship, is considered well defined. It is highly unlikely that this name is part of another lexical group. It is therefore possible to assign the lexical group the same id as the non canonical name. If there are multiple non canonical names, the earliest created one is used, so the lexical group id does not change when new names are added.
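The selection rule for non canonical groups can be sketched as follows. The Name class, its fields and the tie-break on name id are assumptions made for illustration; in particular, how creation order is recorded is not specified on this page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Name:
    name_id: int     # stable name id
    created: int     # creation order or timestamp (assumed field)
    canonical: bool  # True if the name string carries no authorship


def stable_group_id(names: Sequence[Name]) -> Optional[int]:
    """Return the id of the earliest created non canonical name, or None
    if the group is purely canonical and has no stable id of this kind."""
    non_canonical = [n for n in names if not n.canonical]
    if not non_canonical:
        return None
    # Tie-break on name_id so the result is deterministic (an assumption).
    earliest = min(non_canonical, key=lambda n: (n.created, n.name_id))
    return earliest.name_id
```

Adding a newer non canonical name to the group leaves the result unchanged, since the earliest created name still wins.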

Canonical Groups

Pure canonical groups are made up of names that are likely to be part of other groups too. They are therefore assigned a sequentially generated, volatile identifier that changes every time lexical groups are reprocessed. Sequentially generated lexical ids start at 100 million and are therefore easily recognizable.
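A minimal sketch of such a generator is shown below. The function and constant names are hypothetical, and it assumes that the first generated id is exactly 100,000,000, which this page does not state precisely.

```python
import itertools
from typing import Dict, Hashable, Iterable

# Assumed first value of the sequentially generated range.
VOLATILE_ID_START = 100_000_000


def is_volatile(group_id: int) -> bool:
    """A lexical group id in the generated range is volatile."""
    return group_id >= VOLATILE_ID_START


def assign_volatile_ids(groups: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Number pure canonical groups sequentially, in the order given.

    The counter restarts on every call, so the same group may receive a
    different id each time lexical groups are reprocessed.
    """
    counter = itertools.count(VOLATILE_ID_START)
    return {group: next(counter) for group in groups}
```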

Usage ID

In general, usage ids are volatile and will change whenever usages are reimported. This won't happen for static datasets, but it does apply to most datasets. A special case is the synthesized NUB.

NUB Usage ID

As we use the NUB as our default management classification, we would like to keep NUB usage ids as stable as possible. NUB usages correlate 1:1 to lexical groups, i.e. there is exactly one NUB usage for each lexical group. We can therefore simply reuse the stable lexical group ids, and generate sequential, volatile identifiers for the groups with non stable lexical ids above 100 million.
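One possible reading of this rule is sketched below. The function name is hypothetical, and two details are assumptions: that the volatile NUB usage ids are drawn from the same range starting at 100,000,000, and that they are generated afresh rather than copied from the volatile lexical group ids.

```python
import itertools
from typing import Dict, Iterable

# Assumed boundary between stable and sequentially generated ids.
VOLATILE_ID_START = 100_000_000


def nub_usage_ids(lexical_group_ids: Iterable[int]) -> Dict[int, int]:
    """Map each lexical group id to the id of its single NUB usage."""
    counter = itertools.count(VOLATILE_ID_START)
    result = {}
    for group_id in lexical_group_ids:
        if group_id < VOLATILE_ID_START:
            # Stable group id (derived from a name id): reuse it as is.
            result[group_id] = group_id
        else:
            # Volatile group: hand out the next sequential usage id.
            result[group_id] = next(counter)
    return result
```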

