Identifiers in Checklist Bank
Introduction

This page tries to give some insight into how identifiers are handled within Checklist Bank. If you haven't done so already, please first read the introduction and look at the ChecklistBankSchema page to learn about the main entities used.

External Identifiers

Checklist Bank keeps track of source IDs and can hold any number of additional, alternative identifiers for a record. This is useful for attaching the IDs used by different technologies to the same name usage, for example LSIDs, RDF URIs, Linked Data or RDFa URLs, web pages, DOIs, UUIDs, etc. Additional identifiers can be submitted as part of a Darwin Core Archive via the alternative identifier extension.

CLB Identifiers

Checklist Bank also manages its own identifiers, which are integers. Not all of them are exposed, and they are not simply assigned at random: we try our best to keep them stable, especially when it comes to the NUB.

Name ID

A name ID is assigned to each distinct name string and never changes, because name strings are never deleted, only deprecated.

Lexical Group ID

Lexical groups have to be regenerated often, as algorithms change and manual interventions occur. This makes it quite challenging to keep their IDs stable: what is it that ultimately defines a group? We base lexical group IDs on name IDs as much as possible, taking advantage of the stability of name IDs. To do so, we distinguish between two kinds of lexical groups.

Non-Canonical Groups

A lexical group that contains at least one non-canonical name, i.e. a qualified name with an authorship, is considered well defined. It is highly unlikely that such a name is part of another lexical group, so the lexical group can be assigned the same ID as that non-canonical name. If there are multiple non-canonical names, the earliest created one is used, so that the lexical group ID does not change when new names are added.

Canonical Groups

Pure canonical groups are made up of names that are likely to be part of other groups too.
They are therefore assigned a sequentially generated, volatile identifier that changes whenever lexical groups are reprocessed. Sequentially generated lexical group IDs start at 100 million (100,000,000) and are therefore easy to recognize.

Usage ID

In general, usage IDs are volatile and change whenever usages are reimported. This does not happen for static datasets, but it is the case for most datasets. A special case is the synthesized NUB.

NUB Usage ID

As we use the NUB as our default management classification, we would like to keep NUB usage IDs as stable as possible. NUB usages correspond 1:1 to lexical groups, i.e. there is exactly one NUB usage for each lexical group. We can therefore simply reuse the stable lexical group IDs, and generate sequential, volatile identifiers only for usages whose lexical group has an unstable ID (100 million and above).
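The lexical group and NUB usage ID rules described on this page can be summarized in a short sketch. This is not Checklist Bank's actual code. The `Name` record, the function names, the tie-break on name ID when two names share a creation time, and the `start` value for volatile NUB usage IDs are all hypothetical assumptions made for illustration. Only the 100 million threshold and the "earliest created non-canonical name" rule come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import count
from typing import Dict, Iterator, List

# From the text: sequentially generated (volatile) lexical group IDs start here.
VOLATILE_ID_START = 100_000_000


@dataclass(frozen=True)
class Name:
    """Hypothetical minimal name record: stable name ID, creation time, canonical flag."""
    name_id: int
    created: datetime
    canonical: bool  # False for a qualified name with authorship


def is_stable(lexical_group_id: int) -> bool:
    """Volatile IDs are recognizable because they are >= 100 million."""
    return lexical_group_id < VOLATILE_ID_START


def assign_lexical_group_ids(groups: List[List[Name]]) -> List[int]:
    """Return one lexical group ID per group, in input order.

    Groups with at least one non-canonical name reuse the name ID of the
    earliest created non-canonical name (ties broken by name ID, an assumption).
    Pure canonical groups get a fresh sequential ID starting at 100 million,
    so those IDs can change each time the groups are reprocessed.
    """
    volatile: Iterator[int] = count(VOLATILE_ID_START)
    ids: List[int] = []
    for group in groups:
        non_canonical = [n for n in group if not n.canonical]
        if non_canonical:
            earliest = min(non_canonical, key=lambda n: (n.created, n.name_id))
            ids.append(earliest.name_id)
        else:
            ids.append(next(volatile))
    return ids


def assign_nub_usage_ids(
    lexical_group_ids: List[int], start: int = VOLATILE_ID_START
) -> Dict[int, int]:
    """Map each lexical group ID to its single NUB usage ID.

    Stable lexical group IDs are reused as-is; unstable ones get a new
    sequential usage ID beginning at the hypothetical `start` value.
    """
    fresh = count(start)
    return {
        gid: gid if is_stable(gid) else next(fresh)
        for gid in lexical_group_ids
    }
```

Reordering the input groups would change which volatile IDs the pure canonical groups receive, but not the IDs of groups anchored on a non-canonical name. That is the stability property the page describes. In this sketch, choosing a `start` of 100 million or more keeps volatile NUB usage IDs from colliding with the stable ones.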