Nom5PreviousComments
Comments captured during original nub building
1. When importing the taxonomy for a well-managed taxonomic database. In this case each taxon is expected to appear consistently with the same classification in every record in which it occurs. If the same taxon name appears in multiple records with slightly different classifications, the instances are likely to be cases of homonymy and should be preserved as separate entities.

2. When importing the taxonomy for a database not intended to be taxonomically authoritative (e.g. a collection database). In this case the assumption is that the data may include varying classifications for the same taxon, and the system should be more cautious about presenting them as different taxa.

3. When merging the taxonomy from a well-managed taxonomic database into the portal nub taxonomy. In this case the classifications may not match those from other (even authoritative) sources, but the portal should be able to maintain any distinctions made in the taxonomic database itself.

4. When merging the taxonomy from a database not intended to be taxonomically authoritative into the portal nub taxonomy. In this case the classifications will often not match those from other sources, and the portal should not assume that they represent different taxa unless there are very strong reasons to do so.

The key requirements to handle these cases are as follows:

A. In situations 1 and 2, the requirement is to import the resource taxonomy with as much fidelity as possible. The method should therefore preserve every apparent classification and merge only those that provide different compatible subsets of the same classification. If the resource taxonomy is poorly managed, this may mean that the same taxon has many different representations in different locations within the same taxonomy; this will be resolved when the taxonomy is tied to the portal nub taxonomy. More generally, when importing taxonomies, this method should return only completely compatible matches from the same resource's taxonomy.
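The full-compatibility test behind requirement A can be sketched in Python as follows. This is a minimal illustration, not the portal's implementation: it assumes a classification is a mapping from rank to name, and the function names is_fully_compatible, merge_compatible and find_fully_compatible are hypothetical. Two classifications are treated as compatible when no rank present in both carries different names, so a classification that omits some ranks can still be merged with a fuller one.

```python
from typing import Dict, List, Optional

# Hypothetical representation: a classification maps a rank
# (e.g. "kingdom", "family", "genus") to the name at that rank.
Classification = Dict[str, str]


def is_fully_compatible(a: Classification, b: Classification) -> bool:
    """True if no rank present in both classifications carries different names."""
    shared_ranks = a.keys() & b.keys()
    return all(a[rank] == b[rank] for rank in shared_ranks)


def merge_compatible(a: Classification, b: Classification) -> Optional[Classification]:
    """Combine two compatible subsets of one classification, or return None on conflict."""
    if not is_fully_compatible(a, b):
        return None
    merged = dict(a)
    merged.update(b)  # safe: shared ranks already agree
    return merged


def find_fully_compatible(candidate: Classification,
                          existing: List[Classification]) -> List[Classification]:
    """Strict mode: keep only existing classifications fully compatible with the candidate."""
    return [c for c in existing if is_fully_compatible(candidate, c)]
```

For example, {"kingdom": "Plantae", "genus": "Morus"} and {"family": "Moraceae", "genus": "Morus"} merge into one classification, whereas the plant genus Morus and the gannet genus Morus (family Sulidae, kingdom Animalia) conflict and remain separate entities.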
Little harm will befall the portal from over-distinguishing taxa at this point, since the important stage is situation 3, when the dataset is merged into the portal taxonomy.

REQUIREMENT - INCLUDE A MODE THAT FINDS ONLY FULLY COMPATIBLE CLASSIFICATIONS

NOTE: special rules may be required for some databases that need unique handling. The hardest cases will involve records that represent different concepts for the same taxon name with the same classification. Special processing outside this method will be required to handle such cases.

B. In situation 4, on the other hand, the requirement is to minimise the number of cases in which the same taxon is split across multiple locations in the taxonomy. The method should therefore be able to determine whether a suitable join point already exists and use it, even if a significant proportion of the classification differs.

REQUIREMENT - INCLUDE A MODE THAT SELECTS THE MOST SUITABLE CLASSIFICATION IF ONE EXISTS

C. In situation 3, the requirement is again to reuse an existing taxon if one is suitable, but the method should not conflate taxa that the resource has explicitly separated.

REQUIREMENT - INCLUDE A MODE THAT SELECTS THE MOST SUITABLE CLASSIFICATION BUT RESPECTS DIFFERENT TAXA SHARING THE SAME NAME WITHIN THE SOURCE CLASSIFICATION

D. In situations 3 and 4, the portal should detect cases in which a candidate taxon cannot safely be merged with any of the existing taxa under the given name, and should create a special taxon concept to store the ambiguous information. It should avoid multiplying these disambiguation concepts for the same name, since otherwise the taxonomy will become impossibly complex.

REQUIREMENT - INCLUDE A MODE THAT CAN CREATE DISAMBIGUATION TAXA AS NEEDED

These requirements are handled as follows:

i. If the request is to find a concept in a taxonomy other than the portal taxonomy, full compatibility is required (i.e. any classification with a different name in the same position is rejected); if no fully compatible classification exists, no match is returned. This addresses requirement A.

ii. If the request is to find a concept in the portal taxonomy (resource 1), this method finds the most suitable classification using a reasonably lenient matching algorithm (threshold set to 33). This addresses requirement B.

iii. If the matching algorithm in ii cannot distinguish between multiple concepts and the request is for the portal taxonomy, a disambiguation taxon is returned. This addresses requirement D.

iv. In cases in which the resource taxonomy may include homonyms, it is the responsibility of the code using this method to determine which resource concepts to associate with existing concepts and which require new concepts. This cannot be handled within this method. This addresses requirement C.

NOTE: Import of authoritative taxonomies requires additional logic around this method to ensure correct import and merging of homonyms.
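Rules ii and iii can be illustrated with a similar sketch. Apart from the threshold value of 33, everything here is an assumption: the comments do not describe the scoring function, so similarity is a hypothetical stand-in that scores the percentage (0 to 100) of shared ranks whose names agree, and Concept, PortalMatcher and the disambiguation identifier format are illustrative names only. A single best match at or above the threshold is returned. A tie at the top score yields a disambiguation concept that is cached per name, so repeated ambiguity reuses it rather than multiplying such concepts (requirement D). Returning None when nothing reaches the threshold, leaving the caller to create a new concept, is likewise assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical representation: rank -> name.
Classification = Dict[str, str]


@dataclass
class Concept:
    """Illustrative stand-in for a portal taxon concept."""
    concept_id: str
    name: str
    classification: Classification
    is_disambiguation: bool = False


def similarity(a: Classification, b: Classification) -> float:
    """Assumed score (0-100): percentage of shared ranks whose names agree.

    Ranks present in only one classification are ignored; 0 if none are shared.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    agreeing = sum(1 for rank in shared if a[rank] == b[rank])
    return 100.0 * agreeing / len(shared)


@dataclass
class PortalMatcher:
    threshold: float = 33.0  # threshold value taken from rule ii
    # One disambiguation concept per name, so they are never multiplied.
    _disambiguation: Dict[str, Concept] = field(
        default_factory=dict, init=False, repr=False)

    def match(self, name: str, candidate: Classification,
              existing: List[Concept]) -> Optional[Concept]:
        """Best existing concept, a disambiguation concept on a tie, or None."""
        scored = [
            (similarity(candidate, c.classification), c)
            for c in existing
            if c.name == name and not c.is_disambiguation
        ]
        # Lenient mode: keep every concept that reaches the threshold.
        scored = [(s, c) for s, c in scored if s >= self.threshold]
        if not scored:
            return None  # assumed: caller creates a new concept
        best = max(s for s, _ in scored)
        top = [c for s, c in scored if s == best]
        if len(top) == 1:
            return top[0]
        # Rule iii: the scores cannot distinguish the candidates.
        return self.disambiguation_for(name)

    def disambiguation_for(self, name: str) -> Concept:
        """Return the cached disambiguation concept for name, creating it once."""
        if name not in self._disambiguation:
            self._disambiguation[name] = Concept(
                concept_id=f"disambiguation:{name}",
                name=name,
                classification={},
                is_disambiguation=True,
            )
        return self._disambiguation[name]
```

With existing concepts for the plant Morus (Plantae, Moraceae) and the bird Morus (Animalia, Sulidae), a candidate classified only as Plantae matches the plant concept, while a candidate recorded as Plantae/Sulidae scores equally against both and receives the shared disambiguation concept for Morus. Homonym handling within the resource itself (rule iv) stays outside this sketch, as the comments require.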