This page is under construction and subject to significant revision.
Introduction
This page addresses the challenge of creating a hierarchy that can reference any known family name. Initially, the goal is to limit this to family names associated with occurrence records collected by GBIF. According to Tim, there are about 25,000 of these. The process should be expandable to any family.
This is one layer of a three-layered approach to taxonomy management. The lowest layer deals with the taxonomic leaves, where a major problem is the need for reconciliation. The intermediate layer connects genera to families.
The intent is to create a 'fair' structure in 2 months.
Assembling the system will require manual editing. The result will be an easily editable hierarchy in an open environment, and it will be freely available. Assuming the manual editing is done in a LifeDesk, the hierarchy will be exportable in parent/child or TCS formats.
Some numbers would help
Can we have the total number of families in each of the following sets, and numerical pairwise comparisons between them:
- All families known to GBIF
- Families associated with occurrence data
- IRMNG families
- CoL 2007 families
- CoL 2009 families
- WoRMS families
- The number of families that cannot be assigned to any hierarchy. This number is important to Paddy because each of these families will have to be researched and assigned manually. See 'What will be needed' below.
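A minimal sketch of how these totals and pairwise comparisons could be computed, assuming each source has already been reduced to a set of family-name strings. The sample sets (including the family `Xyzaceae`) are made up, and the normalization step (trimming and capitalizing the first letter) is an assumption; it catches case differences only, not misspellings.

```python
from itertools import combinations


def normalize(name: str) -> str:
    """Trim whitespace and capitalize only the first letter, so 'ROSACEAE ' equals 'Rosaceae'."""
    return name.strip().capitalize()


def pairwise_report(sources: dict[str, set[str]]) -> dict:
    """Return the total per source and, for every pair, shared and exclusive counts."""
    sets = {label: {normalize(n) for n in names} for label, names in sources.items()}
    totals = {label: len(s) for label, s in sets.items()}
    overlaps = {}
    for a, b in combinations(sorted(sets), 2):
        overlaps[(a, b)] = {
            "shared": len(sets[a] & sets[b]),
            "only_first": len(sets[a] - sets[b]),
            "only_second": len(sets[b] - sets[a]),
        }
    return {"totals": totals, "overlaps": overlaps}


def unassignable(gbif: set[str], classifications: dict[str, set[str]]) -> set[str]:
    """Families known to GBIF that appear in none of the classifications."""
    covered = {normalize(n) for names in classifications.values() for n in names}
    return {normalize(n) for n in gbif} - covered


# Made-up sample sets; the real inputs would be the family lists enumerated above.
sources = {
    "GBIF occurrence": {"Rosaceae", "Felidae", "Canidae", "Xyzaceae"},
    "IRMNG": {"Rosaceae", "Felidae", "Canidae"},
    "WoRMS": {"Gadidae", "FELIDAE"},
}
report = pairwise_report(sources)
print(report["totals"])  # {'GBIF occurrence': 4, 'IRMNG': 3, 'WoRMS': 2}
print(report["overlaps"][("GBIF occurrence", "WoRMS")])  # {'shared': 1, 'only_first': 3, 'only_second': 1}
others = {k: v for k, v in sources.items() if k != "GBIF occurrence"}
print(unassignable(sources["GBIF occurrence"], others))  # {'Xyzaceae'}
```

The size of the `unassignable` set is the number that matters for the manual research workload.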
Pre-manual steps
- The classification that has the largest proportion of families known to GBIF will be used as the initial reference structure
- Family names to be fuzzy-matched to create reconciliation groups, with work proceeding on only one representative family name from each group
- Using this reduced set, Marcus will algorithmically draw in families from the other classifications, attaching each at the next highest parent it shares with the reference structure
- The product to be made available in an editing environment
- Paddy to explore taxonomic anomalies throughout the entire tree structure and resolve them according to principles still to be determined
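The pre-manual steps could be prototyped roughly as follows. This is a sketch under stated assumptions, not Marcus's actual method: classifications are assumed to be child-to-parent dictionaries, fuzzy matching is swapped in as `difflib.SequenceMatcher` with a hypothetical 0.85 threshold, and the representative of each group is assumed to be the spelling with the most occurrence records. A threshold that is too low can merge genuinely distinct families, so the groups would still need checking. Families whose ancestors never meet the reference tree are returned separately.

```python
from difflib import SequenceMatcher


def pick_reference(gbif: set[str], classifications: dict[str, set[str]]) -> str:
    """Label of the classification containing the largest proportion of GBIF families."""
    return max(classifications, key=lambda label: len(gbif & classifications[label]) / len(gbif))


def reconciliation_groups(names, threshold: float = 0.85) -> list[list[str]]:
    """Greedy grouping: each name joins the first group whose first member it resembles."""
    groups: list[list[str]] = []
    for name in sorted(set(names)):
        for group in groups:
            if SequenceMatcher(None, name.lower(), group[0].lower()).ratio() >= threshold:
                group.append(name)
                break
        else:  # no similar group found, so start a new one
            groups.append([name])
    return groups


def representative(group: list[str], records: dict[str, int]) -> str:
    """Spelling with the most occurrence records; alphabetical order breaks ties."""
    return min(group, key=lambda n: (-records.get(n, 0), n))


def ancestors(taxon: str, parent_of: dict[str, str]):
    """Yield the parents of taxon, nearest first, stopping at a root or a cycle."""
    seen = set()
    while taxon in parent_of and taxon not in seen:
        seen.add(taxon)
        taxon = parent_of[taxon]
        yield taxon


def graft(reference: dict[str, str], other: dict[str, str], families):
    """Attach each family missing from `reference` under its nearest ancestor (taken
    from `other`) that `reference` already contains. Returns (merged, unplaced)."""
    merged = dict(reference)
    known = set(reference) | set(reference.values())
    unplaced = []
    for family in families:
        if family in known:
            continue
        anchor = next((a for a in ancestors(family, other) if a in known), None)
        if anchor is None:
            unplaced.append(family)
        else:
            merged[family] = anchor
            known.add(family)
    return merged, unplaced


# Illustrative trees only (Xyzaceae/Xyzales are made up).
reference = {"Rosaceae": "Rosales", "Rosales": "Magnoliopsida",
             "Felidae": "Carnivora", "Carnivora": "Mammalia"}
other = {"Ursidae": "Carnivora", "Fagaceae": "Fagales",
         "Fagales": "Magnoliopsida", "Xyzaceae": "Xyzales"}
merged, unplaced = graft(reference, other, ["Ursidae", "Fagaceae", "Xyzaceae"])
print(merged["Ursidae"], merged["Fagaceae"], unplaced)  # Carnivora Magnoliopsida ['Xyzaceae']
```

Note that Fagaceae lands directly under Magnoliopsida because Fagales is absent from the reference; that kind of gap is exactly what the manual review would pick up.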
Editing location
A LifeDesk (lifedesks.org). The address will be added here later.
What will be needed by Paddy / taxonomic team
- The merged structure in a format that can be ingested by a LifeDesk.
- Family names NOT represented in any hierarchy, supplied as a separate flat list that the LifeDesk can ingest.
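The exact ingest format a LifeDesk expects is not specified on this page, so the following only sketches a generic parent/child export and a flat list. It assumes a tab-separated layout with a hypothetical `child`/`parent` header row and writes parents before children in case the importer needs that; it does not produce TCS.

```python
import csv
import io
from collections import defaultdict, deque


def parent_child_tsv(parent_of: dict[str, str]) -> str:
    """Write a child -> parent mapping as tab-separated rows, parents before children.
    Taxa caught in a parent cycle have no root and are not written."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    roots = sorted(set(parent_of.values()) - set(parent_of))

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["child", "parent"])  # assumed header; adjust to the importer
    queue = deque(roots)
    while queue:  # breadth-first from the roots, so every parent is written first
        taxon = queue.popleft()
        writer.writerow([taxon, parent_of.get(taxon, "")])  # roots get an empty parent
        queue.extend(sorted(children[taxon]))
    return out.getvalue()


def flat_list(unplaced) -> str:
    """One family name per line, deduplicated and sorted."""
    return "".join(f"{name}\n" for name in sorted(set(unplaced)))


tree = {"Rosaceae": "Rosales", "Rosales": "Magnoliopsida", "Fagaceae": "Magnoliopsida"}
print(parent_child_tsv(tree))
print(flat_list(["Xyzaceae", "Abcaceae", "Xyzaceae"]))  # made-up names
```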
Deliverables
- A biologically defensible hierarchy in which all families are represented
- Unresolved disputes codified in some way
- The product available as an openly accessible website with an editable classification
- A classification exportable in parent/child or TCS formats
- A team of people willing to work on this structure
- A model for scaling this approach, ensuring that it is a repeatable process and can be sustained
Post-manual steps
- Marcus or others to add back the remaining members of the reconciliation groups and map genera (and therefore species names) into this structure.
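How the add-back would be done is not specified here. A minimal sketch, assuming the edited tree is available as a set of placed names, that each reconciliation group ends up with exactly one placed representative, and that genus-to-family assignments arrive as a simple mapping. Treating the other group members as variant spellings of the representative is an assumption; some may turn out to be synonyms needing editorial judgment. The genus `Examplea` and family `Xyzaceae` are hypothetical.

```python
def add_back_variants(groups: list[list[str]], placed: set[str]) -> dict[str, str]:
    """Map each non-representative group member to the one member already placed in the tree."""
    variant_of = {}
    for group in groups:
        reps = [name for name in group if name in placed]
        if len(reps) != 1:
            continue  # none or several members placed: leave the group for manual review
        for name in group:
            if name != reps[0]:
                variant_of[name] = reps[0]
    return variant_of


def map_genera(genus_family: dict[str, str], variant_of: dict[str, str], placed: set[str]):
    """Resolve each genus's family through the variant table; species then follow their
    genus. Returns (mapped, unmatched)."""
    mapped, unmatched = {}, []
    for genus, family in genus_family.items():
        family = variant_of.get(family, family)
        if family in placed:
            mapped[genus] = family
        else:
            unmatched.append(genus)
    return mapped, unmatched


groups = [["Rosacea", "Rosaceae"], ["Felidae"]]
placed = {"Rosaceae", "Felidae"}
variant_of = add_back_variants(groups, placed)
mapped, unmatched = map_genera(
    {"Rosa": "Rosacea", "Felis": "Felidae", "Examplea": "Xyzaceae"}, variant_of, placed
)
print(mapped, unmatched)  # {'Rosa': 'Rosaceae', 'Felis': 'Felidae'} ['Examplea']
```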
LifeDesk functionality issues
- LifeDesks have a great taxonomic editor and are the environment of choice for editing. However, some desirable functionality is absent from LifeDesks, and it would be great if it could be added. I do not think EOL can reset its schedule to make the changes. Could Dave ask Kehan to work with our guys to add the required functionality before we start working on the names? Features we may want include:
- to carry one or more numerical metrics with names (for example, the number of occurrence records)
- to carry flags (annotations) with names (for example, a flag indicating a fossil taxon)
- possibly to carry attribution information
- for the data model to be extended to include these numbers or flags
- to change the display so that metrics or annotations alter the appearance of the relevant clades (e.g. color coding or icons)
- possibly to search, sort, or filter by these metrics and annotations.
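LifeDesk's internal data model is not described here, so this is only a sketch of what carrying metrics, flags, and attribution with names could look like. `TaxonName`, `display_style`, and `filter_and_sort` are hypothetical names, the styling rules are illustrative, and the occurrence counts are made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaxonName:
    """A name in the hierarchy plus the extra data wished for above (hypothetical model)."""
    name: str
    parent: Optional[str] = None
    metrics: dict[str, float] = field(default_factory=dict)  # e.g. {"occurrence_records": 900}
    flags: set[str] = field(default_factory=set)  # e.g. {"fossil"}
    attribution: Optional[str] = None  # who supplied the name or its placement


def display_style(taxon: TaxonName) -> str:
    """Choose a display style for a clade from its annotations (illustrative rules)."""
    if "fossil" in taxon.flags:
        return "grey"
    if taxon.metrics.get("occurrence_records", 0) == 0:
        return "highlight"
    return "default"


def filter_and_sort(taxa, flag: Optional[str] = None, metric: str = "occurrence_records"):
    """Keep taxa carrying `flag` (all taxa if None), ordered by `metric`, largest first."""
    selected = [t for t in taxa if flag is None or flag in t.flags]
    return sorted(selected, key=lambda t: t.metrics.get(metric, 0), reverse=True)


taxa = [
    TaxonName("Felidae", "Carnivora", metrics={"occurrence_records": 900}),
    TaxonName("Nimravidae", "Carnivora", flags={"fossil"}),
    TaxonName("Canidae", "Carnivora", metrics={"occurrence_records": 1500}),
]
print([t.name for t in filter_and_sort(taxa)])  # ['Canidae', 'Felidae', 'Nimravidae']
print([display_style(t) for t in taxa])  # ['default', 'grey', 'default']
```

Keeping metrics and flags as open-ended dictionaries and sets avoids a schema change each time a new kind of annotation is wanted, at the cost of weaker validation.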