Updated Oct 05, 2009 by dpremsen
TaxamatchInfo  
Information on Lexical or Orthographic Grouping of Taxon Names

This page has been published but is subject to revision.

Introduction

Lexical or Orthographic grouping refers to the clustering of taxon names based on principles of orthographic variation. The same taxon name may be represented by different arrays of text strings for a variety of reasons, some of which are listed below. Addressing lexical variation is important in order to:

Not the least of these reasons is the difficulty many people have in spelling Latinized names. Below are some examples and explanations for variation in taxon names.

Name | Variant | Reason for variation
Pomatomus saltatrix (Linnaeus 1768) | P. saltatrix | Abbreviation of the taxon name
Pomatomus saltatrix | POMATOMUS SALTATRIX | Variations in representation of case
Pomatomus saltatrix | Pomatomix saltatrix | Misspelling of taxon name
Pomatomus saltatrix | Pomatomus saltator | Variation in Latin gender suffix of species epithet
Pomatomus saltatrix (Linnaeus 1768) | Pomatomus saltatrix (Linn. 1768) | Variation in orthography of authorship
Pomatomus saltatrix (Linnaeus 1768) | Pomatomus saltatrix | Canonical representation of name
Agaricus silvaticus | Agaricus sylvaticus | Multiple 'correct' spellings
Tetrao afer Müller | Tetrao afer Mueller | Transposition of diacritical marks
Tetrao afer Müller | Tetrao afer Müller | Text encoding errors
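
Several of the variations listed above (case, diacritics, parenthesised authorship) can be removed by simple normalisation before any fuzzy comparison is attempted. The following Python sketch illustrates the idea only; the function names and the small transliteration table are assumptions made for this example and are not taken from any TAXAMATCH implementation.

```python
import re
import unicodedata

# Hypothetical transliteration table (an assumption for this sketch).
TRANSLITERATIONS = {"ü": "ue", "ö": "oe", "ä": "ae"}


def fold_diacritics(text):
    """Transliterate known umlauts, then strip any remaining accents."""
    # Compose first so that "u" + combining diaeresis is seen as "ü".
    text = unicodedata.normalize("NFC", text)
    for char, replacement in TRANSLITERATIONS.items():
        text = text.replace(char, replacement)
        text = text.replace(char.upper(), replacement.capitalize())
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def grouping_key(name):
    """Build a lower-case comparison key for lexical grouping."""
    name = fold_diacritics(name)
    # Drop parenthesised authorship such as "(Linnaeus 1768)".
    name = re.sub(r"\s*\([^)]*\)", "", name)
    # Collapse whitespace and ignore case.
    return " ".join(name.split()).lower()


print(grouping_key("POMATOMUS SALTATRIX"))
print(grouping_key("Pomatomus saltatrix (Linnaeus 1768)"))
print(grouping_key("Tetrao afer Müller") == grouping_key("Tetrao afer Mueller"))
```

Names that share a key can be placed in the same lexical group. Misspellings, abbreviations and gender-suffix variants are not handled by this step and still need near matching.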

Taxa Match

TAXAMATCH is a “fuzzy” or near match algorithm developed by Tony Rees at CSIRO Marine and Atmospheric Research, Australia, over the period 2007-9 (with precursors from 2001 onwards), with the purpose of providing optimal fuzzy matching for genus and species scientific names in real world situations. It will match both phonetically (e.g. "Caelorhynchus" to "Coelorinchus") and non-phonetically (e.g. "Hombo sapient" to "Homo sapiens"), provided of course that the desired target name is actually present in the reference database at which the particular instance of TAXAMATCH is directed.
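
To give a feel for near matching against a reference list, here is a minimal Python sketch that uses plain Levenshtein edit distance. It is not Tony Rees's algorithm, which combines phonetic and non-phonetic tests; the function names, the tiny reference list and the `max_distance` threshold are assumptions chosen for illustration.

```python
def levenshtein(a, b):
    """Classic edit distance: insert, delete and substitute each cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def near_matches(query, reference_names, max_distance=3):
    """Return (name, distance) pairs within max_distance, closest first."""
    query = query.lower()
    scored = [(name, levenshtein(query, name.lower()))
              for name in reference_names]
    hits = [(name, dist) for name, dist in scored if dist <= max_distance]
    return sorted(hits, key=lambda pair: pair[1])


# Illustrative reference list; a real instance would query a names database.
reference = ["Homo sapiens", "Coelorinchus", "Pomatomus saltatrix"]
print(near_matches("Hombo sapient", reference))
print(near_matches("Caelorhynchus", reference))
```

As the text notes, a match can only be found if the target name is present in the reference list being searched.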

GNI Taxa Match implementation

The most up-to-date TaxaMatch implementation is a web service hosted on the GNI portal. (Links pending)

Tony Rees Original Taxa Match implementation

General information on TAXAMATCH is available on the CMAR website:

http://www.cmar.csiro.au/datacentre/taxamatch.htm

with Tony's reference implementation incorporated into the human-accessible search interface to IRMNG, the Interim Register of Marine and Nonmarine Genera, at:

http://www.cmar.csiro.au/datacentre/irmng/

Silver Biology Taxa Match PHP

Tony's implementation is written in the Oracle PL/SQL programming language, and source code is available on request (Tony.Rees@csiro.au). Michael Giddens of silverbiology.com has taken on the responsibility for a PHP port of TAXAMATCH in conjunction with use cases advised by GBIF, among others. Mike's development activities are accessible via the following URLs:

http://code.google.com/p/taxamatch-webservice
http://taxamatch.silverbiology.com
http://taxamatch.silverbiology.com/svn
http://taxamatch.silverbiology.com/svn/gui/

Heimo Rainer PHP Taxa Match

Heimo has developed a Taxamatch implementation that uses a modified Levenshtein distance algorithm and provides a C-based User Defined Function (UDF) that allows Taxamatch to be called within MySQL. Functionality of the current version includes parsing of names into uninomials (family and genus names), binomials and trinomials. It also includes a check for subgenus names against the list of genera in their reference list. The current reference list is mostly phanerogamic plants, to be complemented by the Index Fungorum and CoL names in October.

http://herbarium.univie.ac.at/taxamatch/taxamatchMdld.php
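
The name parsing step described above can be pictured with a short Python sketch. This is not Heimo's code; it is a simplified illustration under stated assumptions: names are in normal mixed case, epithets are lower-case words, anything else after the genus is treated as authorship, and the `known_genera` parameter is a hypothetical stand-in for the reference list of genera.

```python
import re

KINDS = {1: "uninomial", 2: "binomial", 3: "trinomial"}


def classify_name(name, known_genera=None):
    """Split a name into parts and label it by the number of parts."""
    tokens = name.split()
    if not tokens:
        return {"kind": "unparsed", "parts": [], "subgenus": None}
    parts = [tokens[0]]
    rest = tokens[1:]
    subgenus = None
    # A capitalised word in parentheses right after the genus is read as
    # a subgenus, optionally checked against a list of known genera.
    if rest and re.fullmatch(r"\([A-Z][a-z]+\)", rest[0]):
        candidate = rest[0][1:-1]
        if known_genera is None or candidate in known_genera:
            subgenus = candidate
            rest = rest[1:]
    for token in rest:
        if not re.fullmatch(r"[a-z-]+", token):
            break  # assumption: anything else starts the authorship
        parts.append(token)
    kind = KINDS.get(len(parts), "unparsed")
    return {"kind": kind, "parts": parts, "subgenus": subgenus}


print(classify_name("Pomatomus"))
print(classify_name("Pomatomus saltatrix (Linnaeus 1768)"))
print(classify_name("Drosophila (Sophophora) melanogaster"))
```

Passing a set of genus names as `known_genera` keeps a parenthesised author name directly after a genus from being misread as a subgenus.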

A bulk upload routine is available at http://herbarium.univie.ac.at/taxamatch/bulkupload/bulkupload.php

To use it, type in a user name and select a file for upload; the results can be downloaded as a CSV file.

Using Taxamatch

Up to and including the Nomina IV meeting at Woods Hole in May 2009, there has been interest from EOL, OBIS, ALA, GBIF, GNI, LifeDesks, FishBase and others in adopting TAXAMATCH for uses including web user queries, source data deduplication, query expansion, and name-finding algorithms. A graphical summary of TAXAMATCH operation is included in the following poster:

http://www.cmar.csiro.au/datacentre/ext_docs/Rees_taxamatch_poster_May_09.pdf

Implementation

Lexical grouping is an important process in the building of the GBIF merged taxonomic backbone.

