NameParsing  
Rules, terminology and an API for parsing scientific names
Updated Dec 21, 2011 by wixner@gmail.com

This page has been published but is subject to revision.

Introduction

Name parsing refers to splitting a taxon name into its identified atomic parts, such as the genus, the specific and infraspecific epithets, rank markers and the authorship. It is an important component of taxonomic name processing and of the subsequent reconciliation of names against authoritative nomenclatural and taxonomic sources.

Test our Parsers

Try one or more of the reference implementations below. Test the parsers and try to break them. Do you have names with unusual syntax or structure? We want to make sure these can be parsed effectively, to facilitate better integration of biodiversity data.

Post any suspected or proven "parser-breakers" to the Issues section of this site, email them to dremsen@gbif.org, or post them to the Comments section of this page together with your contact information.

Test your own parser

A TEST LIST of names with their expected JSON results, or a list of test names with their expected canonical forms, covers a range of name formats and can be used to test any parser implementation.
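A minimal sketch of a harness for the canonical-form list. It assumes a hypothetical layout of one test case per line, with the verbatim name and the expected canonical form separated by a tab; adapt `load_canonical_cases` to the real file. Any parser can be plugged in as a function from a verbatim name to its canonical form (or `None` if it cannot parse the name). The sample cases and the stand-in parser are illustrative only.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Case = Tuple[str, str]                    # (verbatim name, expected canonical form)
Failure = Tuple[str, str, Optional[str]]  # (verbatim, expected, actual)


def load_canonical_cases(lines: Iterable[str]) -> List[Case]:
    """Read test cases, one per line.

    ASSUMPTION (hypothetical layout): verbatim name and expected canonical
    form separated by a single tab; blank lines and '#' comments are skipped.
    """
    cases: List[Case] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue
        verbatim, expected = line.split("\t", 1)
        cases.append((verbatim, expected))
    return cases


def run_canonical_tests(parse: Callable[[str], Optional[str]],
                        cases: Iterable[Case]) -> List[Failure]:
    """Run every case through `parse` and return the ones it gets wrong."""
    failures: List[Failure] = []
    for verbatim, expected in cases:
        actual = parse(verbatim)
        if actual != expected:
            failures.append((verbatim, expected, actual))
    return failures


if __name__ == "__main__":
    sample = [
        "Agrostis stolonifera L.\tAgrostis stolonifera",
        "Polypogon  monspeliensis (L.) Desf.\tPolypogon monspeliensis",
        "Fagus sylvatica subsp. orientalis (Lipsky) Greuter & Burdet\tFagus sylvatica orientalis",
    ]

    def first_two_words(name: str) -> Optional[str]:
        # Deliberately naive stand-in parser: keeps only the first two words.
        return " ".join(name.split()[:2])

    # Prints one FAIL line, for the Fagus trinomial.
    for failure in run_canonical_tests(first_two_words, load_canonical_cases(sample)):
        print("FAIL", failure)
```

Reporting the verbatim, expected and actual values together makes each "parser-breaker" easy to paste into an issue report.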

Reference Implementations

GNI Implementation

A reference implementation of the name parsing services described below is available at globalnames.org. Input a list of names and view or use the results in various output formats.

GBIF Implementation

A Java-based parser developed by GBIF is available. It accepts a list or file of names and provides a range of parsing options, including returning names in a "standardised", code-compliant format.

GBIF Spain

GBIF Spain offers a desktop name parser for the Windows operating system. Contact the GBIF Spain node for more information.

Grammar

The name grammar is written in Backus-Naur Form (BNF), so it can be used with many parser generators: http://en.wikipedia.org/wiki/Backus–Naur_form
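To illustrate what a BNF grammar drives, here is a hand-written recursive-descent parser (one method per rule) for a hypothetical, much smaller grammar than the project's. It only covers binomials with an optional parenthesised basionym author team and a combination author team; years, ranks, infraspecies and hybrid formulas are out of scope. The grammar in the comments and all class and function names are invented for this sketch.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, heavily simplified grammar (NOT the project's BNF files):
#   <name>       ::= <genus> <epithet> [ "(" <authorteam> ")" ] [ <authorteam> ]
#   <genus>      ::= <upper> <lower>+
#   <epithet>    ::= <lower>+
#   <authorteam> ::= <author> { "&" <author> }
#   <author>     ::= <upper> <letter>* [ "." ]
TOKEN = re.compile(r"\s*([()&]|[^\s()&]+)")
GENUS, EPITHET, AUTHOR = r"[A-Z][a-z]+", r"[a-z]+", r"[A-Z][A-Za-z]*\.?"


@dataclass
class Binomial:
    genus: str
    epithet: str
    basionym_authors: List[str] = field(default_factory=list)
    combination_authors: List[str] = field(default_factory=list)

    @property
    def canonical(self) -> str:
        return f"{self.genus} {self.epithet}"


class BinomialParser:
    """Recursive-descent parser: one method per grammar rule."""

    def __init__(self, text: str) -> None:
        self.tokens = TOKEN.findall(text)  # runs of whitespace are discarded
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, pattern: str) -> str:
        token = self.peek()
        if token is None or not re.fullmatch(pattern, token):
            raise ValueError(f"unexpected token {token!r} at position {self.pos}")
        self.pos += 1
        return token

    def name(self) -> Binomial:
        result = Binomial(self.expect(GENUS), self.expect(EPITHET))
        if self.peek() == "(":
            self.expect(r"\(")
            result.basionym_authors = self.authorteam()
            self.expect(r"\)")
        if self.peek() is not None:
            result.combination_authors = self.authorteam()
        if self.peek() is not None:
            raise ValueError(f"trailing input: {self.tokens[self.pos:]}")
        return result

    def authorteam(self) -> List[str]:
        authors = [self.expect(AUTHOR)]
        while self.peek() == "&":
            self.expect("&")
            authors.append(self.expect(AUTHOR))
        return authors


def parse_binomial(text: str) -> Binomial:
    return BinomialParser(text).name()


if __name__ == "__main__":
    b = parse_binomial("Polypogon  monspeliensis (L.) Desf.")
    print(b.canonical, b.basionym_authors, b.combination_authors)
    # Polypogon monspeliensis ['L.'] ['Desf.']
```

A name that does not fit the grammar, such as "alberta Carla Brunii Maxtor (L.)", raises `ValueError`, which corresponds to an unparsable name in the responses described below. A parser generator fed the real BNF would replace the hand-written methods.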

Nomenclatural Codes

Grammar for parsing non-code-compliant names

See http://code.google.com/p/taxon-name-processing/source/browse/trunk/parsing/grammars/not_compliant.bnf

Name Parser API

The name parser API defines a simple RESTful service to parse a single name, a list of names, or a posted file containing one name per line. Two response types are defined: XML (text/xml) and JSON (application/json). Clients should state the desired content type in the HTTP Accept header; the service is free to decide on a default response type when none is requested.
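Because the service may choose its own default, a client should not assume which format it gets back. A minimal sketch that dispatches on the response's Content-Type using only the Python standard library. It assumes JSON bodies are UTF-8 and, as a lenient choice of this sketch, also accepts "application/xml" alongside the "text/xml" type named above; anything else is rejected.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any


def decode_response(content_type: str, body: bytes) -> Any:
    """Decode a name parser response according to its Content-Type header.

    Returns the root Element for XML and the decoded value for JSON.
    ASSUMPTION: JSON bodies are UTF-8.
    """
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ("text/xml", "application/xml"):
        return ET.fromstring(body)
    if media_type == "application/json":
        return json.loads(body.decode("utf-8"))
    raise ValueError(f"unsupported response type: {content_type!r}")
```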

Request Parameters

Example:
http://www.globalnames.org/parsers?names=Mentha%20×smithiana%20R.%20A.%20Graham%20(1949)

GET Parameters
==============
names: a pipe ("|") separated list of names to be parsed. A single name without a pipe symbol is also acceptable.
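A minimal sketch of building such a GET request with the standard library: the names are joined with the pipe separator, percent-encoded (spaces become %20, as in the example URL) and the response type is requested through the Accept header. The endpoint is copied from the example above and may no longer be online, so the request is only built, not sent.

```python
import urllib.parse
import urllib.request
from typing import List

# Endpoint copied from the example URL above; it may no longer be online.
PARSER_URL = "http://www.globalnames.org/parsers"


def build_get_request(names: List[str], accept: str = "application/json") -> urllib.request.Request:
    """Build (without sending) a GET request that parses one or more names."""
    if not names or any("|" in name for name in names):
        raise ValueError("need at least one name, none containing '|'")
    # quote_via=quote encodes spaces as %20 (as in the example) instead of '+';
    # the pipe separator itself is percent-encoded as %7C.
    query = urllib.parse.urlencode({"names": "|".join(names)}, quote_via=urllib.parse.quote)
    return urllib.request.Request(f"{PARSER_URL}?{query}", headers={"Accept": accept})


if __name__ == "__main__":
    request = build_get_request(["Agrostis stolonifera L.", "Pseudocercospora"])
    print(request.full_url)
    # http://www.globalnames.org/parsers?names=Agrostis%20stolonifera%20L.%7CPseudocercospora
```

Sending it would then be `urllib.request.urlopen(request)`.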

POST Parameters
===============
names: a text file with one name per line, posted as multipart/form-data under the parameter name "names"
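Python's standard library has no multipart encoder, so this sketch assembles the multipart/form-data body by hand, placing the file under the form field "names" as described above. The filename `names.txt`, the random boundary and the UTF-8 encoding are assumptions of the sketch, not part of the API description; the endpoint is again taken from the GET example.

```python
import urllib.request
import uuid
from typing import List

# Endpoint copied from the GET example above; it may no longer be online.
PARSER_URL = "http://www.globalnames.org/parsers"


def build_post_request(names: List[str], accept: str = "application/json") -> urllib.request.Request:
    """Build (without sending) a multipart/form-data POST carrying a names file."""
    if any("\n" in name or "\r" in name for name in names):
        raise ValueError("each name must fit on a single line")
    boundary = uuid.uuid4().hex  # random, so a clash with the file content is very unlikely
    file_content = "".join(name + "\n" for name in names)  # one name per line
    body = (
        f"--{boundary}\r\n"
        # Form field "names" as required; the filename is an assumption.
        'Content-Disposition: form-data; name="names"; filename="names.txt"\r\n'
        "Content-Type: text/plain; charset=utf-8\r\n"
        "\r\n"
        f"{file_content}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Accept": accept,
    }
    return urllib.request.Request(PARSER_URL, data=body, headers=headers, method="POST")
```

As with GET, `urllib.request.urlopen(request)` would send it.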

Response XML

The name parser XML schema defines a parsed-name response for one or more names. A fairly complete example looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<names xmlns="http://globalnames.org/nameparser">
    <hybridFormula verbatim="Agrostis stolonifera L.   ×   Polypogon  monspeliensis (L.) Desf." normalised="Agrostis stolonifera L. × Polypogon monspeliensis (L.) Desf.">
        <scientificName verbatim="Agrostis stolonifera L." normalised="Agrostis stolonifera L." canonical="Agrostis stolonifera">
            <genus>Agrostis</genus>
            <species epitheton="stolonifera" authorship="L.">
                <combinationAuthorteam authorteam="L.">
                    <author>L.</author>
                </combinationAuthorteam>
            </species>
        </scientificName>
        <scientificName verbatim="Polypogon  monspeliensis (L.) Desf." normalised="Polypogon monspeliensis (L.) Desf." canonical="Polypogon monspeliensis">
            <genus>Polypogon</genus>
            <species epitheton="monspeliensis" authorship="(L.) Desf.">
                <combinationAuthorteam authorteam="Desf.">
                    <author>Desf.</author>
                </combinationAuthorteam>
                <basionymAuthorteam authorteam="L.">
                    <author>L.</author>
                </basionymAuthorteam>
            </species>
        </scientificName>        
    </hybridFormula>
    <scientificName verbatim="alberta Carla Brunii Maxtor (L.)" unparsable="true"/>
    <scientificName verbatim="Fagus sylvatica subsp. orientalis (Lipsky) Greuter &amp; Burdet var. syrica Greuter &amp; Burdet" normalised="Fagus sylvatica subsp. orientalis (Lipsky) Greuter &amp; Burdet var. syrica Greuter &amp; Burdet" canonical="Fagus sylvatica syrica">
        <genus>Fagus</genus>
        <species epitheton="sylvatica"/>
        <infraspecies epitheton="orientalis" rank="subsp" authorship="(Lipsky) Greuter &amp; Burdet">
            <combinationAuthorteam authorteam="Greuter &amp; Burdet">
                <author>Greuter</author>
                <author>Burdet</author>
            </combinationAuthorteam>
            <basionymAuthorteam authorteam="Lipsky">
                <author>Lipsky</author>
            </basionymAuthorteam>
        </infraspecies>
        <infraspecies epitheton="syrica" rank="var" authorship="Greuter &amp; Burdet">
            <combinationAuthorteam authorteam="Greuter &amp; Burdet">
                <author>Greuter</author>
                <author>Burdet</author>
            </combinationAuthorteam>
        </infraspecies>
    </scientificName>
</names>
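Every element lives in the http://globalnames.org/nameparser namespace, so lookups must be namespace-aware. A minimal sketch that walks a response with Python's ElementTree and flattens it into one row per scientific name: hybrid formulas are expanded into their parent names, and names flagged unparsable="true" come out with `parsed` set to False. `NameRow` and `summarise` are illustrative names, not part of the schema.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional, Union

NS = "http://globalnames.org/nameparser"
NSMAP = {"np": NS}


class NameRow(NamedTuple):
    verbatim: Optional[str]
    canonical: Optional[str]
    genus: Optional[str]
    parsed: bool


def summarise(xml_data: Union[str, bytes]) -> List[NameRow]:
    """Flatten a name parser XML response into one row per scientific name."""
    root = ET.fromstring(xml_data)
    rows: List[NameRow] = []
    for entry in root:
        # A hybridFormula wraps the scientificName elements of its parents.
        if entry.tag == f"{{{NS}}}hybridFormula":
            names = entry.findall("np:scientificName", NSMAP)
        else:
            names = [entry]
        for sn in names:
            rows.append(NameRow(
                verbatim=sn.get("verbatim"),
                canonical=sn.get("canonical"),
                # findtext returns None when there is no <genus> child.
                genus=sn.findtext("np:genus", namespaces=NSMAP),
                parsed=sn.get("unparsable") != "true",
            ))
    return rows
```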

Response JSON

The attribute and element names of the XML response can also be used as keys to serialise the data as a JSON response.

In JSON, the above example (with one additional uninomial, Pseudocercospora) would look like this:

Dima: I modified the JSON example (without touching the XML yet), giving up a strictly object-oriented structure in favour of uniform data access. The aim is to reduce the amount of logic needed in scripts that consume the JSON format. Assuming most people need simpler information such as the genus epithet, species epithet, species authorship and canonical form, these components are placed at the same level of the structure whenever they are available.

[
{
    "scientificName":
    {
        "parsed": true,
        "verbatim": "Agrostis stolonifera L.   ×   Polypogon  monspeliensis (L.) Desf.",
        "normalised": "Agrostis stolonifera L. × Polypogon monspeliensis (L.) Desf.",
        "hybrid": true,
        "details":
        [
        {
            "verbatim": "Agrostis stolonifera L.",
            "normalised": "Agrostis stolonifera L.",
            "canonical": "Agrostis stolonifera",
            "genus":
            {
                "epitheton": "Agrostis"
            },
            "species":
            {
                "epitheton": "stolonifera",
                "authorship": "L.",
                "combinationAuthorteam":
                {
                    "authorteam": "L.",
                    "author": ["L."]
                }
            }
        },
        {
            "verbatim": "Polypogon  monspeliensis (L.) Desf.",
            "normalised": "Polypogon monspeliensis (L.) Desf.",
            "canonical": "Polypogon monspeliensis",
            "genus":
            {
                "epitheton": "Polypogon"
            },
            "species":
            {
                "epitheton": "monspeliensis",
                "authorship": "(L.) Desf.",
                "combinationAuthorteam":
                {
                    "authorteam": "Desf.",
                    "author": ["Desf."]
                },
                "basionymAuthorteam":
                {
                    "authorteam": "L.",
                    "author": ["L."]
                }
            }
        }
        ]
    }
},
{
    "scientificName":
    {
        "verbatim": "alberta Carla Brunii Maxtor (L.)",
        "parsed": false,
        "details": []
    }
},
{
    "scientificName":
    {
        "parsed": true,
        "verbatim": "Pseudocercospora Speg. 1910",
        "normalised": "Pseudocercospora Speg. 1910",
        "canonical": "Pseudocercospora",
        "details":
        [
        {
            "uninomial":
            {
                "epitheton": "Pseudocercospora",
                "authorship": "Speg. 1910",
                "combinationAuthorteam":
                {
                    "authorteam": "Speg.",
                    "author": ["Speg."],
                    "year": "1910"
                }
            }
        }
        ]
    }
},
{
    "scientificName":
    {
        "parsed": true,
        "verbatim": "Fagus sylvatica subsp. orientalis (Lipsky) Greuter & Burdet var. syrica Greuter & Burdet",
        "normalised": "Fagus sylvatica subsp. orientalis (Lipsky) Greuter & Burdet var. syrica Greuter & Burdet",
        "canonical": "Fagus sylvatica syrica",
        "details": [
        {
            "genus":
            {
                "epitheton": "Fagus"
            },
            "species":
            {
                "epitheton": "sylvatica"
            },
            "infraspecies": [
            {
                "epitheton": "orientalis",
                "rank": "subsp",
                "authorship": "(Lipsky) Greuter & Burdet",
                "combinationAuthorteam":
                {
                    "authorteam": "Greuter & Burdet",
                    "author": ["Greuter", "Burdet"]
                },
                "basionymAuthorteam":
                {
                    "authorteam": "Lipsky",
                    "author": ["Lipsky"]
                }
            },
            {
                "epitheton": "syrica",
                "rank": "var",
                "authorship": "Greuter & Burdet",
                "combinationAuthorteam":
                {
                    "authorteam": "Greuter & Burdet",
                    "author": ["Greuter", "Burdet"]
                }
            }
            ]
        }
        ]
    }
}
]
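Because the flattened layout keeps the commonly needed parts at predictable places, a consumer needs very little logic. A minimal sketch of reading such a response with Python's json module: it returns one canonical form per entry (None for unparsable names) and collects genus epithets from the details. The function names are illustrative, and joining hybrid parents with " × " is a display choice of this sketch rather than part of the format.

```python
import json
from typing import Any, Dict, List, Optional

Response = List[Dict[str, Any]]


def canonical_forms(response: Response) -> List[Optional[str]]:
    """One canonical form per entry; None for names that could not be parsed.

    For hybrid formulas the parents' canonical forms are joined with " × "
    (a display choice made for this sketch, not part of the format).
    """
    forms: List[Optional[str]] = []
    for item in response:
        name = item["scientificName"]
        if not name.get("parsed", False):
            forms.append(None)
        elif name.get("hybrid", False):
            forms.append(" × ".join(d["canonical"] for d in name["details"]))
        else:
            forms.append(name.get("canonical"))
    return forms


def genus_epithets(response: Response) -> List[str]:
    """Collect the genus epithet of every parsed detail that has one."""
    found: List[str] = []
    for item in response:
        for detail in item["scientificName"].get("details", []):
            if "genus" in detail:
                found.append(detail["genus"]["epitheton"])
    return found


if __name__ == "__main__":
    text = ('[{"scientificName": {"parsed": true, "canonical": "Pseudocercospora", '
            '"details": [{"uninomial": {"epitheton": "Pseudocercospora"}}]}}]')
    data = json.loads(text)
    print(canonical_forms(data), genus_epithets(data))
    # ['Pseudocercospora'] []
```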
