Goals
The goal of iPhylo is to treat biodiversity objects as equal citizens. Each object has a unique identifier, associated metadata, and is linked to other objects (for example, a specimen is linked to sequences, sequences are linked to publications, etc.). By linking objects together we can discover new facts, as well as track the provenance of data, and ultimately build "citation networks" of specimens, sequences, etc.
For background, see my paper "Biodiversity informatics: the challenge of linking data and the role of shared identifiers" (doi:10.1093/bib/bbn022, preprint at Nature Precedings, hdl:10101/npre.2008.1760.1), and my iPhylo blog.
iPhylo is a descendant of my bioGUID and SemAnt projects. iPhylo shares much with these projects, but drops the use of a triple store in favour of an entity-attribute-value model. Like bioGUID, iPhylo relies on a suite of web services (most external, some I've developed locally) to locate and resolve identifiers.
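To make the entity-attribute-value idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not iPhylo's actual schema: the table name `eav`, the function names, and the attribute names "cites" and "voucher" are all assumptions chosen for illustration. Each row states one fact about one object, and a link is just a row whose value is another object's identifier.

```python
import sqlite3


def open_store():
    """Create an in-memory EAV table (hypothetical schema, not iPhylo's own)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eav ("
        "entity TEXT NOT NULL, attribute TEXT NOT NULL, value TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX eav_entity ON eav (entity)")
    conn.execute("CREATE INDEX eav_value ON eav (value)")
    return conn


def add_fact(conn, entity, attribute, value):
    """Store one (entity, attribute, value) row."""
    conn.execute("INSERT INTO eav VALUES (?, ?, ?)", (entity, attribute, value))


def describe(conn, entity):
    """Return every attribute of an entity as {attribute: [values...]}."""
    rows = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity = ? ORDER BY rowid",
        (entity,),
    )
    result = {}
    for attribute, value in rows:
        result.setdefault(attribute, []).append(value)
    return result


def linked_from(conn, target):
    """Return entities that have some attribute whose value is `target`."""
    rows = conn.execute(
        "SELECT DISTINCT entity FROM eav WHERE value = ? ORDER BY entity",
        (target,),
    )
    return [row[0] for row in rows]
```

New kinds of object or attribute need no schema change, which is the usual argument for this layout; the cost is that every value is stored as text and typed queries take more work. Asking `linked_from` for a publication's identifier returns the objects that point at it, which is the raw material for a citation network.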
How does it work?
iPhylo resolves identifiers for PubMed records, GenBank sequences, museum specimens, publications, etc., and adds the associated metadata to a local database. Wherever possible it also resolves any links in the metadata (e.g., if a GenBank record mentions a specimen, iPhylo will try to retrieve information on that specimen). When you view an object in iPhylo, these links are displayed. If a bibliographic record comes without an identifier, iPhylo will try to find one for it (such as a DOI). It also extracts georeferences for specimens and sequences, either from the original records or by using a georeferencing service. Taxonomic names are resolved using uBio, and are treated as "tags."
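The resolve-then-follow-links behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not iPhylo's code: the regular expressions are simplified guesses at common identifier shapes, and `classify`, `crawl`, the `fetch` callback, and the "links" key are all hypothetical names. The `fetch` function stands in for whatever web service resolves an identifier, so the sketch itself makes no network calls.

```python
import re
from collections import deque

# Simplified, assumed patterns; real identifier syntaxes are more varied.
PATTERNS = [
    ("doi", re.compile(r"^(?:doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)),
    ("handle", re.compile(r"^hdl:(\S+)$", re.IGNORECASE)),
    ("pmid", re.compile(r"^pmid:(\d+)$", re.IGNORECASE)),
    ("genbank", re.compile(r"^(?:genbank:)?([A-Z]{1,2}\d{5,6})$")),
]


def classify(identifier):
    """Return (kind, bare_identifier), or None if no pattern matches."""
    text = identifier.strip()
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return kind, match.group(1)
    return None


def crawl(start, fetch, max_objects=100):
    """Resolve `start`, then breadth-first resolve everything it links to.

    `fetch(identifier)` must return a dict (optionally with a "links" list
    of further identifiers) or None when the identifier cannot be resolved.
    """
    store = {}
    queue = deque([start])
    while queue and len(store) < max_objects:
        identifier = queue.popleft()
        if identifier in store:
            continue  # already resolved, avoids looping on cyclic links
        record = fetch(identifier)
        if record is None:
            continue  # unresolvable identifiers are skipped, not fatal
        store[identifier] = record
        for linked in record.get("links", []):
            if linked not in store:
                queue.append(linked)
    return store
```

For instance, `classify("doi:10.1093/bib/bbn022")` returns `("doi", "10.1093/bib/bbn022")`. The `max_objects` cap is a design choice of the sketch: linked data can fan out quickly, so a crawl needs some limit.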
Bibliographic data
Data is being harvested from DSpace repositories, such as the AMNH's digital library, using OAI harvesting (OAI-PMH). Other sources include Zootaxa, which is screen scraped.
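Assuming the harvesting is done over OAI-PMH with the standard `oai_dc` (Dublin Core) metadata format, one harvesting step might look like the sketch below. The function names and the shape of the returned dictionaries are assumptions, and any base URL passed in is the caller's own; only the `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespaces come from the protocol itself. The sketch builds request URLs and parses a response that has already been downloaded, so it does no network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH a resumptionToken is an exclusive argument, so when one is
    given it replaces metadataPrefix rather than accompanying it.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS
            ).strip(),
            "titles": [
                (e.text or "").strip()
                for e in record.iterfind(".//dc:title", NS)
            ],
            "identifiers": [
                (e.text or "").strip()
                for e in record.iterfind(".//dc:identifier", NS)
            ],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)
```

A full harvest would repeat the request with each returned token until `parse_list_records` reports `None`. The `dc:identifier` values are often where a repository exposes its handle URLs, which makes them a natural starting point for the identifier resolution described earlier.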
