|
DwCAdapters
Command-line scripts for processing existing data export files
This page has been published but is subject to revision.
Introduction
OBJECTIVES: Create shell-executable scripts that convert a reference source database export file to a DarwinCore (DwC) Archive file. Many online taxonomic databases currently provide regular structured exports of their data files. Each dataset, however, is output in a different data format. This inhibits integration and the development of tools and services that can act on them all. This project is developing specific data transformation scripts that will convert the data files from their source formats to the GNA data format.
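As a rough illustration of the kind of transformation described above, the following Python sketch copies selected columns from a delimited export into a tab-separated core data file with Darwin Core term names as headers. It is not one of the project's scripts. The function name `convert_export`, the source column names in `COLUMN_MAP`, and the pipe delimiter are assumptions chosen for the example; a real export would need its own mapping.

```python
import csv
import io

# Hypothetical mapping from source export columns to Darwin Core term names.
# The source column names are assumptions made for this example.
COLUMN_MAP = {
    "tsn": "taxonID",
    "parent_tsn": "parentNameUsageID",
    "complete_name": "scientificName",
    "rank_name": "taxonRank",
}


def convert_export(source, target, delimiter="|"):
    """Copy the mapped columns from a delimited export to a tab-separated file.

    `source` and `target` are open text file objects. The first line of
    `source` must be a header row. Returns the number of data rows written.
    """
    reader = csv.DictReader(source, delimiter=delimiter)
    writer = csv.writer(target, delimiter="\t", lineterminator="\n")
    # Header row uses the Darwin Core term names, in mapping order.
    writer.writerow(list(COLUMN_MAP.values()))
    count = 0
    for row in reader:
        # Missing or empty source columns become empty strings.
        writer.writerow([(row.get(col) or "").strip() for col in COLUMN_MAP])
        count += 1
    return count


if __name__ == "__main__":
    sample = io.StringIO(
        "tsn|parent_tsn|complete_name|rank_name\n"
        "202423|0|Animalia|Kingdom\n"
    )
    out = io.StringIO()
    convert_export(sample, out)
    print(out.getvalue(), end="")
```

Keeping the column mapping in a single table means that supporting another source is mostly a matter of supplying a different mapping and delimiter, which fits the per-source script approach described above.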
Reference source database export files
ITIS (http://www.itis.gov/downloads/ for SQL or http://www.itis.gov/customdownload.html for a configurable text dump)
Tree of Life (http://tolweb.org/data/tolskeletaldump.zip File ~400KB)
USDA Plants (http://plants.usda.gov/java/downloadData?fileName=plantlst.txt&static=true)
GRIN (http://www.ars-grin.gov/misc/tax/)
NCBI Taxonomy (ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip)
Catalogue of Life (Source Files Webpage)
PalaeoBiology Database (http://www.paleodb.org/)
PHP Classes
Classes will be created to generate the meta.xml files for the DwC Archive and any extensions used. This will simplify the meta.xml creation process and allow the classes to be reused in the future.
Source
The source scripts are located on this site.
Implementation Requirements
A MySQL database needs to be installed on the ECAT server to run the scripts for ITIS and the PBDB. | |
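To make the meta.xml generation mentioned under PHP Classes more concrete, here is a minimal sketch of building a core-only descriptor. It is written in Python for illustration and is not the project's PHP classes. The function name `build_meta_xml` and the file name `taxa.txt` are hypothetical, and the sketch assumes a tab-separated core file with one header line and the record identifier in the first column. Extensions are not covered.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"    # namespace of the meta.xml descriptor
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"     # namespace of Darwin Core terms


def build_meta_xml(location, terms, row_type="Taxon"):
    """Return a core-only meta.xml document as a string.

    `location` is the core data file name inside the archive and `terms`
    lists the Darwin Core term names of its columns, in column order.
    """
    archive = ET.Element("archive", {"xmlns": DWC_TEXT_NS})
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        # Written as literal backslash escapes, as is conventional in meta.xml.
        "fieldsTerminatedBy": "\\t",
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
        "rowType": DWC_TERMS + row_type,
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = location
    # Assumption: the first column holds the record identifier.
    ET.SubElement(core, "id", {"index": "0"})
    for index, term in enumerate(terms):
        ET.SubElement(core, "field", {"index": str(index), "term": DWC_TERMS + term})
    return ET.tostring(archive, encoding="unicode")


if __name__ == "__main__":
    print(build_meta_xml("taxa.txt", ["taxonID", "scientificName", "taxonRank"]))
```

Generating the descriptor from the same list of terms used to write the data file keeps the column indexes in meta.xml in step with the data, which is the main benefit of a reusable class over hand-written XML.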