DwCAdapters  
Command-line scripts for processing existing data export files
Updated Mar 6, 2010 by mikegidd...@silverbiology.com

This page has been published but is subject to revision.

Introduction

Objective: Create shell-executable scripts that convert a reference source database export file to a Darwin Core (DwC) Archive file.

Many online taxonomic databases currently provide regular structured exports of their data files. Each dataset, however, is output in a different data format. This inhibits integration and the development of tools and services that can act on them all.

This project is developing specific data transformation scripts that will convert the data files from their source formats to the GNA data format.

  1. Darwin Core terms are located at http://rs.tdwg.org/dwc/terms/index.htm

These scripts are referred to in the context of this project as Darwin Core Adapters or DwC Adapters. They can be written in any programming language so long as they can be executed from a Unix command line.
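
As an illustration of what such an adapter might look like, here is a minimal sketch in Python. It is not one of the project's scripts. The source column names (tsn, parent_tsn, complete_name, rank_name), the default "|" delimiter, and the file names are assumptions made for the example; the Darwin Core term names on the right-hand side of the mapping are taken from the DwC terms list.

```python
#!/usr/bin/env python3
"""Hypothetical DwC adapter sketch: delimited source export -> tab-delimited core file."""
import argparse
import csv
import sys

# Assumed mapping from source column names to Darwin Core term names.
# The source column names are illustrative only.
COLUMN_MAP = {
    "tsn": "taxonID",
    "parent_tsn": "parentNameUsageID",
    "complete_name": "scientificName",
    "rank_name": "taxonRank",
}


def convert(src, dst, delimiter="|"):
    """Copy the mapped columns from src to dst; return the number of records written."""
    reader = csv.DictReader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    # Header row uses the Darwin Core term names.
    writer.writerow(list(COLUMN_MAP.values()))
    count = 0
    for row in reader:
        # Missing or empty source values become empty strings.
        writer.writerow([(row.get(col) or "").strip() for col in COLUMN_MAP])
        count += 1
    return count


def main(argv):
    parser = argparse.ArgumentParser(description="Convert a source export to a DwC core file.")
    parser.add_argument("source", help="path to the delimited source export")
    parser.add_argument("output", help="path of the tab-delimited file to write")
    parser.add_argument("--delimiter", default="|", help="source field delimiter")
    args = parser.parse_args(argv)
    with open(args.source, newline="", encoding="utf-8") as src, \
            open(args.output, "w", newline="", encoding="utf-8") as dst:
        count = convert(src, dst, args.delimiter)
    print(f"wrote {count} records to {args.output}", file=sys.stderr)
    return 0


# Run only when arguments are supplied, so the file can also be imported.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved under a hypothetical name such as adapter.py and made executable, it could be run as ./adapter.py export.txt taxa.txt. Keeping the conversion in a function separate from the argument parsing makes the mapping easy to test without touching the file system.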

Reference source database export files

ITIS (http://www.itis.gov/downloads/ for SQL or http://www.itis.gov/customdownload.html for a configurable text dump)
- Meta File
- Current DwC Archive ~11MB

Tree of Life (http://tolweb.org/data/tolskeletaldump.zip, ~400KB)
- Meta File
- Current DwC Archive ~14KB

USDA Plants (http://plants.usda.gov/java/downloadData?fileName=plantlst.txt&static=true)
- Meta File
- Current DwC Archive ~3.5MB

GRIN (http://www.ars-grin.gov/misc/tax/)
- Meta File
- Current DwC Archive ~4MB

NCBI Taxonomy (ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip)
- Meta File
- Current DwC File ~11MB

  • Taxonomic Hierarchy & Synonyms
  • Vernacular Names

Catalogue of Life (Source Files Webpage)

  • Taxonomic Hierarchy
  • Synonyms
  • Vernacular Names
  • Distribution

Paleobiology Database (http://www.paleodb.org/)

PHP Classes

PHP classes will be created to generate the meta.xml files for the DwC Archive and any extensions used. This will simplify creating the meta files and allow the classes to be reused in the future.
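
To give a rough idea of what such a generator produces, here is a minimal sketch. It is written in Python rather than PHP and is not the project's class design. The function names (add_table, build_meta) and the file names are hypothetical, and the layout assumes that column 0 of every data file holds the taxon identifier and that all listed terms are in the Darwin Core terms namespace.

```python
import xml.etree.ElementTree as ET

# Namespace of the DwC text (archive descriptor) schema and of DwC terms.
DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"


def add_table(archive, tag, row_type, location, id_tag, terms):
    """Append a <core> or <extension> element describing one data file."""
    table = ET.SubElement(archive, tag, {
        "rowType": row_type,
        "encoding": "UTF-8",
        # meta.xml stores the escape sequences literally, hence the double backslash.
        "fieldsTerminatedBy": "\\t",
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
    })
    files = ET.SubElement(table, "files")
    ET.SubElement(files, "location").text = location
    # Assumption: column 0 is the record id (core) or the core record id (extension).
    ET.SubElement(table, id_tag, {"index": "0"})
    for index, term in enumerate(terms, start=1):
        ET.SubElement(table, "field", {"index": str(index), "term": DWC_TERMS + term})
    return table


def build_meta(core_file, core_terms, extensions=()):
    """Return meta.xml content as a string.

    extensions is an iterable of (row_type, location, terms) tuples.
    """
    archive = ET.Element("archive", {"xmlns": DWC_TEXT_NS})
    add_table(archive, "core", DWC_TERMS + "Taxon", core_file, "id", core_terms)
    for row_type, location, terms in extensions:
        add_table(archive, "extension", row_type, location, "coreid", terms)
    return ET.tostring(archive, encoding="unicode")
```

A call such as build_meta("taxa.txt", ["parentNameUsageID", "scientificName", "taxonRank"]) returns a descriptor with a single core file; passing extension tuples adds one extension element per file, each linked back to the core through its coreid column.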

Source

The source scripts are located on this site.

Implementation Requirements

A MySQL database needs to be installed on the ECAT server to run the scripts for ITIS and the PBDB.

