
This project documents research into scalable techniques for improving the harvesting, processing, quality control, taxonomic and geospatial integration, and indexing required to offer global biodiversity data search, browse and access services in the GBIF data portal.

Companies such as Google, Yahoo and Amazon have built infrastructure capable of rapidly indexing and mining huge volumes of data while tolerating hardware failure and accommodating continued growth. Until recently the underlying technologies were closely guarded, but emerging open source projects such as Hadoop, HBase and Hypertable now allow cost-effective research into how well these technologies suit the discovery and scientific analysis of biodiversity data.
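To make the indexing idea concrete, the following is a minimal sketch of the map/shuffle/reduce pattern that Hadoop popularised, applied to building an inverted index over occurrence records. It is plain Python and does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`) and the sample records are hypothetical, and the field names are simply borrowed from Darwin Core terms.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical occurrence record: Darwin Core-style field name -> value.
Record = Dict[str, str]


def map_phase(record_id: str, record: Record) -> Iterator[Tuple[str, str]]:
    """Emit one (term, record_id) pair per non-empty field value."""
    for field, value in record.items():
        if value:
            # Prefix the value with its field name so identical strings in
            # different fields are indexed as distinct terms.
            yield f"{field}:{value.lower()}", record_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group emitted values by key, as the framework does between phases."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term: str, record_ids: List[str]) -> Tuple[str, List[str]]:
    """Remove duplicate ids and sort the posting list for one term."""
    return term, sorted(set(record_ids))


def build_index(records: Dict[str, Record]) -> Dict[str, List[str]]:
    """Run map, shuffle and reduce in sequence on a single machine."""
    pairs = (pair for rid, rec in records.items() for pair in map_phase(rid, rec))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


if __name__ == "__main__":
    records = {
        "occ-1": {"scientificName": "Puma concolor", "countryCode": "AR"},
        "occ-2": {"scientificName": "Puma concolor", "countryCode": "CL"},
        "occ-3": {"scientificName": "Lynx lynx", "countryCode": "ES"},
    }
    index = build_index(records)
    print(index["scientificName:puma concolor"])  # ['occ-1', 'occ-2']
```

On a cluster, the map and reduce functions would run in parallel across many machines and the framework would perform the shuffle and re-execute tasks lost to hardware failure; the single-process version above only illustrates the data flow.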

The primary goals are to:

  • Reduce latency between a data change and its global discovery, through
    • increased harvesting throughput
    • reduced processing time
    • shorter index rollover periods
  • Increase the richness of the indexed terms
    • measured against the number of terms in Darwin Core
  • Provide enhanced web services for better integration
  • Provide enhanced visualisation services
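One plausible reading of "measured against the number of terms in Darwin Core" is the fraction of Darwin Core terms that the index covers. The sketch below computes that ratio under this assumption; `term_richness` is a hypothetical helper, and `DWC_TERMS_SUBSET` is a small illustrative subset of real Darwin Core term names, not the full standard.

```python
from typing import Iterable, Set

# Illustrative subset of Darwin Core term names; the standard defines many
# more, so a real measurement would load the complete term list.
DWC_TERMS_SUBSET: Set[str] = {
    "scientificName", "kingdom", "family", "genus",
    "decimalLatitude", "decimalLongitude", "countryCode",
    "eventDate", "basisOfRecord", "catalogNumber",
}


def term_richness(indexed_terms: Iterable[str], reference: Set[str]) -> float:
    """Return the fraction (0.0 to 1.0) of reference terms that are indexed.

    Indexed fields outside the reference vocabulary are ignored, so adding
    non-standard fields does not inflate the score.
    """
    if not reference:
        raise ValueError("reference vocabulary must not be empty")
    covered = reference.intersection(indexed_terms)
    return len(covered) / len(reference)


if __name__ == "__main__":
    current = ["scientificName", "countryCode", "basisOfRecord", "myCustomField"]
    print(term_richness(current, DWC_TERMS_SUBSET))  # 0.3
```

Tracking this ratio over time would give a simple, comparable number for the "richness" goal, although a weighted variant might better reflect that some terms (such as coordinates) matter more for discovery than others.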

The general research architecture is depicted in the figure below.

[Figure: general research architecture]

YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications.

Take a look at YourKit's leading software products: YourKit Java Profiler and YourKit .NET Profiler.
