Introduction
1. Overview
This GBIF Metadata Catalogue is a software component that will enable GBIF to promote and participate in the sharing of dataset-level metadata published in commonly used standards such as Ecological Metadata Language (EML) or ISO 19139. Large distributed networks such as GBIF's that bring together many providers and consumers of data can be characterised as a Service Oriented Architecture (SOA) where consumers discover providers and their services, and loosely coupled system components can be orchestrated as needed to serve particular cases. In an SOA, the key activities of inventory, discovery and access to data and services must be well coordinated through the provision of registries and metadata catalogues, and through the generation of specific indexes. Metadata are thus a central component in an expanded GBIF network.
Metadata are literally ‘data about data’. They provide information on such aspects as the ‘who, what, where, when and how’ pertaining to a resource. In the GBIF context, resources are datasets, loosely defined as collections of related data, the granularity of which is determined by the data custodian. Metadata can be considered from the perspective of both the data producer and the data consumer. For the producer, metadata are used to document data in order to inform prospective users of their characteristics, while for the consumer, metadata are used to both discover data and assess their appropriateness for particular needs – their so-called ‘fitness for purpose’.
Metadata are usually made available in two broad categories of completeness: discovery and full. Discovery level metadata typically provide a minimum of essential information to enable a user to find out if a particular dataset exists, its location and ownership, and how to obtain further information. Full metadata include additional information on such aspects as data quality and lineage (provenance) and technical details for access and exploitation. An important goal for GBIF is to develop the infrastructure needed across its network to support the management and delivery of the highest quality metadata that will enable potential end users to easily discover which datasets are available, and, critically, to evaluate the appropriateness of such datasets for particular purposes.
2. Metadata sharing network topology
The metadata catalogue will primarily be used as the central catalogue in the GBIF Data Portal for the global GBIF network, which, in turn, will broker information to wider initiatives such as EuroGEOSS. The metadata catalogue will support open data exchange protocols, in particular the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), and will therefore offer the possibility of integration with other metadata catalogue systems such as Metacat, Mercury and GeoNetwork.
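As a rough illustration of what harvesting over OAI-PMH involves, the sketch below builds a `ListRecords` request URL. The verb and the `metadataPrefix`, `from` and `resumptionToken` arguments come from the OAI-PMH specification; the endpoint address and the `eml` prefix are assumptions made for the example and are not taken from the GBIF catalogue itself.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is a hypothetical repository endpoint. oai_dc is the one
    metadata format every OAI-PMH repository must support; any other
    prefix (such as "eml") depends on the repository.
    """
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is
        # sent with the verb only, to fetch the next page of a list.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Selective harvesting: only records changed since this date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint, for illustration only.
    print(build_list_records_url("http://example.org/oai", "eml",
                                 from_date="2010-01-01"))
```

A harvester would typically issue the first request with a metadata prefix, then repeat the request with each returned resumption token until the repository stops returning one.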
3. Alternative Implementation Options
In preparation for developing the metadata catalogue, GBIF evaluated the suitability of three major metadata catalogue systems (Metacat, GeoNetwork, Mercury). The findings are available as a separate report [authors]. In brief, at the time of evaluation (Q4 2009), Mercury was a relatively new system and not well documented, while both Metacat and GeoNetwork were well-established systems with associated support communities. In the end, because the sophisticated functionality of these systems brought added complexity, especially when trying to integrate them with the GBIF portal code, GBIF opted instead for a simpler solution based on assembling existing components, such as OAICat for harvesting and serving metadata and Apache Solr as the indexing and search engine.
4. Formats
Several metadata standards exist, either with a regional or country focus or targeted at particular communities of practice. Examples of the former include those from the Australia New Zealand Spatial Information Council (ANZLIC), the Comité Européen de Normalisation (CEN) and the Federal Geographic Data Committee (FGDC), while the latter include the Dublin Core Metadata Initiative (DCMI) and Ecological Metadata Language (EML). Now, however, most countries strongly promote the ISO 19115/19139 standards for geographic metadata, e.g. the North American Profile of ISO 19115 in the USA and Canada [NAP], and the INSPIRE directive in the European Union.
5. GBIF Metadata Profile
A metadata profile is a recommended subset of the elements of a metadata standard for use by a particular community of users. The GBIF Metadata Profile is aimed at standardising how resources are described at the dataset level in the GBIF Data Portal. In developing this profile, the recommendations of the GBIF Metadata Implementation Framework Task Group [MIFTG] were taken into account, as were requirements gathered from the community [GMP]. GBIF decided to base its profile on EML because it offered strong expressivity for describing biodiversity resources and its design was already informed by other major standards. However, GBIF does not mandate which standard a Participant should adopt; this will often be determined at a national or thematic level. Rather, in line with the MIFTG recommendations, the GBIF catalogue will accept and store metadata in all major standards, map them to a common search model while avoiding lossy conversions, and always return the original metadata document. To support interoperability and searching across broader metadata networks, the GBIF profile can be transformed to other formats, such as ISO 19139 in the case of the EuroGEOSS broker. In some cases, depending on the original format, this may result in a reduced amount of information, but it should be sufficient for high-level searches, and the original metadata documents will always be available. The user is referred to the GBIF Metadata Profile Reference Guide [GMPG] for a description of the constituent elements.
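The idea of mapping to a common search model while always returning the original document can be sketched as follows. This is a minimal illustration, not the catalogue's actual data model: the `CatalogueEntry` class, its fields and the `index_eml` function are hypothetical, and only the `dataset/title` and `keyword` elements of EML are read.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical catalogue record: search fields plus the untouched source."""
    source_format: str      # e.g. "eml"; a label chosen for this sketch
    original_document: str  # stored verbatim, so nothing is lost
    title: str              # extracted only for searching
    keywords: tuple         # extracted only for searching


def index_eml(document: str) -> CatalogueEntry:
    """Extract a few search fields from an EML document without altering it."""
    root = ET.fromstring(document)
    # In EML the root element is namespace-qualified, while its
    # descendants such as <dataset> and <title> are not.
    title = root.findtext("dataset/title", default="").strip()
    keywords = tuple(k.text.strip() for k in root.iter("keyword") if k.text)
    return CatalogueEntry("eml", document, title, keywords)
```

Because the entry keeps the source text unchanged, a search hit can be answered with the document exactly as the provider published it, whatever fields the search model happened to extract.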
6. Open source license
The GBIF metadata catalogue integrates existing open source products such as Apache Solr and is itself released under an open license. It may be used and customised, but support at this point is limited to installation instructions and basic operational guidelines.