DataConversion  
sub-project to develop conversion utilities for current data formats to RDA-based formats
Updated Feb 4, 2010

Introduction

This is a sub-project to develop utilities that convert bibliographic data formats (e.g. MARC and MODS) to RDA-based formats.

The goal for the moment is to design a transformation from MARC to an RDF-based representation, using the RDA and FRBR schemas, building on work done on cataloger scenarios by the DCMI/RDA task group. This transformation is being tested on a bibliographic dataset from the Library of Congress.

Work Plan

Roughly speaking, the work breaks down as described in phases 1-5 below. (MilestoneOne is a slice through phases 1-4 for the most abundant patterns in the LoC dataset.)

Phase 0 -- Feasibility

See DataConversionFeasibility.

Get an overview of the LoC dataset to see which patterns make up the majority (80% or so) of the data, and which are minority edge cases.
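
As an illustration of how such an overview might be produced, here is a minimal Python sketch. It assumes the dataset is available as MARC21 XML and treats the set of datafield tags used in a record as that record's "pattern"; this definition of a pattern, and the function names, are assumptions made for the example and not something the project has settled on.

```python
import xml.etree.ElementTree as ET
from collections import Counter

MARC_NS = "{http://www.loc.gov/MARC21/slim}"  # MARC21 XML namespace


def record_pattern(record):
    """Return the sorted tuple of distinct datafield tags used in one record."""
    tags = {
        field.get("tag")
        for field in record.iter(MARC_NS + "datafield")
        if field.get("tag")
    }
    return tuple(sorted(tags))


def majority_patterns(marcxml_text, threshold=0.8):
    """Return (patterns, total): the most common patterns, most frequent first,
    that together cover at least `threshold` of the records."""
    root = ET.fromstring(marcxml_text)
    counts = Counter(record_pattern(r) for r in root.iter(MARC_NS + "record"))
    total = sum(counts.values())
    covered, majority = 0, []
    for pattern, n in counts.most_common():
        if covered >= threshold * total:
            break  # enough records are already covered
        majority.append((pattern, n))
        covered += n
    return majority, total
```

This parses the whole document in memory, so it only suits a sample; a full dataset would need a streaming parser.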

Try out some XSLTs over the data to find scaling limits.

Phase 1 -- Design

See DataConversionDesign.

Design the expected outcome of the transformation for a set of sample bibliographic records from the LoC dataset.

Phase 2 -- Implementation

See DataConversionImplementation.

Implement an XSLT transformation either directly from MARC21 XML to RDF/XML, or from MODS XML to RDF/XML. (This will probably be from MODS, as it is easier to code against in a short time frame.)
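
The shape of the MODS mapping can be sketched outside XSLT. The Python below is only a stand-in for the planned stylesheet: it pulls titles and name parts out of a MODS record and emits them as (subject, predicate, object) tuples. The property namespace http://example.org/rda/ and the property names "title" and "creatorName" are hypothetical placeholders, not the actual RDA or FRBR schema URIs.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}  # MODS version 3 namespace
EX = "http://example.org/rda/"  # hypothetical placeholder, not a real RDA namespace


def mods_to_triples(mods_xml, subject_uri):
    """Map the titles and name parts of one MODS record to triples."""
    root = ET.fromstring(mods_xml)
    triples = []
    for title in root.findall(".//mods:titleInfo/mods:title", MODS_NS):
        text = (title.text or "").strip()
        if text:
            triples.append((subject_uri, EX + "title", text))
    for part in root.findall(".//mods:name/mods:namePart", MODS_NS):
        text = (part.text or "").strip()
        if text:
            triples.append((subject_uri, EX + "creatorName", text))
    return triples
```

A real transformation would also have to mint URIs for records and decide how names map onto FRBR entities, which this sketch leaves open.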

Phase 3 -- Large Scale Data Conversion

See DataConversionDump.

Use the transformation to generate a complete dump of all the LoC data as RDF/XML or in some other RDF syntax, e.g. N-Triples. Make the dump(s) publicly available.
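
N-Triples is attractive for large dumps because each triple sits on its own line, so files can be split, concatenated and streamed. A minimal sketch of writing that syntax in Python follows; the IRI marker class and the function names are assumptions made for the example, and IRIs are written out as given without validation or escaping.

```python
class IRI(str):
    """Marks an object value as an IRI; plain str objects are treated as literals."""


def escape_literal(value):
    """Escape the characters that may not appear raw in an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, IRI):
            obj_term = "<%s>" % obj
        else:
            obj_term = '"%s"' % escape_literal(obj)
        lines.append("<%s> <%s> %s ." % (subject, predicate, obj_term))
    return "".join(line + "\n" for line in lines)
```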

Phase 4 -- SPARQL

See DataConversionSparql.

Deploy a SPARQL endpoint for the LoC data.
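
Once an endpoint exists, clients would normally query it over HTTP by passing the query text in a "query" parameter, as the SPARQL protocol specifies. The sketch below only builds such a request URL; the endpoint address http://example.org/sparql is a hypothetical placeholder, since no endpoint has been deployed yet.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical placeholder endpoint


def build_query_url(limit=10, endpoint=ENDPOINT):
    """Build a GET URL that asks the endpoint for up to `limit` arbitrary triples."""
    query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT %d" % int(limit)
    return endpoint + "?" + urlencode({"query": query})
```

Fetching a handful of arbitrary triples like this is a quick way to check that a dump has actually been loaded.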

Phase 5 -- Linked Data

See DataConversionLod.

Deploy a linked data service for the LoC data.

