|
DataConversion
Sub-project to develop utilities for converting current data formats to RDA-based formats
Introduction

This is a sub-project to develop utilities for converting bibliographic data formats, e.g. MARC and MODS, to RDA-based formats. The current goal is to design a transformation from MARC to an RDF-based representation, using the RDA and FRBR schemas and building on the work on cataloger scenarios done by the DCMI/RDA task group. This transformation is being tested on a bibliographic dataset from the Library of Congress (LoC).

Work Plan

Roughly speaking, the work breaks down as described in phases 1-5 below. (MilestoneOne is a slice through phases 1-4 for the most abundant patterns in the LoC dataset.)

Phase 0 -- Feasibility

See DataConversionFeasibility. Get an overview of the LoC dataset to see which patterns make up the majority (80% or so) of the data, and which are minority edge cases. Try out some XSLTs over the data to find scaling limits.

Phase 1 -- Design

See DataConversionDesign. Design the expected outcome of the transformation for a set of sample bibliographic records from the LoC dataset.

Phase 2 -- Implementation

See DataConversionImplementation. Implement an XSLT transformation either directly from MARC21 XML to RDF/XML, or from MODS XML to RDF/XML. (This will probably be from MODS, as it is easier to code against in a short time frame.)

Phase 3 -- Large Scale Data Conversion

See DataConversionDump. Use the transformation to generate a complete dump of all the LoC data as RDF/XML or in some other RDF syntax, e.g. N-Triples. Make the dump(s) publicly available.

Phase 4 -- SPARQL

See DataConversionSparql. Deploy a SPARQL endpoint for the LoC data.

Phase 5 -- Linked Data

See DataConversionLod. Deploy a linked data service for the LoC data. |
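The Phase 0 survey of which patterns make up the majority of the data could be sketched roughly as follows. This is only an illustration under stated assumptions: the page does not define what a "pattern" is, so the sketch assumes a pattern is the set of MARC datafield tags present in a record; the sample records are invented; and the function names `tag_patterns` and `majority_patterns` are hypothetical. It reads MARC21 XML using the Python standard library rather than XSLT.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of the MARC21 XML ("slim") schema.
MARC_NS = "{http://www.loc.gov/MARC21/slim}"

# Invented sample data: five records with tags 100+245, one with 245+260.
COMMON = ('<record><datafield tag="100" ind1="1" ind2=" ">'
          '<subfield code="a">Author, A.</subfield></datafield>'
          '<datafield tag="245" ind1="1" ind2="0">'
          '<subfield code="a">A title</subfield></datafield></record>')
RARE = ('<record><datafield tag="245" ind1="0" ind2="0">'
        '<subfield code="a">Another title</subfield></datafield>'
        '<datafield tag="260" ind1=" " ind2=" ">'
        '<subfield code="b">A publisher</subfield></datafield></record>')
SAMPLE = ('<collection xmlns="http://www.loc.gov/MARC21/slim">'
          + COMMON * 5 + RARE + '</collection>')


def tag_patterns(xml_text):
    """Count records per pattern, where a pattern is the sorted tuple of
    distinct datafield tags in a record (an assumed definition)."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for record in root.iter(MARC_NS + "record"):
        tags = {df.get("tag") for df in record.iter(MARC_NS + "datafield")}
        counts[tuple(sorted(tags))] += 1
    return counts


def majority_patterns(counts, threshold=0.8):
    """Return the most frequent patterns, in order, until together they
    cover at least `threshold` of all records."""
    total = sum(counts.values())
    covered = 0
    selected = []
    for pattern, n in counts.most_common():
        if covered / total >= threshold:
            break
        selected.append(pattern)
        covered += n
    return selected


if __name__ == "__main__":
    counts = tag_patterns(SAMPLE)
    for pattern in majority_patterns(counts):
        print(pattern, counts[pattern])
```

Parsing a whole file with `ET.fromstring` would not scale to a full dump; for a dataset of that size a streaming parser such as `xml.etree.ElementTree.iterparse` would be the more likely choice.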