PublicationFormat
Documenting recommended publication formats
This page is under construction and subject to significant revision.

Introduction

The GBIF Global Names Architecture (GNA) work of the ECAT work programme includes extending GBIF indices to cover the occurrence of taxonomic names within publications. Our work extends the functionality of the uBioRSS application by combining advances in Taxonomic Name Recognition, undertaken as part of the GNA work, with advances in processing data indices. In addition we will expand the scope of indexed content. Our goal is to:
Rationale

The GBIF Strategic Plan targets the integration of indexed biodiversity data for all groups of organisms, and requires that the amount and richness of the data served via these indexes be sufficient to meet the needs of major user groups. Data types targeted for integration include genetic resources, multimedia data, and literature. Indexing the taxonomic names that occur within publications provides the integrative capacity to meet this target.

Publisher Requirements

Publications to be indexed should preferably be PDFs, but for GBIF indexing purposes we support most common formats, e.g. PDF, HTML, XML, Microsoft Office (doc, xls, ppt) and more. To ease the discovery of online publications we recommend that publishers ideally provide two things: an RSS feed for their articles and an archive of all publications. Both are described in more detail below.
RSS feeds

By providing an RSS feed for articles or publications, a publisher can include metadata for each feed item (entry), i.e. each article or publication. The downside of classical RSS feeds is that by default they only expose the most recently published articles, usually around 10-25. (There are extensions to feeds, in Atom for example, that allow paging, but they are not widely used and get tricky.) Unfortunately there is no single reference/citation standard in use, so publication metadata is expressed in various ways. The most common is simple Dublin Core; the more detailed ones are Content and PRISM. PRISM only exists as RDF and is therefore limited to RSS 1.0.

Good best-practice guidelines can be found here: Good Practice Guidelines for Publishers of TOC RSS feeds: http://web.fumsi.com/go/article/share/3356

The recommendation is to use RDF-based RSS 1.0 with PRISM if possible. As RSS 2.0 is less expressive, it should only be used when no resources are available to provide RSS 1.0.

Publisher RSS survey

What do publishers do already? Many publishers support the idea of TOC RSS feeds and also link to PDFs from them. A good review of current practice can be found in "Analysing the ticTOCs collection of journal TOC feeds". We analysed 980 biologically relevant feeds in uBio to see which formats are the most common (the 202 feeds missing from these counts are broken ones):

rss_0.92 = 3
rss_1.0 = 336
rss_2.0 = 431
rss_0.91U = 6
atom_1.0 = 2

... and a more detailed breakdown by namespaces and elements used in feeds.
Numbers indicate the number of feeds found that make use of the element in their items:

2 atom_1.0
2 atom_1.0::http://purl.org/dc/elements/1.1/
1 atom_1.0::http://purl.org/syndication/thread/1.0total
1 atom_1.0::http://www.w3.org/200

6 rss_0.91U
6 rss_0.91U::http://purl.org/dc/elements/1.1/
2 rss_0.91U::http://rssnamespace.org/feedburner/ext/1.0origLink

3 rss_0.92
3 rss_0.92::http://purl.org/dc/elements/1.1/

336 rss_1.0
1 rss_1.0::http://base.google.com/ns/1.0image_link
1 rss_1.0::http://base.google.com/ns/1.0news_source
1 rss_1.0::http://base.google.com/ns/1.0publication_name
1 rss_1.0::http://base.google.com/ns/1.0publication_volume
1 rss_1.0::http://base.google.com/ns/1.0publish_date
1 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/byteCount
1 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/category
1 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/complianceProfile
1 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/copyright
1 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/coverDate
2 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/coverDisplayDate
1 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/distributor
1 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/eIssn
182 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/endingPage
173 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/isPartOf
56 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/issn
1 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/issueIdentifier
1 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/issueName
177 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/number
57 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/publicationDate
66 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/publicationName
1 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/publicationYear
1 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/publisher
26 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/section
238 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/startingPage
1 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/teaser
54 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/versionidentifier
233 rss_1.0::http://prismstandard.org/namespaces/1.2/basic/volume
334 rss_1.0::http://purl.org/dc/elements/1.1/
1 rss_1.0::http://purl.org/dc/terms/created
1 rss_1.0::http://purl.org/dc/terms/issued
1 rss_1.0::http://purl.org/dc/terms/tableOfContents
1 rss_1.0::http://purl.org/rss/1.0/modules/aggregation/source
1 rss_1.0::http://purl.org/rss/1.0/modules/aggregation/sourceURL
1 rss_1.0::http://purl.org/rss/1.0/modules/aggregation/timestamp
1 rss_1.0::http://purl.org/rss/1.0/modules/annotate/reference
38 rss_1.0::http://purl.org/rss/1.0/modules/prism/endingPage
38 rss_1.0::http://purl.org/rss/1.0/modules/prism/number
39 rss_1.0::http://purl.org/rss/1.0/modules/prism/publicationDate
39 rss_1.0::http://purl.org/rss/1.0/modules/prism/section
38 rss_1.0::http://purl.org/rss/1.0/modules/prism/startingPage
38 rss_1.0::http://purl.org/rss/1.0/modules/prism/volume
1 rss_1.0::http://purl.org/rss/1.0/modules/slash/comments
1 rss_1.0::http://purl.org/rss/1.0/modules/slash/department
1 rss_1.0::http://purl.org/rss/1.0/modules/slash/hit_parade
1 rss_1.0::http://purl.org/rss/1.0/modules/slash/section
1 rss_1.0::http://purl.org/syndication/thread/1.0total
3 rss_1.0::http://rssnamespace.org/feedburner/ext/1.0origLink
54 rss_1.0::http://web.resource.org/cc/license
1 rss_1.0::http://www.openurl.info/registry/fmt/xml/rss10/ctxobjects
144 rss_1.0::http://xmlns.com/foaf/0.1/maker
1 rss_1.0::www.refworks.com/xml/created
1 rss_1.0::www.refworks.com/xml/do
1 rss_1.0::www.refworks.com/xml/id
1 rss_1.0::www.refworks.com/xml/jo
1 rss_1.0::www.refworks.com/xml/k1
1 rss_1.0::www.refworks.com/xml/modified
1 rss_1.0::www.refworks.com/xml/ol
1 rss_1.0::www.refworks.com/xml/rwtype
1 rss_1.0::www.refworks.com/xml/sn
1 rss_1.0::www.refworks.com/xml/sr
1 rss_1.0::www.refworks.com/xml/ul

431 rss_2.0
2 rss_2.0::http://prismstandard.org/namespaces/1.2/basic/endingPage
2 rss_2.0::http://prismstandard.org/namespaces/1.2/basic/number
2 rss_2.0::http://prismstandard.org/namespaces/1.2/basic/publicationDate
2 rss_2.0::http://prismstandard.org/namespaces/1.2/basic/section
2 rss_2.0::http://prismstandard.org/namespaces/1.2/basic/startingPage
2 rss_2.0::http://prismstandard.org/namespaces/1.2/basic/volume
398 rss_2.0::http://purl.org/dc/elements/1.1/
1 rss_2.0::http://purl.org/rss/1.0/modules/slash/comments
6 rss_2.0::http://rssnamespace.org/feedburner/ext/1.0origLink
3 rss_2.0::http://search.yahoo.com/mrss/content
3 rss_2.0::http://search.yahoo.com/mrss/credit
1 rss_2.0::http://search.yahoo.com/mrss/description
2 rss_2.0::http://search.yahoo.com/mrss/thumbnail
1 rss_2.0::http://search.yahoo.com/mrss/title
1 rss_2.0::http://search.yahoo.com/mrssthumbnail
2 rss_2.0::http://wellformedweb.org/CommentAPI/commentRss
1 rss_2.0::http://www.itunes.com/dtds/podcast-1.0.dtdduration
1 rss_2.0::http://www.itunes.com/dtds/podcast-1.0.dtdexplicit
1 rss_2.0::http://www.itunes.com/dtds/podcast-1.0.dtdkeywords
1 rss_2.0::http://www.itunes.com/dtds/podcast-1.0.dtdsubtitle
1 rss_2.0::http://www.itunes.com/dtds/podcast-1.0.dtdsummary
1 rss_2.0::http://www.pheedo.com/namespace/pheedoorigLink
1 rss_2.0::http://www.topix.com/partners/rsscomment/comments

RSS 1.0 with PRISM

Recommended Format
PRISM

The Publishing Requirements for Industry Standard Metadata (PRISM) specification defines a standard for interoperable content description, interchange, and reuse in both traditional and electronic publishing contexts. PRISM recommends the use of certain existing standards, such as XML, RDF, the Dublin Core, and various ISO specifications for locations, languages, and date/time formats. Beyond those recommendations, it defines a small number of XML namespaces and controlled vocabularies of values, in order to meet the goals listed above.

Example

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns="http://purl.org/rss/1.0/">
<channel rdf:about="http://www.nature.com/ng/current_issue/rss">
<title>Nature Genetics</title>
<description>Publishes the very highest quality research in genetics.</description>
<link>http://www.nature.com/ng/current_issue/</link>
<dc:publisher>Nature Publishing Group</dc:publisher>
<dc:language>en</dc:language>
<dc:rights>© 2009 Nature Publishing Group</dc:rights>
<prism:publicationName>Nature Genetics</prism:publicationName>
<prism:issn>1061-4036</prism:issn>
<prism:eIssn>1546-1718</prism:eIssn>
<prism:copyright>© 2009 Nature Publishing Group</prism:copyright>
<prism:rightsAgent>permissions@nature.com</prism:rightsAgent>
<image rdf:resource="http://www.nature.com/includes/rj_globnavimages/ng_logo.gif"/>
<items>
<rdf:Seq>
<rdf:li rdf:resource="http://dx.doi.org/10.1038/ng0609-635"/>
...
</rdf:Seq>
</items>
</channel>
<item rdf:about="http://dx.doi.org/10.1038/ng0609-635">
<title>The cup half empty</title>
<link>http://dx.doi.org/10.1038/ng0609-635</link>
<description>One-sixth of the world's population does not have enough food to sustain life,
and the world's food supply needs to double by 2050 without increasing demand for water or fuel.
Agricultural genetics is one of the easier parts of the solution.</description>
<content:encoded><![CDATA[
<p>
<b>The cup half empty</b>
</p>
<p>Nature Genetics 41, 635 (2009). <a href="http://dx.doi.org/10.1038/ng0609-635">doi:10.1038/ng0609-635</a>
</p>
<p>One-sixth of the world's population does not have enough food to sustain life,
and the world's food supply needs to double by 2050 without increasing demand for water or fuel.
Agricultural genetics is one of the easier parts of the solution.</p>
]]></content:encoded>
<dc:title>The cup half empty</dc:title>
<dc:identifier>doi:10.1038/ng0609-635</dc:identifier>
<dc:source>Nature Genetics 41, 635 (2009)</dc:source>
<prism:publicationName>Nature Genetics</prism:publicationName>
<prism:volume>41</prism:volume>
<prism:number>6</prism:number>
<prism:section>Editorial</prism:section>
<prism:startingPage>635</prism:startingPage>
<prism:endingPage>635</prism:endingPage>
</item>
...

The RSS 1.0 feed is RDF-based, so the channel's list of items can reference each individual item by its URI (see rdf:Seq above).

RSS 2.0

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>BioOne: BIOTROPICA: Table of Contents</title>
<link>http://www.bioone.org/loi/bitr?ai=tc&amp;af=R</link>
<description>Table of Contents for BIOTROPICA. List of articles from both the latest and ahead of print issues.</description>
<language>en-US</language>
<pubDate>Thu, 14 May 2009 04:17:05 GMT</pubDate>
<docs>http://blogs.law.harvard.edu/tech/rss</docs>
<generator>Atypon Literatum</generator>
<managingEditor>helpdesk@allenpress.com</managingEditor>
<ttl>120</ttl>
<image>
<title>BIOTROPICA</title>
<url>http://www.bioone.org/na101/home/literatum/publisher/bioone/journals/covergifs/bitr/2004/00063606-36.4/cover.jpg</url>
<link>http://www.bioone.org/loi/bitr?ai=tc&amp;af=R</link>
</image>
<item>
<title>Beyond Paradise—Meeting the Challenges in Tropical Biology in the 21st Century</title>
<link>http://www.bioone.org/doi/abs/10.1646/1609?ai=tc&amp;af=R</link>
<description>BIOTROPICA, Volume 36, Issue 4, Page 437-446, December 2004.
<br/>
</description>
<author>helpdesk@allenpress.com (Kamaljit S. Bawa et al)</author>
<category>article</category>
<pubDate>Wed, 14 Jan 2009 16:55:33 GMT</pubDate>
<guid>http://www.bioone.org/doi/abs/10.1646/1609?ai=tc&amp;af=R</guid>
<comments>http://www.bioone.org/action/showMessage?message=Copyright+%28c%29+2009%2C+Atypon+Systems.+All+rights+reserved&amp;ai=tc&amp;af=R</comments>
</item>
...
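
The difference in expressiveness between the two formats shown above can be sketched in a few lines of Python. This is only an illustration, not part of any GBIF or uBioRSS code: the function names, the trimmed sample items, and the regular expression are assumptions made for this example. With RSS 1.0 and PRISM the citation fields can be read directly from dedicated elements; with the RSS 2.0 item they have to be guessed from the free-text description, which only works as long as the publisher keeps the same wording.

```python
import re
import xml.etree.ElementTree as ET

PRISM = "http://prismstandard.org/namespaces/1.2/basic/"

# Trimmed copies of the two example items above (only the fields used here).
RSS1_ITEM = """<item xmlns="http://purl.org/rss/1.0/"
      xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">
  <title>The cup half empty</title>
  <prism:publicationName>Nature Genetics</prism:publicationName>
  <prism:volume>41</prism:volume>
  <prism:number>6</prism:number>
  <prism:startingPage>635</prism:startingPage>
  <prism:endingPage>635</prism:endingPage>
</item>"""

RSS2_ITEM = """<item>
  <title>Beyond Paradise</title>
  <description>BIOTROPICA, Volume 36, Issue 4, Page 437-446, December 2004.</description>
</item>"""


def citation_from_rss1(item_xml):
    """Read citation fields from the dedicated PRISM elements of an item."""
    item = ET.fromstring(item_xml)

    def prism(name):
        # Namespaced lookup; returns None if the element is absent.
        return item.findtext("{%s}%s" % (PRISM, name))

    return {
        "journal": prism("publicationName"),
        "volume": prism("volume"),
        "issue": prism("number"),
        "start_page": prism("startingPage"),
        "end_page": prism("endingPage"),
    }


# Hypothetical pattern: it matches only the description layout of the
# BioOne example above and is not a general citation parser.
DESCRIPTION_PATTERN = re.compile(
    r"(?P<journal>[^,]+), Volume (?P<volume>\d+), Issue (?P<issue>\d+), "
    r"Page (?P<start_page>\d+)-(?P<end_page>\d+)"
)


def citation_from_rss2(item_xml):
    """Guess citation fields from the free-text description, or return None."""
    item = ET.fromstring(item_xml)
    description = item.findtext("description") or ""
    match = DESCRIPTION_PATTERN.search(description)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(citation_from_rss1(RSS1_ITEM))
    print(citation_from_rss2(RSS2_ITEM))
```

Both functions return the same keys, but the RSS 2.0 variant returns None as soon as a description deviates from the assumed layout, which is the practical reason for preferring RSS 1.0 with PRISM.
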
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>Pubget: latest:Nature Genetics</title>
<link>http://pubget.com/search?q=latest%3ANature+Genetics</link>
<description>Pubget is like PubMed, except you get the PDFs right away</description>
<item>
<title>Mutations in mitochondrial carrier family gene SLC25A38 cause nonsyndromic autosomal recessive congenital sideroblastic anemia.</title>
<link>http://pubget.com/search?highlight=19412178&amp;q=latest%3ANature+Genetics</link>
<description>
The sideroblastic anemias are a heterogeneous group of congenital and acquired hematological disorders whose morphological hallmark
is the presence of ringed sideroblasts-bone marrow erythroid precursors containing pathologic iron deposits within mitochondria.
Here, by positional cloning, we define a previously unknown form of autosomal recessive nonsyndromic congenital sideroblastic anemia,
associated with mutations in the gene encoding the erythroid specific mitochondrial carrier family protein SLC25A38, and demonstrate that SLC25A38 is important for the biosynthesis of heme in eukaryotes.
Authors: <a href='/search?q=authors%3A%22Duane L Guernsey%22' >Duane L Guernsey</a>,
<a href='/search?q=authors%3A%22Haiyan Jiang%22' >Haiyan Jiang</a>, <a href='/search?q=authors%3A%22Dean R Campagna%22' >
Dean R Campagna</a>, <a href='/search?q=authors%3A%22Susan C Evans%22' >Susan C Evans</a>,
<a href='/search?q=authors%3A%22Meghan Ferguson%22' >Meghan Ferguson</a>, <a href='/search?q=authors%3A%22Mark D Kellogg%22' >
Mark D Kellogg</a>, <a href='/search?q=authors%3A%22Mathieu Lachance%22' >Mathieu Lachance</a>,
<a href='/search?q=authors%3A%22Makoto Matsuoka%22' >Makoto Matsuoka</a>, <a href='/search?q=authors%3A%22Mathew Nightingale%22' >
Mathew Nightingale</a>, <a href='/search?q=authors%3A%22Andrea Rideout%22' >Andrea Rideout</a>,
<a href='/search?q=authors%3A%22Louis Saint-Amant%22' >Louis Saint-Amant</a>, <a href='/search?q=authors%3A%22Paul J Schmidt%22' >
Paul J Schmidt</a>, <a href='/search?q=authors%3A%22Andrew Orr%22' >Andrew Orr</a>,
<a href='/search?q=authors%3A%22Sylvia S Bottomley%22' >Sylvia S Bottomley</a>, <a href='/search?q=authors%3A%22Mark D Fleming%22' >
Mark D Fleming</a>, <a href='/search?q=authors%3A%22Mark Ludman%22' >Mark Ludman</a>,
<a href='/search?q=authors%3A%22Sarah Dyack%22' >Sarah Dyack</a>, <a href='/search?q=authors%3A%22Conrad V Fernandez%22' >
Conrad V Fernandez</a> and <a href='/search?q=authors%3A%22Mark E Samuels%22' >Mark E Samuels</a></description>
<guid>http://pubget.com/search?highlight=19412178&amp;q=latest%3ANature+Genetics</guid>
<pdf>http://www.nature.com/ng/journal/v41/n6/pdf/ng.359.pdf</pdf>
</item>
...

Atom

Even though Atom is technically a very good standard, its lack of use by current publishers suggests it is better not to use it at this point.

Archive

The archive of all publications should be a list of Dublin Core records. There are two ways of encoding such an archive: a simple CSV text file or XML.

CSV archive

A CSV file with each row representing a single publication. This format is very simple to produce and is compatible with the Darwin Core text guidelines, in particular the ECAT references extension. It does not allow line breaks in the metadata, which are common in abstracts. If you don't have abstracts or can replace the line breaks, please consider this format. A simple example file with one record looks like this (column names from the header row, each followed by the value from the record row, shown one column per line for readability):

dc:identifier: doi:10.1038/ng0609-637
link: (no value in this record)
dc:bibliographicCitation: Hartge, P., Genetics of reproductive lifespan. Nature Genetics 41, 637 - 638 (2009)
dc:title: Genetics of reproductive lifespan
dc:creator: Patricia Hartge
dc:date: 2009-06-01
dc:source: Nature Genetics 41, 635 (2009)
dc:subject: genomics, epidemiology
dc:description: Five genome-wide association studies of the timing of menarche and menopause have now taken us beyond the range of candidate gene and linkage studies. The list of new genetic associations identified for these two traits should shed light on the mechanisms of ovarian aging, as well as breast cancer and other diseases associated with reproductive lifespan.
...

XML archive

The same information as the CSV file, but encoded as XML, which allows line breaks and markup within the abstracts. A simple XML schema is provided to validate resources encoded in Dublin Core alone. Example:

<?xml version="1.0" encoding="UTF-8"?>
<resources xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xsi:noNamespaceSchemaLocation="http://gbif-ecat.googlecode.com/files/publication_archive.xsd">
<resource>
<dc:identifier>doi:10.1038/ng0609-637</dc:identifier>
<dc:identifier>http://www.nature.com/ng/journal/v41/n6/pdf/ng0609-637.pdf</dc:identifier>
<dc:title>Genetics of reproductive lifespan</dc:title>
<dc:creator>Patricia Hartge</dc:creator>
<dc:date>2009-06-01</dc:date>
<dc:source>Nature Genetics 41, 635 (2009)</dc:source>
<dc:subject>genomics; epidemiology</dc:subject>
<dc:language>en</dc:language>
<dc:rights>Copyright © 2009 Wiley-Liss, Inc., A Wiley Company</dc:rights>
<dc:description>
Five genome-wide association studies of the timing of menarche and menopause have now taken us beyond the range of candidate gene and linkage studies.
The list of new genetic associations identified for these two traits should shed light on the mechanisms of ovarian aging, as well as breast cancer and other diseases associated with reproductive lifespan.
</dc:description>
</resource>
...
</resources>
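
The relationship between the two archive encodings can be sketched in Python: read records from an XML archive and write them as CSV rows, collapsing line breaks so that abstracts fit on a single row. This is a minimal sketch under stated assumptions, not a GBIF tool. The function names, the choice of columns, the comma delimiter, and the " | " separator used when a Dublin Core element is repeated (as dc:identifier is above) are all choices made for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Assumed column subset for illustration; a real archive may carry more fields.
COLUMNS = ["identifier", "title", "creator", "date", "source", "subject", "description"]

# Trimmed copy of the record above, with a line break inside the abstract.
ARCHIVE_XML = """<resources xmlns:dc="http://purl.org/dc/elements/1.1/">
  <resource>
    <dc:identifier>doi:10.1038/ng0609-637</dc:identifier>
    <dc:identifier>http://www.nature.com/ng/journal/v41/n6/pdf/ng0609-637.pdf</dc:identifier>
    <dc:title>Genetics of reproductive lifespan</dc:title>
    <dc:creator>Patricia Hartge</dc:creator>
    <dc:date>2009-06-01</dc:date>
    <dc:description>
      Five genome-wide association studies of the timing of menarche and menopause
      have now taken us beyond the range of candidate gene and linkage studies.
    </dc:description>
  </resource>
</resources>"""


def collapse_whitespace(text):
    """Replace line breaks and runs of whitespace with single spaces."""
    return " ".join(text.split())


def archive_to_rows(archive_xml):
    """Turn each <resource> into a dict keyed by 'dc:<column>'."""
    root = ET.fromstring(archive_xml)
    rows = []
    for resource in root.findall("resource"):
        row = {}
        for column in COLUMNS:
            elements = resource.findall("{%s}%s" % (DC, column))
            values = [collapse_whitespace(el.text or "") for el in elements]
            # Assumption: repeated elements are joined with " | ".
            row["dc:" + column] = " | ".join(values)
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialise rows as CSV text with a header row, one record per line."""
    buffer = io.StringIO()
    fieldnames = ["dc:" + column for column in COLUMNS]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(archive_to_rows(ARCHIVE_XML)))
```

Collapsing whitespace loses the paragraph structure of an abstract, which is the trade-off the CSV format imposes; publishers who need to keep it should stay with the XML archive.
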
taken from fumsi.com