IntroductionToRdfa
An introduction to RDFa.
Introduction

Adding metadata to web-pages is both powerful and convenient. It is powerful because it provides extra information that can be used not only by search engines to improve search results, but also by browsers to enhance the user experience. It is convenient because it does not rely on additional server components: the metadata can be added to static pages, blog entries, pages generated by systems such as Drupal, and so on. In short, we can use existing web-page publishing techniques to make our metadata available.

In this introduction we'll look at some basic guidelines for adding the metadata, and then provide links to other tutorials that show how to add specific types of information.

Why embed metadata?

It's always been possible to embed metadata in HTML pages. For example, in each Directgov page, the publisher is indicated like this:

<html>
<head>
...
<meta name="DC.publisher"
content="Directgov, Hercules House, 6 Hercules Road, London, SE1 7DU. helpdesk@directgov.gsi.gov.uk" />
...
</head>
...
</html>

This makes use of the Dublin Core publisher 'property', and systems such as search engines are able to make use of this information.

However, there is often a lot more metadata in a web-page than is placed in the document's head. For example, newspaper articles often look something like this (from the Times Online):

<div>September 17, 2008</div>
<h1>BAA pledges to keep Stansted after Gatwick sale</h1>
<span>Carl Mortished, World Business Editor</span>
<p>
Gatwick Airport has been put up for sale by BAA, the owner of London's three major airports, which today pledged to continue its fight against a decision by the Competition Commission that the company should sell a second London airport.
</p>

Here we see the name of the author, their job title, the date of publication of the article, and the article's title. This is useful metadata that could be used by search engines and browsers, and the aim of embedded metadata techniques is to make this information available.

Setting properties

The simplest way to embed metadata is to use the @property attribute to indicate the nature of a piece of information. In our example above the candidates for becoming properties are 'date of publication', 'name of the author', 'title of the article', and so on. Let's assume that we have the widely used Dublin Core terms available to us (more on how we actually do that, below); this would mean that we can use the dc:title property to set the title of the news story, as follows:

<h1 property="dc:title">BAA pledges to keep Stansted after Gatwick sale</h1>

Note that since the value has been added as an attribute, there will be no effect on how this web-page appears in a browser. What has changed is that we have now clearly indicated, to any application that can make use of it, that the h1 element doesn't just contain a string of text: it contains the actual title of the article.
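To make this concrete from the consuming side, here is a minimal sketch of how an application might pick out such values. It uses Python's standard html.parser module and is not a conforming RDFa processor: the names PropertyCollector and extract_properties are made up for this illustration, and the sketch assumes well-nested markup in which every element has a closing tag.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Hypothetical helper: gather the text of elements carrying @property."""

    def __init__(self):
        super().__init__()
        self.values = {}   # property name -> text found inside the element
        self._stack = []   # one entry per open element: property name or None

    def handle_starttag(self, tag, attrs):
        # Only the attribute is inspected; the element name (tag) is ignored.
        prop = dict(attrs).get("property")
        self._stack.append(prop)
        if prop is not None:
            self.values[prop] = ""

    def handle_endtag(self, tag):
        # Assumes well-nested markup, so each end tag closes the latest element.
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text counts towards every enclosing element that declares a property.
        for prop in self._stack:
            if prop is not None:
                self.values[prop] += data


def extract_properties(html):
    """Return a dict mapping each @property value to its trimmed text."""
    collector = PropertyCollector()
    collector.feed(html)
    collector.close()
    return {name: text.strip() for name, text in collector.values.items()}


markup = '<h1 property="dc:title">BAA pledges to keep Stansted after Gatwick sale</h1>'
print(extract_properties(markup))
# {'dc:title': 'BAA pledges to keep Stansted after Gatwick sale'}
```

The sketch keys everything on the @property attribute and never consults the element name, which is the behaviour we rely on in what follows.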
Further, since this title is indicated by using a combination of the @property attribute and the text inside the element, it will be unaffected by minor changes to the markup, such as adding other attributes (like @class) or changing the element name:

<div property="dc:title" class="articleTitle">BAA pledges to keep Stansted after Gatwick sale</div>

Equally, it won't matter whether other elements are introduced as siblings, such as a sub-title, or links to other articles; the title value will still be understood.

Machine-readable data

Along with dc:title, Dublin Core also provides a way to indicate when something was created: the dc:created term. Using the same approach as we did above for the title of the article, we could use the text above the article title as the publication date, as follows:

<div property="dc:created">September 17, 2008</div>

One problem with this is that the format is not necessarily recognisable by computer systems, and since there are a number of standard ways to write dates, it would be better to use one of those. One way to do this would be to change the date format used in the article itself:

<div property="dc:created">2008-09-17</div>

but then we would be asking our human readers to act like computers.
Far better to attach the machine-readable version of the date to the markup:

<div property="dc:created" content="2008-09-17">September 17, 2008</div>

Note that the date is now available to a processing application, independently of the text that is visible to human readers:

<div property="dc:created" content="2008-09-17">Today</div>
<div property="dc:created" content="2008-09-17">September 17th</div>
<div property="dc:created" content="2008-09-17">Wednesday, September 17th, 2008</div>

This means that a standard format can be used, even with news stories written in other languages:

<div property="dc:created" content="2008-09-17" lang="it">17 Settembre 2008</div>
<div property="dc:created" content="2008-09-17" lang="ru">17 сентября 2008</div>

Setting the datatype

This kind of additional information will often be enough for the applications that will be built to use the data. However, there will also be situations where the 'type' of the information needs to be more precise. When this is the case, we can use @datatype. Like dc, the xs prefix used here has to be declared before it can be used; see 'Indicating which terms are available', below.

In this example we are indicating that the created property represents a date:

<div property="dc:created" datatype="xs:date" content="2008-09-17">September 17, 2008</div>

whilst here we are indicating that the created property is a date, time and timezone:

<div property="dc:created" datatype="xs:dateTime" content="2008-09-17T09:00:00Z">September 17, 2008</div>

Adding extra elements

Since embedding metadata in this way requires the use of attributes, there will be situations that require extra elements to be created, so that we have something to attach the attributes to.
An example is the byline for the author of the article, which includes both the name of the journalist and their job title:

<span>Carl Mortished, World Business Editor</span>

In order to use the Dublin Core dc:creator property, we need to create an element onto which we can place the @property value; the result looks like this:

<span><span property="dc:creator">Carl Mortished</span>, World Business Editor</span>

Indicating which terms are available

In our examples we've been using terms from the Dublin Core list, but we haven't shown how to make that list available for use. There is only one step required, and that is to add a declaration to the top of the HTML document, as follows:

<html xmlns:dc="http://purl.org/dc/terms/"> ... </html>

The principle is simple. First, the long URL within the quotes identifies a list of terms that we want to use in our documents:

http://purl.org/dc/terms/

To make this list available we need to both reference it and give it a name. This we do with the xmlns mechanism:

xmlns:dc="http://purl.org/dc/terms/"

Finally, we add this declaration to the root of the document:

<html xmlns:dc="http://purl.org/dc/terms/"> ... </html>

Now we can use the 'name' to refer to terms within the list, without having to worry that terms from different lists may be confused with each other:

<h1 property="dc:title">BAA pledges to keep Stansted after Gatwick sale</h1>
<span><span property="foaf:title">Mr.</span> Smith</span>

Here the foaf prefix is assumed to have been declared in the same way as dc, for example with xmlns:foaf="http://xmlns.com/foaf/0.1/".

Read more

For a detailed description of RDFa's features see RDFa Primer, by Ben Adida and Mark Birbeck. For an introduction to the features in video form, see RDFa Basics, by Manu Sporny. For a slideshow see The 5 minute guide to RDFa...in only 6 minutes and 40 seconds, by Mark Birbeck. To see how to add Argots using RDFa, see GettingStartedEmbeddedHtml.