Many gadgets interact with XML data. They retrieve RSS/Atom feeds or read custom XML data from disk. Parsing this data isn't very difficult with JavaScript, but there are some challenges that developers often overlook. Specifically, XML can be malformed, in which case it can't be parsed at all. Developers also sometimes expect certain fields to be present when in fact they aren't. Each of these cases takes only a small amount of handling, and that handling can greatly improve the robustness of your gadget in the face of XML from the web.
At its simplest, we want to extract certain elements from the XML (say, the items of an RSS feed) and process the components of those elements. Here is part of an example feed from the UC Berkeley Calendar of Events.
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>UC Berkeley Events Calendar</title>
<link>http://events.berkeley.edu/index.php/calendar.html</link>
<description>Campus-wide event listings from the University of California, Berkeley</description>
<item>
<title>Exhibit: Artifacts from "the Land of the Rajas"</title>
<link>http://events.berkeley.edu/index.php/calendar/sn/pubaff.html?event_ID=772&amp;date=2007-06-13</link>
<description>Rajasthan, a desert state in northwestern India, is famed for its colorful and distinctive art styles. For centuries, its princely rulers (rajputs, literally "sons of kings") have encouraged a wide range of arts.
Most of the spectacular works in "From the Land of the Rajas: Creativity in Rajasthan" have been exhibited only once in the museum's 100-year history. The 150 objects on display include domestic crafts, wedding textiles, festival material, puppets and theatrical costumes, ritual masks, musical instruments, and paintings for traveling storytellers. Many pieces chosen are uncommon in American collections because of their large size and ritual use. One of the focal points is a 30-foot-long painted scroll depicting the epic of Pabuji, a semi-divine folk hero. Traditionally the tale of Pabuji is told during a 36-hour performance given by professional storytellers, usually a married couple called a Bopa and Bopi. Video excerpts of such a performance augment the viewing of this visually exciting piece.</description>
<pubDate>Wed, 13 June 2007 0:00:00</pubDate>
<guid>http://events.berkeley.edu/index.php/calendar/sn/pubaff.html?event_ID=772&amp;date=2007-06-13</guid>
</item>
<item>
<title>Kunstkammer</title>
<link>http://events.berkeley.edu/index.php/calendar/sn/pubaff.html?event_ID=1933&amp;date=2007-06-13</link>
<description>An eccentric array of prints, drawings, and photographs from the BAM collections fills the Theater Gallery, with works by Albrecht Durer, Paul Gauguin, Joan Miro, Pablo Picasso, Eva Hesse, Jay DeFeo, Brice Marden, Joe Brainard, and many others hung together in the style of a sixteenth-century Kunstkammer -- an "art chamber" offering objects of scholarship and wonder.</description>
<pubDate>Wed, 13 June 2007 0:00:00</pubDate>
<guid>http://events.berkeley.edu/index.php/calendar/sn/pubaff.html?event_ID=1933&amp;date=2007-06-13</guid>
</item>
<item>
...
</channel>
</rss>
There are two things we might want to extract from here: the channel information and the items. To get the title of the first item, we'd like a simple one-liner such as:
var title = items[0]['title'];
A class that supports this simple interface is what we're aiming for in this article.
Below I give code for the SimpleXmlParser. I've marked the interesting points in the code with numbers, and will discuss those points below.
function SimpleXmlParser(xmlDoc) { // (1) pass in the DOMDocument (either created manually or from an XMLHttpRequest.responseXML)
  this.xmlDoc = xmlDoc;
  this.parseError = xmlDoc.parseError; // (2) output any parse errors and make them available to client
  if (this.parseError.errorCode != 0) {
    debug.error("SimpleXmlParser ERROR: " + this.parseError.reason);
  }
}

SimpleXmlParser.prototype.getItems = function(key) {
  var xmlDoc = this.xmlDoc;
  var items = [];
  if (this.parseError.errorCode != 0) {
    debug.error("SimpleXmlParser ERROR: " + this.parseError.reason);
  } else {
    var objNodeList = xmlDoc.getElementsByTagName(key); // (3) get the specific tags the user wants
    for (var i = 0; i < objNodeList.length; ++i) {
      var xmlItem = objNodeList.item(i);
      var item = {};
      var added = false;
      for (var j = 0; j < xmlItem.childNodes.length; ++j) {
        var child = xmlItem.childNodes.item(j);
        if (child.childNodes.length > 0) { // (4) pull out the text for the children of the main tag
          var name = child.nodeName;
          var value = child.childNodes[0].nodeValue;
          item[name] = value;
          added = true;
        }
      }
      if (added) {
        items.push(item);
      }
    }
  }
  return items;
};
SimpleXmlParser takes in a DOMDocument (1) that could come from many sources.
In the constructor we also check for parsing errors and, if there are any, write them to the debug console (2).
The key part of the code is in the getItems function, which takes in the name of the XML node that you want to collect.
For example, for the feed given above, if we were to set key = "item" (3), we'd get back all the "items" in the XML document.
Notice section (4), which extracts the text of each child node of an item. The check makes sure a child actually has content before we try to read it, which avoids a common error when a tag is empty.
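The same caution applies on the client side: because getItems only records children that have content, a field you expect may simply be missing from an item. A minimal sketch of a guard follows. Note that getField and its fallback parameter are hypothetical names introduced here for illustration; they are not part of SimpleXmlParser.

```javascript
// Hypothetical helper (not part of SimpleXmlParser): return item[name] if it
// is present and not null, otherwise return the supplied fallback value.
function getField(item, name, fallback) {
  if (item && item.hasOwnProperty(name) && item[name] != null) {
    return item[name];
  }
  return fallback;
}

// Example objects shaped like the ones getItems returns.
var sampleItems = [
  { title: 'Kunstkammer', link: 'http://events.berkeley.edu/' },
  { title: 'Untitled event' }  // this item has no link field
];

var firstLink = getField(sampleItems[0], 'link', '(no link)');
var secondLink = getField(sampleItems[1], 'link', '(no link)');
```

With a guard like this, display code can rely on always getting a string back instead of checking for undefined at every use.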
Let's look at an example usage of SimpleXmlParser:
var output = '';

function onOpen() {
  // fetch an RSS feed and parse it
  // NOTE: we do a blocking request here for simplicity, but you should NEVER do this in
  // a real gadget, since it will block all other requests and freeze the sidebar.
  var request = new XMLHttpRequest();
  request.open("GET", 'http://events.berkeley.edu/index.php/rss/sn/pubaff/type/day/tab/all_events.html', false);
  request.send();

  function displayItems(theItems) {
    output += '\ngot ' + theItems.length + ' results. Parse errors: ' + parser.parseError.reason;
    for (var i = 0; i < theItems.length; ++i) {
      //alert(output);
      for (var key in theItems[i]) {
        output += '\nItem ' + i + ' has name:value pair ' + key + ': ' + theItems[i][key];
      }
    }
  }

  if (request.status == 200) {
    var parser = new SimpleXmlParser(request.responseXML); // (1)
    // get the channel information
    var items = parser.getItems("channel");
    displayItems(items);
    // get the items
    items = parser.getItems("item");
    displayItems(items);
  }

  // now parse some malformed XML from text
  var xmlDoc = new DOMDocument();
  xmlDoc.async = false;
  xmlDoc.loadXML("<name><first-name>Miki<last-name>Azuma</last-name></name>");
  var parser = new SimpleXmlParser(xmlDoc);
  // since the data is malformed, the parser will output a debug.error message and return
  // a list of 0 items. you can access the parse error information in parser.parseError
  var items = parser.getItems("name"); // (2)
}
Some of the output from the first part of the example, which uses the parser created at (1), is given below. Notice that we get the channel information and all the items:
got 1 results. Parse errors:
Item 0 has name:value pair title: UC Berkeley Events Calendar
Item 0 has name:value pair link: http://events.berkeley.edu/index.php/calendar.html?sid=
Item 0 has name:value pair description: Campus-wide event listings from the University of California, Berkeley
Item 0 has name:value pair item: null
got 12 results. Parse errors:
Item 0 has name:value pair title: Exhibit: Artifacts from "the Land of the Rajas"
Item 0 has name:value pair link: http://events.berkeley.edu/index.php/calendar/sn/pubaff.html?sid=?event_ID=772&date=2007-06-13
Item 0 has name:value pair description: Rajasthan, a desert state in northwestern India, is famed for its colorful and distinctive art styles. For centuries, its princely rulers (rajputs, literally "sons of kings") have encouraged a wide range of arts.
Finally, when we parse malformed XML in (2), the parser simply complains in the debug console and returns an empty array. Here is the error:
1:41:06.506 - ERROR: SimpleXmlParser ERROR: End tag 'name' does not match the start tag 'first-name'.
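The usage example makes a blocking request only for brevity. A non-blocking version might look like the sketch below, which also checks parseError before using the results. The names fetchItems, createRequest, makeParser and onDone are hypothetical and introduced only for illustration: createRequest is assumed to return an XMLHttpRequest-like object, and makeParser is assumed to wrap a DOM document in an object with the same parseError and getItems members as SimpleXmlParser.

```javascript
// Hypothetical helper: fetch a feed without blocking and pass either an
// error message or the parsed items to the onDone(error, items) callback.
function fetchItems(url, key, createRequest, makeParser, onDone) {
  var request = createRequest();
  request.open('GET', url, true);  // true = asynchronous
  request.onreadystatechange = function() {
    if (request.readyState != 4) {
      return;  // request has not finished yet
    }
    if (request.status != 200) {
      onDone('HTTP status ' + request.status, []);
      return;
    }
    if (!request.responseXML) {
      onDone('no XML document in response', []);
      return;
    }
    var parser = makeParser(request.responseXML);
    if (parser.parseError.errorCode != 0) {
      onDone(parser.parseError.reason, []);  // malformed XML
    } else {
      onDone(null, parser.getItems(key));
    }
  };
  request.send();
}

// Possible usage inside a gadget (assumes SimpleXmlParser is loaded):
// fetchItems(feedUrl, 'item',
//            function() { return new XMLHttpRequest(); },
//            function(doc) { return new SimpleXmlParser(doc); },
//            function(error, items) { /* update the display here */ });
```

Passing the request and parser factories in as arguments is just one design choice; it keeps the sketch independent of the gadget environment so it can be exercised with stand-in objects.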
The SimpleXmlParser handles two problems that are easy to miss when parsing XML: malformed XML, and elements that have no text associated with them.
It doesn't handle more complicated tasks, such as extracting attribute values (for example, pulling out the text of the data attribute in <tag data="text here"></tag>).
It also doesn't dig beyond the immediate children of the tag you're looking for. However, for many applications this will be enough.
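If you do need attribute values, the DOM exposes them through each element's attributes collection. The sketch below shows one possible way to copy them into a plain object. getAttributes is a hypothetical helper, not part of SimpleXmlParser, and the stand-in node in the example only mimics the parts of a real DOM element that the helper uses (attributes.length and attributes.item(i)).

```javascript
// Hypothetical helper: copy a node's attributes into a name -> value object.
function getAttributes(node) {
  var result = {};
  var attrs = node.attributes;
  if (!attrs) {
    return result;  // nodes such as text nodes have no attributes
  }
  for (var i = 0; i < attrs.length; ++i) {
    var attr = attrs.item(i);
    result[attr.nodeName] = attr.nodeValue;
  }
  return result;
}

// Stand-in for the element <tag data="text here"></tag>.
var fakeTag = {
  attributes: {
    length: 1,
    item: function(i) {
      return [{ nodeName: 'data', nodeValue: 'text here' }][i];
    }
  }
};

var tagAttributes = getAttributes(fakeTag);  // tagAttributes.data is "text here"
```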