This document describes how to fetch and manipulate remote textual (typically HTML), XML, and RSS/Atom feed data.
One of the most exciting features available to gadgets is the ability to combine information from multiple sources in new ways, or to provide alternative ways to interact with existing information. The Google Gadgets API allows your gadget to remotely fetch content from other web servers and web pages and operate on it.
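The Google Gadgets API provides the following functions for retrieving and operating on remote web content: _IG_FetchContent() for text (typically HTML), _IG_FetchXmlContent() for XML, and _IG_FetchFeedAsJSON() for RSS and Atom feeds.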
The _IG_FetchContent() and _IG_FetchXmlContent() functions also optionally take a refreshInterval parameter for controlling how often content is refreshed. This topic is discussed in Refreshing the Cache.
Note: You cannot use the _IG_Fetch... functions with type="url" gadgets.
The _IG_Fetch... functions share the following characteristics:
For example, consider the following code snippet, which features the _IG_FetchContent() function. This code fetches the HTML text of the google.com web page, and pops up a browser alert that contains the first 400 characters of the google.com HTML that was returned:
_IG_FetchContent('http://www.google.com/', function (responseText) {
// print the first 400 characters of Google's homepage HTML
alert(responseText.substr(0,400));
});
This example illustrates the basic principles behind how all of the _IG_Fetch... functions work:
What is a callback function? For the purpose of this discussion, the simplest way to describe a callback is to say that it's a function that is passed as a parameter (in the form of a function reference) to another function. Callbacks give third-party developers a "hook" into a running framework to do some processing. All of the _IG_Fetch... functions take callbacks as parameters.
Most of the examples in this section show the callback as a function literal (that is, as an anonymous function defined inline in the call to the _IG_Fetch... function). For example:
_IG_FetchContent('http://www.google.com/', function (responseText) {
// do something
});
But making the callback a function literal is not a requirement. If you prefer, you can implement the callback as a named function. For example, in this snippet, the callback is broken out into a separate function called my_callback_function():
function my_callback_function(responseText) {
  if (responseText == null) return;
  alert(responseText.substr(0,400));
}
// Here the callback is passed by name
_IG_FetchContent('http://www.google.com/', my_callback_function);
There may be situations where you want to pass extra parameters to the callback function. To make this possible, the Google Gadgets API provides an _IG_Callback(callback, ...) wrapper that lets you add parameters of any number or type to the callback. These extra parameters appear after any parameters the callback function would normally receive. In the code sample below, responseText (the content data) is the first parameter to the callback function, and the extra parameter (limit) appears after it. The _IG_Callback wrapper can be used anywhere a function reference is expected as a callback. For example:
// Here my_callback_function takes an additional 'limit' parameter, which
// specifies the upper range of the substring to be displayed in the alert
// panel.
function my_callback_function(responseText, limit) {
  if (responseText == null) return;
  alert(responseText.substr(0, limit));
}
// You use the _IG_Callback wrapper to provide additional parameters.
// In this example, '400' is passed as a parameter to indicate the upper
// limit for the substring extraction.
_IG_FetchContent('http://www.google.com/', _IG_Callback(my_callback_function, 400));
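To make the parameter ordering concrete, here is a minimal sketch of how a wrapper in the style of _IG_Callback could append extra arguments. The name makeCallback is hypothetical, and this is not the actual implementation of _IG_Callback; it only illustrates the behavior described above, and it runs outside a gadget.

```javascript
// Hypothetical stand-in for an _IG_Callback-style wrapper (not the real
// implementation). It returns a new function that forwards whatever
// arguments it receives, followed by the extra arguments given here.
function makeCallback(callback) {
  var extraArgs = Array.prototype.slice.call(arguments, 1);
  return function () {
    var normalArgs = Array.prototype.slice.call(arguments);
    return callback.apply(this, normalArgs.concat(extraArgs));
  };
}

// Example callback: returns the first 'limit' characters of the response.
function firstCharacters(responseText, limit) {
  if (responseText == null) return "";
  return responseText.substr(0, limit);
}

// The wrapped function receives responseText first, then limit (5).
var wrapped = makeCallback(firstCharacters, 5);
var preview = wrapped("<html><head>");
```

The wrapped function can be handed to anything that expects a one-argument callback, while the callback itself still sees both responseText and limit.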
The most general function for working with remote web content is _IG_FetchContent(url, func). It passes the content of the remote page to the callback function as text.
You could use this function to perform dynamic retrieval of HTML and write sophisticated interfaces for viewing it. Or, you could write a gadget that fetches results from a search engine for a given query, and then lets you know if there is a change in the order of the top results.
The previous section showed several variations on a simple example that used _IG_FetchContent(). Here is another example that fetches data from a CSV (comma-separated value) file stored on Google Page Creator and uses it to populate a list of personal contacts:
// This example fetches data from a CSV file containing contact information. In the CSV file,
// each record consists of a name, email address, and phone number.
_IG_FetchContent('http://doc.examples.googlepages.com/Contacts.csv', function (responseText) {
// Set CSS for div.
var html = "<div style='padding: 5px;background-color: #FFFFBF;font-family:Arial, Helvetica;"
+ "text-align:left;font-size:90%'>";
// Use the split function to extract substrings separated by comma
// delimiters.
var contacts = responseText.split(",");
// Process array of extracted substrings.
for (var i = 0; i < contacts.length ; i++) {
// Append substrings to html.
html += contacts[i];
html += " ";
// Each record consists of 3 components: name, email, and
// phone number. The gadget displays each record on a single
// line:
//
// Mickey Mouse mickey@disneyland.com 1-800-MYMOUSE
//
// Therefore, insert a line break after each (name,email,phone)
// triplet (i.e., whenever (i+1) is a multiple of 3).
if((i+1)%3 ==0) {
html += "<br>";
}
}
html += "</div>";
// Output html in div.
_gel('content_div').innerHTML = html;
});
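The example above splits the whole response on commas, which works only if commas are the sole delimiters in the file. If the file instead puts each record on its own line (an assumption; the layout of Contacts.csv is not shown here), the last field of one record and the first field of the next would be joined together. The following sketch, using the hypothetical helper name parseContacts and made-up sample data, splits on line breaks first and then on commas. It does not handle quoted fields.

```javascript
// Hypothetical helper: parse "name,email,phone" records, one per line.
// Returns an array of objects; blank lines are skipped.
function parseContacts(csvText) {
  var records = [];
  var lines = csvText.split(/\r\n|\r|\n/);
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i];
    if (/^\s*$/.test(line)) continue; // skip empty lines
    var fields = line.split(",");
    for (var j = 0; j < fields.length; j++) {
      // Trim surrounding whitespace from each field.
      fields[j] = fields[j].replace(/^\s+|\s+$/g, "");
    }
    records.push({ name: fields[0], email: fields[1], phone: fields[2] });
  }
  return records;
}

// Made-up sample data for illustration only.
var sample = "Mickey Mouse,mickey@disneyland.com,1-800-MYMOUSE\n" +
             "Donald Duck, donald@disneyland.com, 1-800-MYDUCK\n";
var contacts = parseContacts(sample);
```

In a gadget, the callback passed to _IG_FetchContent() could call such a helper on responseText and then build one line of HTML per record.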
For a more complex example, see onebox.xml, which shows how you can have multiple requests running at once. If you enter "_test" in that gadget, it runs a self-test that fires off multiple queries to google.com. The query results render in whatever order they come back.
_IG_FetchContent() has no return value because it is asynchronous: it returns immediately, and its callback function is called whenever the response arrives.
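Because each request completes independently, a gadget that needs several responses before it renders has to track completion itself. The sketch below shows one way to do that. fetchAll and fakeFetch are hypothetical names, and the fetch function is passed in as a parameter so that the sketch can run outside a gadget; inside a gadget you would pass _IG_FetchContent instead.

```javascript
// Hypothetical helper: start one request per URL and call done(results)
// once every callback has fired. results[i] corresponds to urls[i],
// regardless of the order in which the responses arrive.
function fetchAll(urls, fetchFn, done) {
  var results = [];
  var remaining = urls.length;
  if (remaining === 0) {
    done(results);
    return;
  }
  function request(index) {
    fetchFn(urls[index], function (responseText) {
      results[index] = responseText;
      remaining--;
      if (remaining === 0) done(results);
    });
  }
  for (var i = 0; i < urls.length; i++) {
    request(i);
  }
}

// Stand-in for _IG_FetchContent so the sketch runs without the gadget API.
function fakeFetch(url, callback) {
  callback("content of " + url);
}

var collected = null;
fetchAll(["http://a.example/", "http://b.example/"], fakeFetch, function (results) {
  collected = results;
});
```

Storing each response by index keeps the results in request order even when the responses come back in a different order.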
The Document Object Model (DOM) is an API for navigating HTML and XML documents. You can use the Google Gadgets JavaScript function _IG_FetchXmlContent(url, func) to retrieve an XML document as a DOM object. Once you have the object, you can operate on it using standard DOM JavaScript functions. Typically this means extracting the desired data from the XML file, combining it with HTML and CSS markup, and rendering the resulting HTML in your gadget.
Note: You can only use _IG_FetchXmlContent() to retrieve XML files, not HTML. To work with HTML, use the _IG_FetchContent() function.
With DOM, web content is parsed into a tree of nodes. For example, consider the following snippet of HTML:
<a href="http://www.google.com/">Google's <b>fast</b> home page.</a>
This snippet illustrates the main types of nodes discussed in this section:
This is the DOM structure for the HTML snippet:
To access the data in a DOM object, you “walk the tree,” using DOM functions to navigate parent-child node relationships to get to the data you need.
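As a minimal sketch of walking the tree, the recursive function below collects the text from every text node beneath a starting node. So that it can run outside a browser, the sample tree is built from plain objects that mimic the nodeType, nodeName, nodeValue, and childNodes properties of DOM nodes for the HTML snippet above. The helper names collectText, el, and txt are hypothetical.

```javascript
var ELEMENT_NODE = 1; // nodeType value for element nodes
var TEXT_NODE = 3;    // nodeType value for text nodes

// Recursively concatenate the values of all text nodes under 'node'.
function collectText(node) {
  if (node.nodeType === TEXT_NODE) return node.nodeValue;
  var text = "";
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    text += collectText(children[i]);
  }
  return text;
}

// Hypothetical helpers that build plain objects shaped like DOM nodes.
function el(name, children) {
  return { nodeType: ELEMENT_NODE, nodeName: name, nodeValue: null, childNodes: children };
}
function txt(value) {
  return { nodeType: TEXT_NODE, nodeName: "#text", nodeValue: value, childNodes: [] };
}

// Mirrors: <a ...>Google's <b>fast</b> home page.</a>
var link = el("a", [txt("Google's "), el("b", [txt("fast")]), txt(" home page.")]);
var linkText = collectText(link);
```

The same traversal pattern (check the node type, then recurse into childNodes) applies to the XML documents discussed in the rest of this section.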
The following XML file contains data for a series of breakfast items. The top-most parent node is menu, and it has multiple food child nodes. The menu node also contains an attribute node: title="Breakfast Menu". Each food node has name, price, description, and calories child nodes.
The name, price, and calories nodes all contain their own "text" child nodes. Each description node contains a CDATA child node. CDATA is a distinct node type. CDATA sections are used to escape blocks of text containing characters that would otherwise be regarded as markup, such as angle brackets. The only delimiter that is recognized in a CDATA section is the “]]>” string that ends the CDATA section.
<?xml version="1.0" encoding="UTF-8" ?>
<menu title="Breakfast Menu">
<food>
<name>Early Bird Breakfast</name>
<price>$3.95</price>
<description><![CDATA[<div style="color:purple; padding-left:25px;">Two eggs any style with your choice of bacon
or sausage, toast or English muffin.</div>]]></description>
<calories>450</calories>
</food>
<food>
<name>Chocolate Chip Belgian Waffles</name>
<price>$7.95</price>
<description><![CDATA[<div style="color:purple; padding-left:25px;">Chocolate chip Belgian waffles covered with
chocolate syrup and whipped cream.</div>]]></description>
<calories>900</calories>
</food>
…
</menu>
The following sample gadget uses this XML file as a data source. It displays a breakfast menu and lets users set a calorie limit. It displays in red any calories that are above the specified limit. Users can also choose whether or not to display descriptions for each breakfast item.
This gadget uses the _IG_FetchXmlContent() function to retrieve the breakfast XML file as a DOM tree. Like the _IG_FetchContent() function, _IG_FetchXmlContent() is asynchronous, meaning that it returns immediately and then calls the callback function later, when the fetch finishes. This means that you must put any dependent code inside the callback function, or inside functions called by the callback function. In the following example, all processing happens inside the callback function.
The following code illustrates how to walk the DOM tree to extract data from different node types, and how to combine the data with HTML and CSS markup for display in the breakfast menu gadget.
<?xml version="1.0" encoding="UTF-8" ?>
<Module>
<ModulePrefs
title="_IG_FetchXmlContent Example"
scrolling="true"/>
<UserPref
name="mycalories"
display_name="Calorie limit"
default_value="800"/>
<UserPref
name="mychoice"
display_name="Show Descriptions"
datatype="bool"
default_value="false"/>
<Content type="html">
<![CDATA[
<div id="content_div"></div>
<script type="text/javascript">
function displayMenu() {
// XML breakfast menu data
var url = "http://doc.examples.googlepages.com/breakfast-data.xml";
var prefs = new _IG_Prefs();
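// Calorie limit set by user. Read it as a number so that the comparison
// with each item's calories is numeric rather than alphabetical.
var calorieLimit = prefs.getInt("mycalories");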
// Indicates whether to show descriptions in the breakfast menu
var description = prefs.getBool("mychoice");
_IG_FetchXmlContent(url, function (response) {
if (response == null || typeof(response) != "object" ||
response.firstChild == null) {
_gel("content_div").innerHTML = "<i>Invalid data.</i>";
return;
}
// Start building HTML string that will be displayed in <div>.
// Set the style for the <div>.
var html = "<div style='padding: 5px;background-color: #ccf;font-family:Arial, Helvetica;" +
"text-align:left;font-size:90%'>";
// Set style for title.
html +="<div style='text-align:center; font-size: 120%; color: yellow; " +
"font-weight: 700;'>";
// Display menu title. Use getElementsByTagName() to retrieve the <menu> element.
// Since there is only one menu element in the file,
// you can get to it by accessing the item at index "0".
// You can then use getAttribute to get the text associated with the
// menu "title" attribute.
var title = response.getElementsByTagName("menu").item(0).getAttribute("title");
// Alternatively, you could retrieve the title by getting the menu element node
// and reading its "attributes" property. This returns a collection
// of the element node's attributes. In this case, there is only one
// attribute (title), so you could display the value for the attribute at
// index 0. For example:
//
// var title = response.getElementsByTagName("menu").item(0).attributes.item(0).nodeValue;
// Append the title to the HTML string.
html += title + "</div><br>";
// Get a list of the <food> element nodes in the file
var itemList = response.getElementsByTagName("food");
// Loop through all <food> nodes
for (var i = 0; i < itemList.length ; i++) {
// For each <food> node, get child nodes.
var nodeList = itemList.item(i).childNodes;
// Loop through child nodes. Extract data from the text nodes that are
// the children of the associated name, price, and calories element nodes.
for (var j = 0; j < nodeList.length ; j++) {
var node = nodeList.item(j);
if (node.nodeName == "name") {
var name = node.firstChild.nodeValue;
}
if (node.nodeName == "price") {
var price = node.firstChild.nodeValue;
}
if (node.nodeName == "calories") {
var calories = node.firstChild.nodeValue;
}
// If the user chose to display descriptions and
// the child node is "#cdata-section", grab the
// contents of the description CDATA for display.
if (node.nodeName == "description" && description==true)
{
if (node.firstChild.nodeName == "#cdata-section")
var data = node.firstChild.nodeValue;
}
}
// Append extracted data to the HTML string.
html += "<i><b>";
html += name;
html += "</b></i><br>";
html += " ";
html += price;
html += " - ";
// If "calories" is greater than the user-specified calorie limit,
// display it in red.
if(calories > calorieLimit) {
html += "<font color=#ff0000>";
html += calories + " calories";
html += " </font>";
}
else
html += calories + " calories";
html += "<br>";
// If user has chosen to display descriptions
if (description==true) {
html += "<i>" + data + "</i><br>";
}
}
// Close up div
html += "</div>";
// Display HTML string in <div>
_gel('content_div').innerHTML = html;
});
}
_IG_RegisterOnloadHandler(displayMenu);
</script>
]]>
</Content>
</Module>
This code sample illustrates four of the primary functions you use to interact with DOM data:
This example shows only a few of the ways to navigate a DOM tree. Other properties you might try include lastChild, nextSibling, previousSibling, and parentNode.
The key to working effectively with DOM is appreciating the sometimes subtle differences between different node types.
| Node Type | Description | Return Values | Gotchas |
|---|---|---|---|
| element | The structural building blocks of a document, such as <p>, <b>, or <calories>. | nodeName: Whatever text is contained inside the angle brackets. For example, the nodeName of <menu> is "menu". nodeType: 1. nodeValue: null. | An element has a nodeValue of null. To get to the value of a text or attribute node associated with an element, you must go to those nodes. For example: element.firstChild.nodeValue for text, and element.getAttribute(attrib) for attributes. |
| text | Text. A text node is always contained within an element. It is a child of the element. | nodeName: #text. nodeType: 3. nodeValue: Whatever text is contained in the node. | Some browsers render all whitespace in a document as text nodes, so that you get "empty" text nodes in your DOM object. This can cause unexpected results when you're walking the tree. The solution may be as simple as filtering out text nodes that contain only the newline character, or you may want to do more robust handling. For more discussion of this topic, see Whitespace in the DOM. |
| attribute | A key-value pair that provides additional information about an element node (for example, title="my document"). An attribute is contained by an element node, but it is not a child of the element node. | nodeName: The lefthand value in the attribute pair. If the attribute is title="my document", the nodeName is title. nodeType: 2. nodeValue: The righthand value in the attribute pair (in this example, "my document"). | Even though attributes are nodes and are contained within element nodes, they are not child nodes of the element. They inherit from the Node interface, but the DOM doesn't consider them part of the DOM tree. This means that while you can use many of the node properties on attribute nodes (such as nodeName, nodeValue, and nodeType), you cannot access attribute nodes using the DOM tree-walking functions. To access attributes, you use attributes and getAttribute(attrib). |
| CDATA | A section in which content is left alone, not interpreted. CDATA sections are used to escape blocks of text containing characters that would otherwise be regarded as markup. The only delimiter that is recognized in a CDATA section is the "]]>" string that ends the CDATA section. | nodeName: #cdata-section. nodeType: 4. nodeValue: Text and markup inside the CDATA delimiters. | The text in the CDATA section has its own markup. This could have implications for how you incorporate it into your gadget. |
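One way to handle the whitespace gotcha described in the table is to skip text nodes that contain only whitespace before processing an element's children. The helper below is a sketch with a hypothetical name, significantChildren. The sample node is a plain object that mimics the relevant DOM properties so that the code can run outside a browser.

```javascript
// Hypothetical helper: return the children of 'node', leaving out text
// nodes (nodeType 3) whose value is empty or whitespace only.
function significantChildren(node) {
  var kept = [];
  var children = node.childNodes;
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    var isBlankText = child.nodeType === 3 && !/\S/.test(child.nodeValue);
    if (!isBlankText) kept.push(child);
  }
  return kept;
}

// Plain-object stand-in for a <food> element whose children are separated
// by newline-and-indent text nodes, as some parsers produce.
var food = {
  nodeType: 1, nodeName: "food", nodeValue: null,
  childNodes: [
    { nodeType: 3, nodeName: "#text", nodeValue: "\n  ", childNodes: [] },
    { nodeType: 1, nodeName: "name", nodeValue: null, childNodes: [] },
    { nodeType: 3, nodeName: "#text", nodeValue: "\n  ", childNodes: [] },
    { nodeType: 1, nodeName: "price", nodeValue: null, childNodes: [] },
    { nodeType: 3, nodeName: "#text", nodeValue: "\n", childNodes: [] }
  ]
};
var kept = significantChildren(food);
```

Text nodes with real content are kept, so this filter is safe to apply to elements that mix text and child elements.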
You can add a feed to your iGoogle page by typing its URL into the Add by URL form in the content directory. This uses the Google Gadgets API built-in feed support to create a gadget for the feed and add it to iGoogle. It's easy to use, but it doesn't let you perform any customization to the content or display. Also, you can't use it with other Google properties.
For more sophisticated feed handling, the Google Gadgets API provides the _IG_FetchFeedAsJSON(url, func, num_entries, get_summaries) function. _IG_FetchFeedAsJSON() fetches an RSS or Atom feed and returns the core feed data as a JSON object. JSON (JavaScript Object Notation) is a simple way of describing data as JavaScript.
_IG_FetchFeedAsJSON() takes the following parameters:
| Name | Data Type | Description |
|---|---|---|
| url | string, required | The RSS or Atom feed URL to retrieve. |
| callback | function, required | The callback function to execute when the data is retrieved. |
| num_entries | integer, optional | The number of feed entries to retrieve from the feed. The accepted range is 1 through 100. The default is 3. |
| get_summaries | boolean, optional | Whether to retrieve the full text summaries for the entries in the feed. This defaults to false. You should only set this to true if you plan to use the data. The full summaries can be quite large and shouldn't be transferred needlessly. |
Here are the fields in the JSON feed object:
| Field | Description |
|---|---|
| ErrorMsg | If defined, describes any error that occurred. |
| URL | The URL of the RSS / Atom feed. |
| Title | The title of the feed. |
| Description | The tagline or description of the feed. |
| Link | Typically, the URL of the feed homepage. |
| Author | The author of the feed. |
| Entry | Array of feed entries. Each entry has its own nested fields; the example in this section reads Title, Link, Date (a timestamp in seconds since Jan. 1, 1970), and Summary. |
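As a sketch of reading these fields, the function below turns a feed object into an HTML string and checks ErrorMsg first. The function name feedToHtml and the sample feed object are hypothetical; the entry fields it reads (Title and Link) are the ones used by the example later in this section. It does not escape HTML in the feed data.

```javascript
// Hypothetical helper: build display HTML from a JSON feed object.
function feedToHtml(feed) {
  if (feed == null) return "<i>There is no data.</i>";
  // ErrorMsg, if defined, describes an error that occurred.
  if (feed.ErrorMsg) return "<i>" + feed.ErrorMsg + "</i>";
  var html = "<div><b>" + feed.Title + "</b></div>";
  var entries = feed.Entry || [];
  for (var i = 0; i < entries.length; i++) {
    html += "<div><a target='_blank' href='" + entries[i].Link + "'>" +
            entries[i].Title + "</a></div>";
  }
  return html;
}

// Made-up feed object for illustration only.
var sampleFeed = {
  Title: "Sample Feed",
  Entry: [{ Title: "First post", Link: "http://example.com/1" }]
};
var feedHtml = feedToHtml(sampleFeed);
var errorHtml = feedToHtml({ ErrorMsg: "Feed not found" });
```

In a gadget, a function like this could be called from the callback passed to _IG_FetchFeedAsJSON(), with the result assigned to the innerHTML of the content div.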
The following example illustrates how to use the _IG_FetchFeedAsJSON() function to fetch a feed and display portions of its data in a gadget. The gadget lets users specify whether to show dates, whether to show summaries, and how many entries to display.
This is the code for the example:
<?xml version="1.0" encoding="UTF-8" ?>
<Module>
<ModulePrefs
title="_IG_FetchFeedAsJSON Example"
title_url="http://groups.google.com/group/Google-Gadgets-API" />
<UserPref name="show_date" display_name="Show Dates?" datatype="bool"/>
<UserPref name="show_summ" display_name="Show Summaries?" datatype="bool"/>
<UserPref name="num_entries" display_name="Number of Entries:" />
<Content type="html">
<![CDATA[
<style> #content_div { font-size: 80%; margin: 5px; background-color: #FFFFBF;} </style>
<div id=content_div></div>
<script type="text/javascript">
// Get userprefs
var prefs = new _IG_Prefs();
var showdate = prefs.getBool("show_date");
var summary = prefs.getBool("show_summ");
var entries = prefs.getInt("num_entries");
// If user wants to display more than 100 entries, display an error
// and set the value to 100, the max allowed.
if (entries > 100)
{
alert("You cannot display more than 100 entries.");
entries = 100;
}
// Use the _IG_FetchFeedAsJSON() function to retrieve core feed data from
// the specified URL. Then combine the data with HTML markup for display in
// the gadget.
_IG_FetchFeedAsJSON("http://groups.google.com/group/Google-Gadgets-API/feed/rss_v2_0_msgs.xml",
function(feed) {
if (feed == null){
alert("There is no data.");
return;
}
// Start building HTML string that will be displayed in gadget.
var html = "";
// Access the fields in the feed
html += "<div><b>" + feed.Title + "</b></div>";
html += "<div>" + feed.Description + "</div><br>";
// Access the data for a given entry
if (feed.Entry) {
for (var i = 0; i < feed.Entry.length; i++) {
html += "<div>"
+ "<a target='_blank' href='" + feed.Entry[i].Link + "'>"
+ feed.Entry[i].Title
+ "</a> ";
if (showdate==true)
{
// The feed entry Date field contains the timestamp in seconds
// since Jan. 1, 1970. To convert it to the milliseconds needed
// to initialize the JavaScript Date object with the correct date,
// multiply by 1000.
var milliseconds = (feed.Entry[i].Date) * 1000;
var date = new Date(milliseconds);
html += date.toLocaleDateString();
html += " ";
html += date.toLocaleTimeString();
}
if (summary==true) {
html += "<br><i>" + feed.Entry[i].Summary + "</i>";
}
html += "</div>";
}
}
_gel("content_div").innerHTML = html;
// The rest of the function parameters, which are optional: the number
// of entries to return, and whether to return summaries.
}, entries, summary);
</script>
]]>
</Content>
</Module>
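The date handling in the example can be isolated as a small helper. This sketch uses a hypothetical function name, entryDate, and relies only on what the code comments above state: the entry Date field is a timestamp in seconds since Jan. 1, 1970, while the JavaScript Date constructor expects milliseconds.

```javascript
// Hypothetical helper: convert a feed entry's Date field (seconds since
// Jan. 1, 1970) into a JavaScript Date object.
function entryDate(seconds) {
  return new Date(seconds * 1000);
}

// 86400 seconds is exactly one day after the epoch.
var sampleDate = entryDate(86400);
var iso = sampleDate.toISOString();
```

toISOString() gives a UTC string that does not depend on the viewer's locale, whereas the example above uses toLocaleDateString() and toLocaleTimeString() for display.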
If you are using _IG_FetchContent(), _IG_FetchXmlContent(), or
the _IG_Get... functions
to fetch content that is updated more than once an hour, such as
feed data, you might not get the latest updates. This is because
Google caches results to make your gadget run faster. If you want
to be sure that your gadget has the latest data, you can use the refreshInterval parameter
to bypass the cache and force a refresh to happen within the interval
you specify. In other words, the cache is refreshed
every X seconds, where X = refreshInterval.
This feature is not implemented for the _IG_FetchFeedAsJSON() function.
To make sure your gadget fetches fresh new content at least
once per interval, simply specify a value (measured in seconds) for
the refreshInterval parameter. For example:
// Fetch fresh content every half hour
_IG_FetchContent("http://news.google.com/?output=rss", callback, { refreshInterval: (60 * 30) });
// Fetch fresh content every 10 minutes
_IG_FetchContent("http://news.google.com/?output=rss", callback, { refreshInterval: (60 * 10) });
// Fetch fresh content every 30 seconds
_IG_FetchContent("http://news.google.com/?output=rss", callback, { refreshInterval: 30 });
// Disable caching completely and fetch fresh content every time -- !! Try to avoid using this !!
_IG_FetchContent("http://news.google.com/?output=rss", callback, { refreshInterval: 0 });
function callback(response) { ... }
Caching serves a useful purpose, and you should be careful not to refresh
the cache so often that you degrade performance. Caching
makes fetching data faster. It also reduces the load
on third-party servers hosting the remote content. You
should try to avoid disabling the cache completely (which you would
do by using refreshInterval: 0). If your gadget gets millions of page
views a day, turning off caching sends millions of requests to these
servers. This could not only adversely affect your gadget's performance,
but also overload the servers that provide your gadget with data.
Since content is refreshed by default every hour, it only
makes sense to specify an interval of less than an hour. The recommended
range for refreshInterval is more than 60 and
less than 3600.
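If the interval comes from a user preference or a computed value, you might normalize it before passing it on. The helper below is a sketch with a hypothetical name, normalizeRefreshInterval. It treats the recommended range as 60 to 3600 seconds inclusive, which is a simplifying assumption, and falls back to 3600 (matching the default hourly refresh) for values that are not numbers.

```javascript
var MIN_REFRESH_SECONDS = 60;   // lower end of the recommended range
var MAX_REFRESH_SECONDS = 3600; // one hour, the default refresh period

// Hypothetical helper: clamp a requested interval (in seconds) into the
// recommended range; non-numeric input falls back to one hour.
function normalizeRefreshInterval(seconds) {
  var value = parseInt(seconds, 10);
  if (isNaN(value)) return MAX_REFRESH_SECONDS;
  return Math.min(MAX_REFRESH_SECONDS, Math.max(MIN_REFRESH_SECONDS, value));
}

// Build the options object for a half-hour refresh.
var options = { refreshInterval: normalizeRefreshInterval(30 * 60) };
```

In a gadget, the resulting object would be passed as the third argument, as in the _IG_FetchContent() calls shown earlier.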