This document describes how to fetch and manipulate remote textual (typically HTML), XML, and RSS/Atom feed data.
One of the most exciting features available to gadgets is the ability to combine information from multiple sources in new ways, or to provide alternative ways to interact with existing information. The gadgets API allows your gadget to remotely fetch content from other web servers and web pages and operate on it.
The gadgets API provides the following functions for retrieving and operating on remote web content:
_IG_FetchContent(url, func): Returns and operates on the content at url as text. This is the function you should use for working with HTML content.
_IG_FetchXmlContent(url, func): Returns and operates on the XML content at url as a DOM object.
_IG_FetchFeedAsJSON(url, func, num_entries, get_summaries): Returns and operates on the feed content at url in JSON format.
The _IG_FetchContent() and _IG_FetchXmlContent() functions also optionally take a refreshInterval parameter for controlling how often content is refreshed. This topic is discussed in Refreshing the Cache.
Note: You cannot use the _IG_Fetch... functions
with type="url" gadgets.
The _IG_Fetch... functions share the
following characteristics:
For example, consider the following code snippet, which features
the _IG_FetchContent() function.
This code fetches the HTML text of the google.com web
page, and pops up a browser alert that contains the first 400 characters
of the google.com HTML that was returned:
_IG_FetchContent('http://www.google.com/', function (responseText) {
  // print the first 400 characters of Google's homepage HTML
  alert(responseText.substr(0, 400));
});
This example illustrates the basic principles behind how all of the _IG_Fetch... functions work:
When _IG_FetchContent() is called, the gadgets API makes an asynchronous HTTP GET request to the URL passed into the function (in this example, the URL is http://www.google.com/).
_IG_FetchContent() returns immediately and then calls the inner callback function later, when the fetch finishes. This means that you must put any dependent code inside the callback function, or inside functions called by the callback function.
_IG_FetchContent() passes the HTTP response text as a parameter to the callback function (or an empty string in case of error).
What is a callback function? For the purpose of this discussion, the simplest way to describe a callback is to say that it's a function that is passed as a parameter (in the form of a function reference) to another function. Callbacks give third-party developers a "hook" into a running framework to do some processing.
All of the _IG_Fetch... functions take callbacks as parameters.
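To make the ordering concrete, here is a minimal sketch that can run outside a gadget. It does not use the real API: fakeFetchContent() and deliverResponses() are hypothetical stand-ins that queue the callback and invoke it later. That timing is an assumption made for illustration, not a description of how _IG_FetchContent() is implemented.

```javascript
// Hypothetical stand-ins for illustration only; not part of the gadgets API.
var pendingResponses = [];

// Mimics the calling convention of _IG_FetchContent(url, func): it returns
// immediately and only remembers which callback to invoke later.
function fakeFetchContent(url, func) {
  pendingResponses.push(function () {
    func("<html>content of " + url + "</html>");
  });
}

// Simulates the moment the queued fetches finish.
function deliverResponses() {
  while (pendingResponses.length > 0) {
    pendingResponses.shift()();
  }
}

var log = [];
fakeFetchContent("http://www.google.com/", function (responseText) {
  // Dependent code belongs here, where responseText is available.
  log.push("callback got " + responseText.length + " characters");
});
log.push("code after the call ran first");
deliverResponses();
```

After this runs, the entry added after the call appears in log before the entry added by the callback, which is why code that depends on the response cannot simply follow the fetch call.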
Most of the examples in this section show the callback as a function literal (that is, as an anonymous function defined inline, in the argument list of the _IG_Fetch... call).
For example:
_IG_FetchContent('http://www.google.com/', function (responseText) {
  // do something
});
But making the callback a function literal is not a requirement.
If you prefer, you can implement the callback as a named function.
For example, in this snippet, the callback is broken out into a
separate function called my_callback_function():
function my_callback_function(responseText) {
  // On error the response text is empty, so there is nothing to display.
  if (!responseText) return;
  alert(responseText.substr(0, 400));
}
// Here the callback is passed by name
_IG_FetchContent('http://www.google.com/', my_callback_function);
There may be situations where you want to pass extra parameters to the callback function. To make this possible, the gadgets API provides an _IG_Callback(callback, ...) wrapper that lets you add parameters of any number or type to the callback. These extra parameters appear after any parameters the callback function would normally receive. For example, in the code sample below, responseText (the content data) is the first parameter to the callback function, and the extra parameter (limit) appears after it.
The _IG_Callback wrapper can be used in any place that is expecting a function reference for use as a callback function. For example:
// Here my_callback_function takes an additional 'limit' parameter, which
// specifies the upper range of the substring to be displayed in the alert
// panel.
function my_callback_function(responseText, limit) {
  // On error the response text is empty, so there is nothing to display.
  if (!responseText) return;
  alert(responseText.substr(0, limit));
}
// You use the _IG_Callback wrapper to provide additional parameters.
// In this example, '400' is passed as a parameter to indicate the upper
// limit for the substring extraction.
_IG_FetchContent('http://www.google.com/', _IG_Callback(my_callback_function, 400));
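The parameter ordering described above can be sketched with an ordinary closure. The withExtraArgs() helper below is hypothetical and is not the implementation of _IG_Callback; it only illustrates the idea that the parameters the callback normally receives come first and the extra ones are appended after them.

```javascript
// Hypothetical helper, for illustration only; not the real _IG_Callback.
function withExtraArgs(callback) {
  // Everything after the first argument is an "extra" parameter.
  var extraArgs = Array.prototype.slice.call(arguments, 1);
  return function () {
    // Parameters supplied by the caller (for example, responseText).
    var normalArgs = Array.prototype.slice.call(arguments);
    return callback.apply(this, normalArgs.concat(extraArgs));
  };
}

function firstChars(responseText, limit) {
  return responseText.substr(0, limit);
}

// The wrapped function can be handed to anything that calls it with one argument.
var wrapped = withExtraArgs(firstChars, 5);
var preview = wrapped("Hello, world");
```

Here preview holds the first five characters of the response, because limit arrives as the second parameter.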
The most general function for working with remote web content is _IG_FetchContent(url,
func). It returns the content of the remote website
as text.
You could use this function to perform dynamic retrieval of HTML and write sophisticated interfaces for viewing it. Or, you could write a gadget that fetches results from a search engine for a given query, and then lets you know if there is a change in the order of the top results.
The previous section showed several variations
on a simple example that used _IG_FetchContent().
Here is another example that fetches data from a CSV (comma-separated
value) file and uses it to populate a list of personal contacts:
// This example fetches data from a CSV file containing contact information. In the CSV file,
// each record consists of a name, email address, and phone number.
_IG_FetchContent('http://doc.examples.googlepages.com/Contacts.csv', function (responseText) {
  // Set CSS for div.
  var html = "<div style='padding: 5px;background-color: #FFFFBF;font-family:Arial, Helvetica;"
      + "text-align:left;font-size:90%'>";
  // Use the split function to extract substrings separated by comma
  // delimiters.
  var contacts = responseText.split(",");
  // Process array of extracted substrings.
  for (var i = 0; i < contacts.length; i++) {
    // Append substrings to html.
    html += contacts[i];
    html += " ";
    // Each record consists of 3 components: name, email, and
    // phone number. The gadget displays each record on a single
    // line:
    //
    // Mickey Mouse mickey@disneyland.com 1-800-MYMOUSE
    //
    // Therefore, insert a line break after each (name,email,phone)
    // triplet (i.e., whenever (i+1) is a multiple of 3).
    if ((i + 1) % 3 == 0) {
      html += "<br>";
    }
  }
  html += "</div>";
  // Output html in div.
  _gel('content_div').innerHTML = html;
});
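Splitting the whole response on commas works only if the file's layout cooperates. The exact layout of Contacts.csv is not shown here, so the following is a sketch under stated assumptions: one record per line, three fields per record, and no quoted fields containing commas. The parseContacts() helper and the sample data are hypothetical.

```javascript
// Remove leading and trailing whitespace (works in older browsers too).
function trim(text) {
  return text.replace(/^\s+|\s+$/g, "");
}

// Hypothetical helper: turns CSV text into an array of contact records.
// Assumes one "name,email,phone" record per line and no quoted commas.
function parseContacts(csvText) {
  var records = [];
  var lines = csvText.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var line = trim(lines[i]);
    if (line === "") continue;           // skip blank lines
    var fields = line.split(",");
    if (fields.length < 3) continue;     // skip incomplete records
    records.push({
      name: trim(fields[0]),
      email: trim(fields[1]),
      phone: trim(fields[2])
    });
  }
  return records;
}

// Made-up sample data for illustration.
var sample = "Mickey Mouse,mickey@disneyland.com,1-800-MYMOUSE\n" +
    "Donald Duck, donald@example.com ,555-0100\n\n";
var contacts = parseContacts(sample);
```

Inside a gadget, a function like this would be called from the _IG_FetchContent() callback with responseText as its argument.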
There is no return value from _IG_FetchContent() because, as described above, it returns immediately, and its associated callback function is called whenever the response returns. This also means that you can have multiple requests running at once.
For a more complex example of how to use this function, see onebox.xml. In that gadget, if you enter "_test", it runs a self-test that fires off multiple queries to google.com. The query results render in whatever order the results come back.
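When several requests are in flight and may finish in any order, one common pattern is to count outstanding responses and act once the last one arrives. The sketch below is an illustration, not taken from onebox.xml: fetchAll() and stubFetch() are hypothetical names, and the fetch function is passed in as a parameter so the example can run without the gadgets API (in a gadget you might pass _IG_FetchContent).

```javascript
// Hypothetical helper: starts one fetch per URL and calls done(results)
// after every callback has fired, regardless of arrival order.
function fetchAll(urls, fetchFn, done) {
  var results = {};
  var remaining = urls.length;
  if (remaining === 0) {
    done(results);
    return;
  }
  function start(url) {
    fetchFn(url, function (responseText) {
      results[url] = responseText;
      remaining--;
      if (remaining === 0) {
        done(results);
      }
    });
  }
  for (var i = 0; i < urls.length; i++) {
    start(urls[i]);
  }
}

// Stub that queues callbacks so the demo can deliver them out of order.
var queued = [];
function stubFetch(url, func) {
  queued.push(function () { func("body of " + url); });
}

var finished = null;
fetchAll(["http://a.example/", "http://b.example/"], stubFetch, function (results) {
  finished = results;
});
// Deliver the responses in reverse order of the requests.
while (queued.length > 0) {
  queued.pop()();
}
```

Keying results by URL means the order in which responses arrive does not matter to the code that consumes them.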
The Document
Object Model (DOM) is an API for navigating HTML and XML
documents. You can use the gadgets JavaScript function _IG_FetchXmlContent(url,
func) to retrieve an XML document as a DOM object.
Once you have the object, you can operate on it using standard
DOM JavaScript functions. Typically this means extracting the
desired data from the XML file, combining it with HTML and
CSS markup, and rendering the resulting HTML in your gadget.
Note: You can only use _IG_FetchXmlContent() to
retrieve XML files, not HTML. To work with HTML, use the _IG_FetchContent() function.
With DOM, web content is parsed into a tree of nodes. For example, consider the following snippet of HTML:
<a href="http://www.google.com/">Google's <b>fast</b> home page.</a>
This snippet illustrates the main types of nodes discussed in this section:
Element nodes. The element nodes in this snippet are "a" and "b". Element nodes are the building blocks that define the structure of a document.
Text nodes. The text nodes in this snippet are "Google's", "fast", and "home page." Text nodes are always contained within element nodes. They are child nodes of the containing element node.
Attribute nodes. The attribute node in this snippet is href="http://www.google.com/". An attribute node provides additional information about its containing element node. However, attributes are not considered child nodes of the element that contains them, which has implications for how you work with them. For more discussion of this topic, see Working With Different Node Types.
This is the DOM structure for the HTML snippet:
a (element, with attribute href="http://www.google.com/")
    "Google's " (text)
    b (element)
        "fast" (text)
    " home page." (text)
To access the data in a DOM object, you “walk the tree,” using DOM functions to navigate parent-child node relationships to get to the data you need.
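As a rough illustration of walking the tree, the sketch below visits child nodes recursively and gathers the text it finds. So that it can run without a browser, it uses plain objects that imitate only the nodeType, nodeName, nodeValue, and childNodes properties; the textNode(), elementNode(), and collectText() helpers are hypothetical.

```javascript
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Recursively gathers the text contained in a node and its descendants.
function collectText(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  var text = "";
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    text += collectText(children[i]);
  }
  return text;
}

// Plain-object stand-ins for DOM nodes, for illustration only.
function textNode(value) {
  return { nodeType: TEXT_NODE, nodeName: "#text", nodeValue: value, childNodes: [] };
}
function elementNode(name, children) {
  return { nodeType: ELEMENT_NODE, nodeName: name, nodeValue: null, childNodes: children };
}

// Mirrors the structure of the HTML snippet above.
var link = elementNode("a", [
  textNode("Google's "),
  elementNode("b", [textNode("fast")]),
  textNode(" home page.")
]);
var text = collectText(link);
```

Note that the href attribute does not appear among the children at all, which matches the point above that attributes are not child nodes.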
The following XML file contains data for a series of breakfast items.
The top-most parent node is menu, and
it has multiple food child nodes. The menu node
also contains an attribute node: title="Breakfast
Menu". Each food node has name, price, description,
and calories child nodes.
The name, price,
and calories nodes all contain their
own "text" child nodes. Each description node
contains a CDATA child node. CDATA is
a distinct node type. CDATA sections
are used to escape blocks of text containing characters that would
otherwise be regarded as markup, such as angle brackets. The only
delimiter that is recognized in a CDATA section
is the “]]>” string that ends the CDATA section.
<?xml version="1.0" encoding="UTF-8" ?>
<menu title="Breakfast Menu">
<food>
<name>Early Bird Breakfast</name>
<price>$3.95</price>
<description><![CDATA[<div style="color:purple; padding-left:25px;">Two eggs any style with your choice of bacon
or sausage, toast or English muffin.</div>]]></description>
<calories>450</calories>
</food>
<food>
<name>Chocolate Chip Belgian Waffles</name>
<price>$7.95</price>
<description><![CDATA[<div style="color:purple; padding-left:25px;">Chocolate chip Belgian waffles covered with
chocolate syrup and whipped cream.</div>]]></description>
<calories>900</calories>
</food>
…
</menu>
The following sample gadget uses this XML file as a data source. It displays a breakfast menu and lets users set a calorie limit. It displays in red any calories that are above the specified limit. Users can also choose whether or not to display descriptions for each breakfast item.
This gadget uses the _IG_FetchXmlContent() function to retrieve the breakfast XML file as a DOM tree. Like the _IG_FetchContent() function, _IG_FetchXmlContent() is asynchronous, meaning that it returns immediately and then calls the inner function later, when the fetch finishes. This means that you must put any dependent code inside the callback function, or inside functions called by the callback function. In the following example, all processing happens inside the callback function.
The following code illustrates how to walk the DOM tree to extract data from different node types, and how to combine the data with HTML and CSS markup for display in the breakfast menu gadget.
<?xml version="1.0" encoding="UTF-8" ?>
<Module>
<ModulePrefs
title="_IG_FetchXmlContent Example"
scrolling="true"/>
<UserPref
name="mycalories"
display_name="Calorie limit"
default_value="800"/>
<UserPref
name="mychoice"
display_name="Show Descriptions"
datatype="bool"
default_value="false"/>
<Content type="html">
<![CDATA[
<div id="content_div"></div>
<script type="text/javascript">
function displayMenu() {
// XML breakfast menu data
var url = "http://doc.examples.googlepages.com/breakfast-data.xml";
var prefs = new _IG_Prefs();
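// Calorie limit set by user. Read it as a number so that the comparison
// with each item's calories further down is numeric, not string-based.
var calorieLimit = prefs.getInt("mycalories");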
// Indicates whether to show descriptions in the breakfast menu
var description = prefs.getBool("mychoice");
_IG_FetchXmlContent(url, function (response) {
if (response == null || typeof(response) != "object" ||
response.firstChild == null) {
_gel("content_div").innerHTML = "<i>Invalid data.</i>";
return;
}
// Start building HTML string that will be displayed in <div>.
// Set the style for the <div>.
var html = "<div style='padding: 5px;background-color: #ccf;font-family:Arial, Helvetica;" +
"text-align:left;font-size:90%'>";
// Set style for title.
html +="<div style='text-align:center; font-size: 120%; color: yellow; " +
"font-weight: 700;'>";
// Display menu title. Use getElementsByTagName() to retrieve the <menu> element.
// Since there is only one menu element in the file,
// you can get to it by accessing the item at index "0".
// You can then use getAttribute to get the text associated with the
// menu "title" attribute.
var title = response.getElementsByTagName("menu").item(0).getAttribute("title");
// Alternatively, you could retrieve the title by getting the menu element node
// and reading its "attributes" property. This returns a list
// of the element node's attributes. In this case, there is only one
// attribute (title), so you could display the value for the attribute at
// index 0. For example:
//
// var title = response.getElementsByTagName("menu").item(0).attributes.item(0).nodeValue;
// Append the title to the HTML string.
html += title + "</div><br>";
// Get a list of the <food> element nodes in the file
var itemList = response.getElementsByTagName("food");
// Loop through all <food> nodes
for (var i = 0; i < itemList.length ; i++) {
// For each <food> node, get child nodes.
var nodeList = itemList.item(i).childNodes;
// Loop through child nodes. Extract data from the text nodes that are
// the children of the associated name, price, and calories element nodes.
for (var j = 0; j < nodeList.length ; j++) {
var node = nodeList.item(j);
if (node.nodeName == "name") {
var name = node.firstChild.nodeValue;
}
if (node.nodeName == "price") {
var price = node.firstChild.nodeValue;
}
if (node.nodeName == "calories") {
var calories = node.firstChild.nodeValue;
}
// If the user chose to display descriptions and
// the child node is "#cdata-section", grab the
// contents of the description CDATA for display.
if (node.nodeName == "description" && description==true)
{
if (node.firstChild.nodeName == "#cdata-section")
var data = node.firstChild.nodeValue;
}
}
// Append extracted data to the HTML string.
html += "<i><b>";
html += name;
html += "</b></i><br>";
html += " ";
html += price;
html += " - ";
// If "calories" is greater than the user-specified calorie limit,
// display it in red.
if(calories > calorieLimit) {
html += "<font color=#ff0000>";
html += calories + " calories";
html += " </font>";
}
else
html += calories + " calories";
html += "<br>";
// If user has chosen to display descriptions
if (description==true) {
html += "<i>" + data + "</i><br>";
}
}
// Close up div
html += "</div>";
// Display HTML string in <div>
_gel('content_div').innerHTML = html;
});
}
_IG_RegisterOnloadHandler(displayMenu);
</script>
]]>
</Content>
</Module>
This code sample illustrates four of the primary functions you use to interact with DOM data:
getElementsByTagName(tagname) -- (_gelstn() is a wrapper around getElementsByTagName().) For a DOM document, returns a list of the element nodes whose names match tagname. You can retrieve all of the element nodes in a file by using the wildcard character (*), for example: response.getElementsByTagName("*").
getElementById(id) -- (_gel() is a wrapper around getElementById().) For a DOM document, retrieves a single node by id.
getAttribute(attrib) -- For an element node, returns the value of the attribute attrib. For example: response.getElementsByTagName("menu").item(0).getAttribute("title").
attributes -- For an element node, returns a list of the node's attributes.
This example only shows a few of the different ways to navigate a DOM tree. Some of the others you might try include lastChild, nextSibling, previousSibling, and parentNode.
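One detail worth keeping in mind when using extracted values: nodeValue is a string, so comparing two such values with > compares them character by character rather than numerically. The sketch below shows the difference with a calorie check like the one in the breakfast menu gadget; the isOverLimit() helper is hypothetical.

```javascript
// Hypothetical helper: compares a calorie value against a limit numerically,
// even when both arrive as strings (for example, from nodeValue).
function isOverLimit(caloriesText, limitText) {
  var calories = parseInt(caloriesText, 10);
  var limit = parseInt(limitText, 10);
  if (isNaN(calories) || isNaN(limit)) {
    return false; // treat unparseable values as "not over the limit"
  }
  return calories > limit;
}

// Comparing the raw strings looks at characters: "1" sorts before "8".
var stringComparison = "1000" > "800";
// Comparing parsed numbers gives the intended answer.
var numericComparison = isOverLimit("1000", "800");
```

With these inputs the string comparison reports that 1000 is not above 800, while the numeric comparison reports that it is.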
The key to working effectively with DOM is appreciating the sometimes subtle differences between different node types.
| Node Type | Description | Return Values | Gotchas |
|---|---|---|---|
| element | The structural building blocks of a document, such as <p>, <b>, or <calories>. | nodeName: Whatever text is contained inside the angle brackets. For example, the nodeName of <menu> is “menu”. nodeType: 1. nodeValue: null. | An element has a nodeValue of null. To get to the value of a text or attribute node associated with an element, you must go to those nodes. For example: element.firstChild.nodeValue for text, and element.getAttribute(attrib) for attributes. |
| text | Text. A text node is always contained within an element. It is a child of the element. | nodeName: #text. nodeType: 3. nodeValue: Whatever text is contained in the node. | Some browsers render all whitespace in a document as text nodes, so that you get “empty” text nodes in your DOM object. This can cause unexpected results when you’re walking the tree. The solution may be as simple as filtering out text nodes that contain only the newline character, or you may want to do more robust handling. For more discussion of this topic, see Whitespace in the DOM. |
| attribute | A key-value pair that provides additional information about an element node (for example, title="my document"). An attribute is contained by an element node, but it is not a child of the element node. | nodeName: The lefthand value in the attribute pair. If the attribute is title="my document", the nodeName is title. nodeType: 2. nodeValue: The righthand value in the attribute pair (in this example, "my document"). | Even though attributes are nodes and are contained within element nodes, they are not child nodes of the element. They inherit from the Node interface, but the DOM doesn't consider them part of the DOM tree. This means that while you can use many of the node functions on attribute nodes (such as nodeName, nodeValue, and nodeType), you cannot access attribute nodes using the DOM tree-walking functions. To access attributes, you use the functions attributes and getAttribute(attrib). |
| CDATA | A section in which content is left alone, not interpreted. CDATA sections are used to escape blocks of text containing characters that would otherwise be regarded as markup. The only delimiter that is recognized in a CDATA section is the "]]>" string that ends the CDATA section. | nodeName: #cdata-section. nodeType: 4. nodeValue: Text and markup inside the CDATA delimiters. | The text in the CDATA section has its own markup. This could have implications for how you incorporate it into your gadget. |
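The whitespace gotcha for text nodes can be handled with a small filter. The following is a minimal sketch, not a complete solution: isWhitespaceText() and significantChildren() are hypothetical helpers, and the demo uses plain objects that imitate only the node properties involved so that it can run without a browser.

```javascript
var TEXT_NODE = 3;

// True for text nodes that contain nothing but whitespace.
function isWhitespaceText(node) {
  return node.nodeType === TEXT_NODE && /^\s*$/.test(node.nodeValue);
}

// Returns the child nodes of parent, skipping whitespace-only text nodes.
function significantChildren(parent) {
  var kept = [];
  for (var i = 0; i < parent.childNodes.length; i++) {
    var child = parent.childNodes[i];
    if (!isWhitespaceText(child)) {
      kept.push(child);
    }
  }
  return kept;
}

// Plain-object imitation of a <food> node whose children are separated
// by line breaks and indentation, as in a pretty-printed XML file.
var food = {
  nodeType: 1, nodeName: "food", nodeValue: null,
  childNodes: [
    { nodeType: 3, nodeName: "#text", nodeValue: "\n  ", childNodes: [] },
    { nodeType: 1, nodeName: "name", nodeValue: null, childNodes: [] },
    { nodeType: 3, nodeName: "#text", nodeValue: "\n  ", childNodes: [] },
    { nodeType: 1, nodeName: "price", nodeValue: null, childNodes: [] },
    { nodeType: 3, nodeName: "#text", nodeValue: "\n", childNodes: [] }
  ]
};
var children = significantChildren(food);
```

Filtering this way keeps loops that index into child nodes from landing on an unexpected "#text" entry.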
You can add a feed to your iGoogle page by typing its URL into the Add by URL form in the content directory. This uses the gadgets API built-in feed support to create a gadget for the feed and add it to iGoogle. It's easy to use, but it doesn't let you perform any customization to the content or display. Also, you can't use it with other Google properties.
For more sophisticated feed handling, the gadgets API provides the _IG_FetchFeedAsJSON(url, func, num_entries, get_summaries) function. _IG_FetchFeedAsJSON() fetches an RSS or Atom feed and returns the core feed data as a JSON object. JSON (JavaScript Object Notation) is a simple way of describing data as JavaScript.
_IG_FetchFeedAsJSON() takes the following
parameters:
| Name | Data Type | Description |
|---|---|---|
| url | string, required | The RSS or Atom feed URL to retrieve. |
| callback | function, required | The callback function to execute when the data is retrieved. |
| num_entries | integer, optional | The number of feed entries to retrieve from the feed. The accepted range is 1 through 100. The default is 3. |
| get_summaries | boolean, optional | Whether to retrieve the full text summaries for the entries in the feed. This defaults to false. You should only set this to true if you plan to use the data. The full summaries can be quite large and shouldn't be transferred needlessly. |
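Because num_entries has an accepted range and a default, a gadget that takes the value from a user preference may want to normalize it first. The clampEntries() helper below is a hypothetical sketch of one way to do that; falling back to 3 for unparseable input mirrors the documented default and is a choice made here, not API behavior.

```javascript
// Hypothetical helper: forces a requested entry count into the accepted
// range of 1 through 100, falling back to 3 when the value is not a number.
function clampEntries(requested) {
  var n = parseInt(requested, 10);
  if (isNaN(n)) {
    return 3;
  }
  if (n < 1) {
    return 1;
  }
  if (n > 100) {
    return 100;
  }
  return n;
}

var tooMany = clampEntries(250);
var tooFew = clampEntries(0);
var notANumber = clampEntries("abc");
```

The result can then be passed as the num_entries argument.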
Here are the fields in the JSON feed object:
| Field | Description |
|---|---|
| ErrorMsg | If defined, describes any error that occurred. |
| URL | The URL of the RSS / Atom feed. |
| Title | The title of the feed. |
| Description | The tagline or description of the feed. |
| Link | Typically, the URL of the feed homepage. |
| Author | The author of the feed. |
| Entry | Array of feed entries. Each entry carries its own nested fields; the example in this section reads Title, Link, Date, and Summary from each entry. |
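To give a feel for the shape of this object, here is a small sketch that reads it defensively. The sample data is made up, and listEntryTitles() is a hypothetical helper; only field names that appear in this document are used.

```javascript
// Made-up sample data, shaped after the fields listed above.
var sampleFeed = {
  URL: "http://example.com/feed.xml",
  Title: "Example Feed",
  Description: "A made-up feed for illustration",
  Link: "http://example.com/",
  Author: "Example Author",
  Entry: [
    { Title: "First post", Link: "http://example.com/1" },
    { Title: "Second post", Link: "http://example.com/2" }
  ]
};

// Hypothetical helper: returns the entry titles, or a one-item list
// describing the problem if the feed is missing or reports an error.
function listEntryTitles(feed) {
  if (!feed) {
    return ["No data."];
  }
  if (feed.ErrorMsg) {
    return ["Error: " + feed.ErrorMsg];
  }
  var titles = [];
  var entries = feed.Entry || [];
  for (var i = 0; i < entries.length; i++) {
    titles.push(entries[i].Title);
  }
  return titles;
}

var titles = listEntryTitles(sampleFeed);
var failure = listEntryTitles({ ErrorMsg: "feed not found" });
```

Checking ErrorMsg before touching Entry avoids building output from a feed that failed to load.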
The following example illustrates how to use the _IG_FetchFeedAsJSON() function to fetch a feed and display portions of its data in a gadget. It lets users specify whether to show dates, whether to show summaries, and the number of entries to display.
This is the code for the example:
<?xml version="1.0" encoding="UTF-8" ?>
<Module>
<ModulePrefs
title="_IG_FetchFeedAsJSON Example"
title_url="http://groups.google.com/group/Google-Gadgets-API" />
<UserPref name="show_date" display_name="Show Dates?" datatype="bool"/>
<UserPref name="show_summ" display_name="Show Summaries?" datatype="bool"/>
<UserPref name="num_entries" display_name="Number of Entries:" />
<Content type="html">
<![CDATA[
<style> #content_div { font-size: 80%; margin: 5px; background-color: #FFFFBF;} </style>
<div id="content_div"></div>
<script type="text/javascript">
// Get userprefs
var prefs = new _IG_Prefs();
var showdate = prefs.getBool("show_date");
var summary = prefs.getBool("show_summ");
var entries = prefs.getInt("num_entries");
// If user wants to display more than 100 entries, display an error
// and set the value to 100, the max allowed.
if (entries > 100)
{
alert("You cannot display more than 100 entries.");
entries = 100;
}
// Use the _IG_FetchFeedAsJSON() function to retrieve core feed data from
// the specified URL. Then combine the data with HTML markup for display in
// the gadget.
_IG_FetchFeedAsJSON("http://groups.google.com/group/Google-Gadgets-API/feed/rss_v2_0_msgs.xml",
function(feed) {
if (feed == null){
alert("There is no data.");
return;
}
// Start building HTML string that will be displayed in gadget.
var html = "";
// Access the fields in the feed
html += "<div><b>" + feed.Title + "</b></div>";
html += "<div>" + feed.Description + "</div><br>";
// Access the data for a given entry
if (feed.Entry) {
for (var i = 0; i < feed.Entry.length; i++) {
html += "<div>"
+ "<a target='_blank' href='" + feed.Entry[i].Link + "'>"
+ feed.Entry[i].Title
+ "</a> ";
if (showdate==true)
{
// The feed entry Date field contains the timestamp in seconds
// since Jan. 1, 1970. To convert it to the milliseconds needed
// to initialize the JavaScript Date object with the correct date,
// multiply by 1000.
var milliseconds = (feed.Entry[i].Date) * 1000;
var date = new Date(milliseconds);
html += date.toLocaleDateString();
html += " ";
html += date.toLocaleTimeString();
}
if (summary==true) {
html += "<br><i>" + feed.Entry[i].Summary + "</i>";
}
html += "</div>";
}
}
_gel("content_div").innerHTML = html;
// The rest of the function parameters, which are optional: the number
// of entries to return, and whether to return summaries.
}, entries, summary);
</script>
]]>
</Content>
</Module>
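The date conversion used in the gadget can be isolated into a small function. This is a minimal sketch: entryDate() is a hypothetical helper, and the sample entry is made up. The only assumption carried over from the gadget code is that the Date field holds seconds since Jan. 1, 1970.

```javascript
// Hypothetical helper: converts an entry's Date field (seconds since
// Jan. 1, 1970) into a JavaScript Date, which expects milliseconds.
function entryDate(entry) {
  return new Date(entry.Date * 1000);
}

// 86400 seconds is exactly one day after the start of the epoch.
var date = entryDate({ Date: 86400 });
var iso = date.toISOString();
```

Forgetting the multiplication would produce a date in early January 1970 for any recent timestamp, which is an easy mistake to spot in the rendered gadget.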
If you are using _IG_FetchContent(), _IG_FetchXmlContent(), or
the _IG_Get... functions
to fetch content that is updated more than once an hour, such as
feed data, you might not get the latest updates. This is because
the server caches results to make your gadget run faster. If you want
to be sure that your gadget has the latest data, you can use the refreshInterval parameter
to bypass the cache and force a refresh to happen within the interval
you specify. In other words, the cache is refreshed
every X seconds, where X = refreshInterval.
This feature is not implemented for the _IG_FetchFeedAsJSON() function.
To make sure your gadget fetches fresh new content at least
once per interval, simply specify a value (measured in seconds) for
the refreshInterval parameter. For example:
// Fetch fresh content every half hour
_IG_FetchContent("http://news.google.com/?output=rss", callback, { refreshInterval: (60 * 30) });
// Fetch fresh content every 10 minutes
_IG_FetchContent("http://news.google.com/?output=rss", callback, { refreshInterval: (60 * 10) });
// Fetch fresh content every 30 seconds
_IG_FetchContent("http://news.google.com/?output=rss", callback, { refreshInterval: 30 });
// Disable caching completely and fetch fresh content every time -- !! Try to avoid using this !!
_IG_FetchContent("http://news.google.com/?output=rss", callback, { refreshInterval: 0 });
function callback(response) { ... }
Caching serves a useful purpose, and you should be careful not to refresh
the cache so often that you degrade performance. Caching
makes fetching data faster. It also reduces the load
on third-party servers hosting the remote content. You
should try to avoid disabling the cache completely (which you would
do by using
refreshInterval: 0). If your gadget is
getting millions of page views a day, sending out millions of requests
to these servers, turning off caching could not only
adversely affect your gadget's
performance, but it could overload the servers that provide your gadget
with data.
Since content is refreshed every hour by default, it only makes sense to specify an interval of less than an hour. The recommended range for refreshInterval is more than 60 and less than 3600 seconds.
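One way to stay inside the recommended range is to build the options object through a small helper. The refreshOptions() function below is a hypothetical sketch; treating the bounds as exclusive (61 through 3599) is an interpretation of "more than 60 and less than 3600".

```javascript
var MIN_REFRESH_SECONDS = 61;   // "more than 60"
var MAX_REFRESH_SECONDS = 3599; // "less than 3600"

// Hypothetical helper: returns an options object whose refreshInterval
// falls inside the recommended range. Unusable input yields an empty
// object, which leaves the default caching behavior in place.
function refreshOptions(seconds) {
  var n = Math.round(Number(seconds));
  if (isNaN(n)) {
    return {};
  }
  if (n < MIN_REFRESH_SECONDS) {
    n = MIN_REFRESH_SECONDS;
  }
  if (n > MAX_REFRESH_SECONDS) {
    n = MAX_REFRESH_SECONDS;
  }
  return { refreshInterval: n };
}

var halfHour = refreshOptions(60 * 30);
var tooShort = refreshOptions(30);
var tooLong = refreshOptions(7200);
```

In a gadget, the returned object would be passed as the third argument, in the same position as the { refreshInterval: ... } literals shown earlier.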