Legacy Gadgets API (Deprecated)

Working with Remote Content

This document describes how to fetch and manipulate remote textual (typically HTML), XML, and RSS/Atom feed data.

Contents

  1. Introduction
    1. Callbacks
  2. Working with Text
  3. Working with XML
    1. Example
    2. Working With Different Node Types
  4. Working with Feeds
  5. Refreshing the Cache

Introduction

One of the most exciting features available to gadgets is the ability to combine information from multiple sources in new ways, or to provide alternative ways to interact with existing information. The gadgets API allows your gadget to remotely fetch content from other web servers and web pages and operate on it.

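The gadgets API provides the following functions for retrieving and operating on remote web content:

  • _IG_FetchContent(url, func) returns the content of a remote website as text.
  • _IG_FetchXmlContent(url, func) retrieves an XML document as a DOM object.
  • _IG_FetchFeedAsJSON(url, func, num_entries, get_summaries) fetches an RSS or Atom feed and returns the core feed data as a JSON object.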

The _IG_FetchContent() and _IG_FetchXmlContent() functions also optionally take a refreshInterval parameter for controlling how often content is refreshed. This topic is discussed in Refreshing the Cache.

Note: You cannot use the _IG_Fetch... functions with type="url" gadgets.

The _IG_Fetch... functions share the following characteristics:

  • Their first parameter is a URL that is used to fetch the remote content.
  • Their second parameter is a callback function that you use to process the returned data.
  • They are asynchronous, meaning that all processing must happen within the callback function.
  • They have no return values because they return immediately, and their associated callback functions get called whenever the response returns.

For example, consider the following code snippet, which features the _IG_FetchContent() function. This code fetches the HTML text of the google.com web page, and pops up a browser alert that contains the first 400 characters of the google.com HTML that was returned:

_IG_FetchContent('http://www.google.com/', function (responseText) {
   // print the first 400 characters of Google's homepage HTML
   alert(responseText.substr(0,400));
});

This example illustrates the basic principles behind how all of the _IG_Fetch... functions work:

  1. When _IG_FetchContent() is called, the gadgets API makes an asynchronous HTTP GET request to the URL passed into the function (in this example, the URL is http://www.google.com).
  2. _IG_FetchContent() returns immediately and then calls the inner callback function later, when the fetch finishes. This means that you must put any dependent code inside the callback function, or inside functions called by the callback function.
  3. _IG_FetchContent() returns the HTTP response text as a parameter to the callback function (or an empty string in case of error).
  4. The callback function performs some operations on the returned data. Typically it extracts portions of the data, combines them with HTML markup, and renders the resulting HTML in the gadget.
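
The ordering described in these steps can be seen in a small sketch. Because _IG_FetchContent() only exists inside a gadget container, the sketch uses a hypothetical stand-in, stubFetchContent(), together with a hypothetical deliverResponses() helper that plays the role of the network response arriving. Neither is part of the gadgets API.

```javascript
// Hypothetical stand-in for _IG_FetchContent(); not part of the gadgets API.
// It queues the callback instead of performing a real HTTP request.
var pendingResponses = [];
function stubFetchContent(url, callback) {
  pendingResponses.push(function () {
    callback('<html>stub response for ' + url + '</html>');
  });
}

// Hypothetical helper that simulates the queued responses arriving later.
function deliverResponses() {
  while (pendingResponses.length > 0) {
    pendingResponses.shift()();
  }
}

var events = [];
stubFetchContent('http://www.google.com/', function (responseText) {
  // Dependent code belongs here, because this is the only place where
  // responseText is available.
  events.push('callback received ' + responseText.length + ' characters');
});
// The fetch call has already returned, but the callback has not run yet.
events.push('code after the fetch call');

deliverResponses();
// events[0] is 'code after the fetch call'; the callback entry comes second.
```

The point of the sketch is only the ordering: code placed after the fetch call runs before the callback does, which is why dependent code has to live inside the callback.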

Callbacks

What is a callback function? For the purpose of this discussion, the simplest way to describe a callback is to say that it's a function that is passed as a parameter (in the form of a function reference) to another function. Callbacks give third-party developers a "hook" into a running framework to do some processing. All of the _IG_Fetch... functions take callbacks as parameters.

Most of the examples in this section show the callback as a function literal (that is, as an anonymous function that is implemented within the body of the outer _IG_Fetch... function). For example:

_IG_FetchContent('http://www.google.com/', function (responseText) {
   // do something
});

But making the callback a function literal is not a requirement. If you prefer, you can implement the callback as a named function. For example, in this snippet, the callback is broken out into a separate function called my_callback_function():

function my_callback_function(responseText) {
    if (responseText == null) return;
    alert(responseText.substr(0,400));
}
// Here the callback is passed by name
_IG_FetchContent('http://www.google.com/', my_callback_function);

There may be situations where you want to pass extra parameters to the callback function. To make this possible, the gadgets API provides an _IG_Callback(callback, ...) wrapper that lets you add parameters of any number or type to the callback. These extra parameters appear after any parameters the callback function would normally receive. For example, in the code sample below, responseText (the content data) is the first parameter to the callback function. With _IG_Callback, any extra parameters (in this example, limit) appear after the first parameter (in this example, responseText). The _IG_Callback wrapper can be used in any place that expects a function reference for use as a callback function. For example:

// Here my_callback_function takes an additional 'limit' parameter, which
// specifies the upper range of the substring to be displayed in the alert
// panel.
function my_callback_function(responseText, limit) {
    if (responseText == null) return;
    alert(responseText.substr(0, limit));
}
 
// You use the _IG_Callback wrapper to provide additional parameters.             
// In this example, '400' is passed as a parameter to indicate the upper      
// limit for the substring extraction.
_IG_FetchContent('http://www.google.com/', _IG_Callback(my_callback_function, 400));
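
To make the parameter ordering concrete, here is a minimal sketch of how a wrapper of this kind could be written with a closure. The name makeCallback is hypothetical, and this is not the actual _IG_Callback implementation; it only illustrates the behavior described above, where extra parameters follow the ones the callback normally receives.

```javascript
// Hypothetical wrapper illustrating the idea behind _IG_Callback.
// It returns a new function that appends the extra arguments to whatever
// arguments the caller supplies.
function makeCallback(callback) {
  var extraArgs = Array.prototype.slice.call(arguments, 1);
  return function () {
    var normalArgs = Array.prototype.slice.call(arguments);
    return callback.apply(this, normalArgs.concat(extraArgs));
  };
}

// A callback that expects the response text first and a limit second.
function firstChars(responseText, limit) {
  return responseText.substr(0, limit);
}

var wrapped = makeCallback(firstChars, 5);
// The caller supplies only the response text; the limit was bound earlier.
var result = wrapped('Hello, gadget');  // 'Hello'
```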

Working with Text

The most general function for working with remote web content is _IG_FetchContent(url, func). It returns the content of the remote website as text.

You could use this function to perform dynamic retrieval of HTML and write sophisticated interfaces for viewing it. Or, you could write a gadget that fetches results from a search engine for a given query, and then lets you know if there is a change in the order of the top results.

The previous section showed several variations on a simple example that used _IG_FetchContent(). Here is another example that fetches data from a CSV (comma-separated value) file and uses it to populate a list of personal contacts:

// This example fetches data from a CSV file containing contact information. In the CSV file,
// each record consists of a name, email address, and phone number.
_IG_FetchContent('http://doc.examples.googlepages.com/Contacts.csv', function (responseText) {

     // Set CSS for div.
     var html = "<div style='padding: 5px;background-color: #FFFFBF;font-family:Arial, Helvetica;"
         + "text-align:left;font-size:90%'>";

     // Use the split function to extract substrings separated by comma
     // delimiters.
     var contacts = responseText.split(",");

     // Process array of extracted substrings.
     for (var i = 0; i < contacts.length ; i++) {
         // Append substrings to html.
         html += contacts[i];
         html += " ";

         // Each record consists of 3 components: name, email, and
         // phone number. The gadget displays each record on a single
         // line:
         //
         // Mickey Mouse mickey@disneyland.com 1-800-MYMOUSE
         //
         // Therefore, insert a line break after each (name,email,phone)
         // triplet (i.e., whenever (i+1) is a multiple of 3).
         if((i+1)%3 ==0) {
             html += "<br>";
         }
     }
     html += "</div>";

     // Output html in div.
     _gel('content_div').innerHTML = html;
});
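
The example above splits on commas only, so it relies on commas being the sole separator in the file. If the file instead places each record on its own line (an assumption, since the contents of Contacts.csv are not shown here), one possible variation is to split by line first and then by comma. The function name parseContacts and the second sample record are hypothetical, and the sketch does not handle quoted fields that contain commas.

```javascript
// Hypothetical helper: turns CSV text with one record per line into an
// array of { name, email, phone } objects.
function parseContacts(csvText) {
  var records = [];
  var lines = csvText.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i].trim();
    if (line === '') continue;  // skip blank lines
    var fields = line.split(',');
    records.push({
      name: (fields[0] || '').trim(),
      email: (fields[1] || '').trim(),
      phone: (fields[2] || '').trim()
    });
  }
  return records;
}

// Made-up sample data for illustration.
var sample = 'Mickey Mouse, mickey@disneyland.com, 1-800-MYMOUSE\n' +
             '\n' +
             'Example Person, person@example.com, 555-0100\n';
var contacts = parseContacts(sample);
// contacts.length is 2; contacts[0].email is 'mickey@disneyland.com'
```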

For a more complex example, see onebox.xml, which provides a good example of how to use this function. Because _IG_FetchContent() returns immediately and its callback function gets called whenever the response returns, you can have multiple requests running at once. In the onebox.xml example, if you enter "_test", it runs a self-test that fires off multiple queries to google.com. The query results render in whatever order the results come back.
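
When several requests are in flight, it helps to have each callback carry its own context instead of relying on arrival order. The following is a sketch under that assumption. It uses a hypothetical stubFetch() stand-in whose responses are delivered in reverse order to imitate a slow first request; the names and the example.com URL are illustrative and do not come from onebox.xml.

```javascript
// Hypothetical stand-in for _IG_FetchContent(); it stores callbacks so the
// sketch can deliver the responses in a chosen order.
var inFlight = [];
function stubFetch(url, callback) {
  inFlight.push(function () { callback('response for ' + url); });
}

var rendered = [];  // what the gadget would display, in arrival order
var queries = ['first', 'second', 'third'];

// Returns a callback that remembers which query it belongs to.
function makeHandler(query) {
  return function (responseText) {
    rendered.push(query + ': ' + responseText);
  };
}

for (var i = 0; i < queries.length; i++) {
  stubFetch('http://www.example.com/?q=' + queries[i], makeHandler(queries[i]));
}

// Simulate the responses coming back in reverse order.
while (inFlight.length > 0) {
  inFlight.pop()();
}
// rendered[0] now starts with 'third: ' even though 'first' was requested first.
```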

Working with XML

The Document Object Model (DOM) is an API for navigating HTML and XML documents. You can use the gadgets JavaScript function _IG_FetchXmlContent(url, func) to retrieve an XML document as a DOM object. Once you have the object, you can operate on it using standard DOM JavaScript functions. Typically this means extracting the desired data from the XML file, combining it with HTML and CSS markup, and rendering the resulting HTML in your gadget.

Note: You can only use _IG_FetchXmlContent() to retrieve XML files, not HTML. To work with HTML, use the _IG_FetchContent() function.

With DOM, web content is parsed into a tree of nodes. For example, consider the following snippet of HTML:

<a href="http://www.google.com/">Google's <b>fast</b> home page.</a>

This snippet illustrates the main types of nodes discussed in this section:

  • Element nodes. The element nodes in this snippet are “a” and “b”. Element nodes are the building blocks that define the structure of a document.
  • Text nodes. The text nodes in this snippet are ‘Google’s’, ‘fast’, and ‘home page.’ Text nodes are always contained within element nodes. They are child nodes of the containing element node.
  • Attribute nodes. This snippet has one attribute node: href="http://www.google.com/". An attribute node provides additional information about its containing element node. However, attributes are not considered child nodes of the element that contains them, which has implications for how you work with them. For more discussion of this topic, see Working With Different Node Types.

This is the DOM structure for the HTML snippet:

[Figure: DOM tree]

To access the data in a DOM object, you “walk the tree,” using DOM functions to navigate parent-child node relationships to get to the data you need.
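
As a rough illustration of walking the tree, the sketch below builds plain JavaScript objects that mimic the nodeType, nodeName, nodeValue, and childNodes properties of real DOM nodes for the snippet above, then collects the text recursively. The mock objects and the collectText name are assumptions made for the sketch; in a gadget you would receive real DOM nodes instead.

```javascript
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Helpers that build mock nodes (stand-ins for real DOM nodes).
function textNode(value) {
  return { nodeType: TEXT_NODE, nodeName: '#text', nodeValue: value, childNodes: [] };
}
function elementNode(name, children) {
  return { nodeType: ELEMENT_NODE, nodeName: name, nodeValue: null, childNodes: children };
}

// Mock tree for: <a ...>Google's <b>fast</b> home page.</a>
var anchor = elementNode('a', [
  textNode("Google's "),
  elementNode('b', [textNode('fast')]),
  textNode(' home page.')
]);

// Walks the tree depth-first and concatenates the value of every text node.
function collectText(node) {
  if (node.nodeType === TEXT_NODE) return node.nodeValue;
  var text = '';
  for (var i = 0; i < node.childNodes.length; i++) {
    text += collectText(node.childNodes[i]);
  }
  return text;
}

var allText = collectText(anchor);  // "Google's fast home page."
```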

Example: Parsing an XML Data File

The following XML file contains data for a series of breakfast items. The top-most parent node is menu, and it has multiple food child nodes. The menu node also contains an attribute node: title="Breakfast Menu". Each food node has name, price, description, and calories child nodes.

The name, price, and calories nodes all contain their own "text" child nodes. Each description node contains a CDATA child node. CDATA is a distinct node type. CDATA sections are used to escape blocks of text containing characters that would otherwise be regarded as markup, such as angle brackets. The only delimiter that is recognized in a CDATA section is the “]]>” string that ends the CDATA section.

<?xml version="1.0" encoding="UTF-8" ?>
<menu title="Breakfast Menu">
  <food>
     <name>Early Bird Breakfast</name> 
     <price>$3.95</price> 
     <description><![CDATA[<div style="color:purple; padding-left:25px;">Two eggs any style with your choice of bacon 
or sausage, toast or English muffin.</div>]]></description> 
     <calories>450</calories> 
  </food>

  <food>
     <name>Chocolate Chip Belgian Waffles</name> 
     <price>$7.95</price> 
     <description><![CDATA[<div style="color:purple; padding-left:25px;">Chocolate chip Belgian waffles covered with 
chocolate syrup and whipped cream.</div>]]></description> 
     <calories>900</calories> 
 </food>

     …
</menu>

The following sample gadget uses this XML file as a data source. It displays a breakfast menu and lets users set a calorie limit. It displays in red any calories that are above the specified limit. Users can also choose whether or not to display descriptions for each breakfast item.

This gadget uses the _IG_FetchXmlContent() function to retrieve the breakfast XML file as a DOM tree. Like the _IG_FetchContent() function, _IG_FetchXmlContent() is asynchronous, meaning that it returns immediately and then calls the inner function later, when the fetch finishes. This means that you must put any dependent code inside the callback function, or inside functions called by the callback function. In the following example, all processing happens inside the callback function.

The following code illustrates how to walk the DOM tree to extract data from different node types, and how to combine the data with HTML and CSS markup for display in the breakfast menu gadget.

<?xml version="1.0" encoding="UTF-8" ?> 
<Module>

  <ModulePrefs 
    title="_IG_FetchXmlContent Example" 
    scrolling="true"/>
  <UserPref 
    name="mycalories" 
    display_name="Calorie limit" 
    default_value="800"/>

  <UserPref 
    name="mychoice" 
    display_name="Show Descriptions" 
    datatype="bool" 
    default_value="false"/>
  <Content type="html">

  <![CDATA[ 
  <div id="content_div"></div>
  <script type="text/javascript"> 
  function displayMenu() { 
    // XML breakfast menu data
    var url = "http://doc.examples.googlepages.com/breakfast-data.xml"; 
    var prefs = new _IG_Prefs();
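    // Calorie limit set by user, read as a number so that the comparison
    // with each item's calories is numeric rather than string-based
    var calorieLimit = prefs.getInt("mycalories");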
    // Indicates whether to show descriptions in the breakfast menu    
    var description = prefs.getBool("mychoice");
 
    _IG_FetchXmlContent(url, function (response) {
           if (response == null || typeof(response) != "object" || 
                      response.firstChild == null) {
              _gel("content_div").innerHTML = "<i>Invalid data.</i>";
              return;
           }

           // Start building HTML string that will be displayed in <div>.           
           // Set the style for the <div>.		
           var html = "<div style='padding: 5px;background-color: #ccf;font-family:Arial, Helvetica;" +                   
		          "text-align:left;font-size:90%'>";   
					    
           // Set style for title.
           html +="<div style='text-align:center; font-size: 120%; color: yellow; " +
		          "font-weight: 700;'>"; 

           // Display menu title. Use getElementsByTagName() to retrieve the <menu> element.
           // Since there is only one menu element in the file,
           // you can get to it by accessing the item at index "0". 
           // You can then use getAttribute to get the text associated with the
           // menu "title" attribute.
           var title = response.getElementsByTagName("menu").item(0).getAttribute("title");
 
           // Alternatively, you could retrieve the title by getting the menu element node
           // and calling the "attributes" function on it. This returns an array
           // of the element node's attributes. In this case, there is only one
           // attribute (title), so you could display the value for the attribute at
           // index 0. For example:
           // 
           // var title = response.getElementsByTagName("menu").item(0).attributes.item(0).nodeValue; 

           // Append the title to the HTML string.
           html += title + "</div><br>"; 

           // Get a list of the <food> element nodes in the file
           var itemList = response.getElementsByTagName("food");
 
           // Loop through all <food> nodes
           for (var i = 0; i < itemList.length ; i++) { 
             // For each <food> node, get child nodes.
             var nodeList = itemList.item(i).childNodes;

             // Loop through child nodes. Extract data from the text nodes that are
             // the children of the associated name, price, and calories element nodes.
             for (var j = 0; j < nodeList.length ; j++) {
                var node = nodeList.item(j);
                if (node.nodeName == "name") {
                   var name = node.firstChild.nodeValue;
                }
                if (node.nodeName == "price") {
                   var price = node.firstChild.nodeValue; 
                }
                if (node.nodeName == "calories") {
                   var calories = node.firstChild.nodeValue; 
                }
                // If the user chose to display descriptions and
                // the child node is "#cdata-section", grab the
                // contents of the description CDATA for display.
                if (node.nodeName == "description" && description==true) {
                   if (node.firstChild.nodeName == "#cdata-section")
                      var data = node.firstChild.nodeValue;
                }
             }

             // Append extracted data to the HTML string.
             html += "<i><b>";
             html += name;
             html += "</b></i><br>";
             html += "&emsp;";
             html += price;
             html += " - ";

             // If "calories" is greater than the user-specified calorie limit,
             // display it in red.
             if(calories > calorieLimit) {
                html += "<font color=#ff0000>";
                html += calories + " calories";
                html += " </font>";
             }
             else
                html += calories + " calories";
             html += "<br>";

             // If user has chosen to display descriptions
             if (description==true) {
                html += "<i>" + data + "</i><br>";
             }
           }

           // Close up div
           html += "</div>";

           // Display HTML string in <div>
           _gel('content_div').innerHTML = html;
    });
  }
  _IG_RegisterOnloadHandler(displayMenu);
  </script>
  ]]>
  </Content>
</Module>
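
One detail worth noting when comparing values extracted from text nodes: nodeValue always yields a string, and comparing two strings with > is lexicographic rather than numeric. The following sketch shows a numeric comparison; the helper name exceedsLimit is hypothetical and is not part of the gadget above.

```javascript
// Hypothetical helper: converts both values to numbers before comparing.
// Returns false if either value cannot be parsed as an integer.
function exceedsLimit(caloriesText, limit) {
  var calories = parseInt(caloriesText, 10);
  var numericLimit = parseInt(limit, 10);
  if (isNaN(calories) || isNaN(numericLimit)) return false;
  return calories > numericLimit;
}

// Two strings are compared character by character, so '1' < '8' decides it.
var stringComparison = '1000' > '800';                // false
// After conversion, 1000 is correctly treated as greater than 800.
var numericComparison = exceedsLimit('1000', '800');  // true
```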

This code sample illustrates four of the primary functions you use to interact with DOM data:

  • getElementsByTagName(tagname): For a DOM document, returns an array-like list of the element nodes whose names match tagname. (_gelstn() is a wrapper around getElementsByTagName().) You can retrieve all of the element nodes in a file by using the wildcard character (*), for example: response.getElementsByTagName("*").
  • getElementById(id): For a DOM document, retrieves a single node by id. (_gel() is a wrapper around getElementById().)
  • getAttribute(attrib): For an element node, returns the value of the attribute attrib. For example: response.getElementsByTagName("menu").item(0).getAttribute("title").
  • attributes: For an element node, returns an array-like collection of the node’s attributes.

This example only shows a few of the different functions for navigating a DOM tree. Some of the others you might try include lastChild, nextSibling, previousSibling, and parentNode.

Working With Different Node Types

The key to working effectively with DOM is appreciating the sometimes subtle differences between different node types.

Node type: element

  • Description: The structural building blocks of a document, such as <p>, <b>, or <calories>.
  • Return values:
      nodeName: Whatever text is contained inside the angle brackets. For example, the nodeName of <menu> is “menu”.
      nodeType: 1
      nodeValue: null
  • Gotchas: An element has a nodeValue of null. To get to the value of a text or attribute node associated with an element, you must go to those nodes. For example: element.firstChild.nodeValue for text, and element.getAttribute(attrib) for attributes.

Node type: text

  • Description: Text. A text node is always contained within an element. It is a child of the element.
  • Return values:
      nodeName: #text
      nodeType: 3
      nodeValue: Whatever text is contained in the node.
  • Gotchas: Some browsers render all whitespace in a document as text nodes, so that you get “empty” text nodes in your DOM object. This can cause unexpected results when you’re walking the tree. The solution may be as simple as filtering out text nodes that contain only the newline character, or you may want to do more robust handling. For more discussion of this topic, see Whitespace in the DOM.

Node type: attribute

  • Description: A key-value pair that provides additional information about an element node (for example, title=“my document”). An attribute is contained by an element node, but it is not a child of the element node.
  • Return values:
      nodeName: The lefthand value in the attribute pair. If the attribute is title=“my document”, the nodeName is title.
      nodeType: 2
      nodeValue: The righthand value in the attribute pair (in this example, “my document”).
  • Gotchas: Even though attributes are nodes and are contained within element nodes, they are not child nodes of the element. They inherit from the Node interface, but the DOM doesn't consider them part of the DOM tree. This means that while you can use many of the node functions on attribute nodes (such as nodeName, nodeValue, and nodeType), you cannot access attribute nodes using the DOM tree-walking functions. To access attributes, you use the functions attributes and getAttribute(attrib).

Node type: CDATA

  • Description: A section in which content is left alone, not interpreted. CDATA sections are used to escape blocks of text containing characters that would otherwise be regarded as markup. The only delimiter that is recognized in a CDATA section is the "]]>" string that ends the CDATA section.
  • Return values:
      nodeName: #cdata-section
      nodeType: 4
      nodeValue: Text and markup inside the CDATA delimiters.
  • Gotchas: The text in the CDATA section has its own markup. This could have implications for how you incorporate it into your gadget.
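
The whitespace gotcha for text nodes can be handled with a small filter. The sketch below assumes mock node objects with the same nodeType, nodeName, nodeValue, and childNodes properties as real DOM nodes; the helper names isWhitespaceOnly and significantChildren are hypothetical.

```javascript
var TEXT_NODE_TYPE = 3;

// True for text nodes that contain nothing but whitespace.
function isWhitespaceOnly(node) {
  return node.nodeType === TEXT_NODE_TYPE && /^\s*$/.test(node.nodeValue);
}

// Returns the child nodes of a node, skipping whitespace-only text nodes.
function significantChildren(node) {
  var result = [];
  for (var i = 0; i < node.childNodes.length; i++) {
    if (!isWhitespaceOnly(node.childNodes[i])) {
      result.push(node.childNodes[i]);
    }
  }
  return result;
}

// Mock <food> node as a whitespace-preserving parser might expose it:
// indentation between the child elements shows up as text nodes.
var food = { nodeType: 1, nodeName: 'food', nodeValue: null, childNodes: [
  { nodeType: 3, nodeName: '#text', nodeValue: '\n     ', childNodes: [] },
  { nodeType: 1, nodeName: 'name', nodeValue: null, childNodes: [
    { nodeType: 3, nodeName: '#text', nodeValue: 'Early Bird Breakfast', childNodes: [] }
  ] },
  { nodeType: 3, nodeName: '#text', nodeValue: '\n     ', childNodes: [] },
  { nodeType: 1, nodeName: 'calories', nodeValue: null, childNodes: [
    { nodeType: 3, nodeName: '#text', nodeValue: '450', childNodes: [] }
  ] },
  { nodeType: 3, nodeName: '#text', nodeValue: '\n  ', childNodes: [] }
] };

var kept = significantChildren(food);
// kept contains only the <name> and <calories> element nodes.
```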

Working With Feeds

You can add a feed to your iGoogle page by typing its URL into the Add by URL form in the content directory. This uses the gadgets API built-in feed support to create a gadget for the feed and add it to iGoogle. It's easy to use, but it doesn't let you perform any customization to the content or display. Also, you can't use it with other Google properties.

For more sophisticated feed handling, the gadgets API provides the _IG_FetchFeedAsJSON(url, func, num_entries, get_summaries) function. _IG_FetchFeedAsJSON() fetches an RSS or Atom feed and returns the core feed data as a JSON object. JSON (JavaScript Object Notation) is a simple way of describing data as JavaScript.

_IG_FetchFeedAsJSON() takes the following parameters:

  • url (string, required): The RSS or Atom feed URL to retrieve.
  • callback (function, required): The callback function to execute when the data is retrieved.
  • num_entries (integer, optional): The number of feed entries to retrieve from the feed. The accepted range is 1 through 100. The default is 3.
  • get_summaries (boolean, optional): Whether to retrieve the full text summaries for the entries in the feed. This defaults to false. You should only set this to true if you plan to use the data. The full summaries can be quite large and shouldn't be transferred needlessly.
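
Since num_entries is only accepted between 1 and 100, a gadget may want to normalize a user-supplied value before passing it on. This is a sketch of one way to do so; normalizeEntryCount is a hypothetical helper, and falling back to the documented default of 3 for unusable input is a design choice of the sketch rather than API behavior.

```javascript
// Hypothetical helper: returns a value that is safe to pass as num_entries.
function normalizeEntryCount(value) {
  var count = parseInt(value, 10);
  if (isNaN(count) || count < 1) return 3;  // fall back to the documented default
  if (count > 100) return 100;              // cap at the documented maximum
  return count;
}

var fromEmptyPref = normalizeEntryCount('');   // 3
var fromLargePref = normalizeEntryCount(250);  // 100
var fromValidPref = normalizeEntryCount('10'); // 10
```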

Here are the fields in the JSON feed object:

  • ErrorMsg: If defined, describes any error that occurred.
  • URL: The URL of the RSS / Atom feed.
  • Title: The title of the feed.
  • Description: The tagline or description of the feed.
  • Link: Typically, the URL of the feed homepage.
  • Author: The author of the feed.
  • Entry: Array of feed entries. The following fields are nested within each Entry:
      • Title: The title of this feed entry.
      • Link: The URL of this feed entry.
      • Summary: The content or summary of this feed entry.
      • Date: Timestamp for this entry in seconds since Jan 1, 1970. To convert it to the milliseconds needed to initialize a JavaScript Date object with the correct date, multiply by 1000. See the sample gadget code below for an example.
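
These fields can be exercised without a live feed. The following sketch uses a hand-written object shaped like the JSON feed object (the values are made up for illustration) and a hypothetical describeFeed() helper that checks ErrorMsg before reading entries and converts each Date from seconds to milliseconds.

```javascript
// Hypothetical helper: summarizes a feed object as an array of text lines.
function describeFeed(feed) {
  if (feed == null) return ['No data.'];
  if (feed.ErrorMsg) return ['Error: ' + feed.ErrorMsg];

  var lines = [feed.Title];
  var entries = feed.Entry || [];
  for (var i = 0; i < entries.length; i++) {
    // Date is in seconds since Jan 1, 1970; JavaScript expects milliseconds.
    var date = new Date(entries[i].Date * 1000);
    lines.push(entries[i].Title + ' (' + date.toISOString() + ')');
  }
  return lines;
}

// Made-up feed object for illustration only.
var sampleFeed = {
  URL: 'http://www.example.com/feed.xml',
  Title: 'Sample Feed',
  Entry: [
    { Title: 'First post', Link: 'http://www.example.com/1', Summary: '', Date: 86400 }
  ]
};

var summaryLines = describeFeed(sampleFeed);
// summaryLines[1] is 'First post (1970-01-02T00:00:00.000Z)'
```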

The following example illustrates how to use the _IG_FetchFeedAsJSON() function to fetch a feed and display portions of its data in a gadget. The gadget lets users specify:

  • The number of entries to show
  • Whether the gadget should display dates and summaries for each entry

This is the code for the example:

<?xml version="1.0" encoding="UTF-8" ?> 

<Module>
  <ModulePrefs  
    title="_IG_FetchFeedAsJSON Example" 
    title_url="http://groups.google.com/group/Google-Gadgets-API" /> 
  <UserPref name="show_date" display_name="Show Dates?" datatype="bool"/>

  <UserPref name="show_summ" display_name="Show Summaries?" datatype="bool"/>
  <UserPref name="num_entries" display_name="Number of Entries:" />

  <Content type="html">
  <![CDATA[ 
    <style> #content_div { font-size: 80%;  margin: 5px; background-color: #FFFFBF;} </style>
 
    <div id=content_div></div>
     <script type="text/javascript">

     // Get userprefs
     var prefs = new _IG_Prefs();
     var showdate = prefs.getBool("show_date");
     var summary = prefs.getBool("show_summ");
     var entries = prefs.getInt("num_entries");

     // If user wants to display more than 100 entries, display an error
     // and set the value to 100, the max allowed.
     if (entries > 100)
     {
         alert("You cannot display more than 100 entries.");
         entries = 100;
     }

     // Use the _IG_FetchFeedAsJSON() function to retrieve core feed data from
     // the specified URL. Then combine the data with HTML markup for display in
     // the gadget.
     _IG_FetchFeedAsJSON("http://groups.google.com/group/Google-Gadgets-API/feed/rss_v2_0_msgs.xml",
              function(feed) { 
              if (feed == null){ 
                 alert("There is no data.");
                 return;
              }
     
         // Start building HTML string that will be displayed in gadget.
         var html = "";

         // Access the fields in the feed
         html += "<div><b>" + feed.Title + "</b></div>";
         html += "<div>" + feed.Description + "</div><br>";
     
         // Access the data for a given entry
         if (feed.Entry) {
             for (var i = 0; i < feed.Entry.length; i++) {
                 html += "<div>"

                 + "<a target='_blank' href='" + feed.Entry[i].Link + "'>"
                 + feed.Entry[i].Title
                 + "</a> ";
                 if (showdate==true)
                 { 
                     // The feed entry Date field contains the timestamp in seconds
                     // since Jan. 1, 1970. To convert it to the milliseconds needed
                     // to initialize the JavaScript Date object with the correct date, 
                     // multiply by 1000.
                     var milliseconds = (feed.Entry[i].Date) * 1000; 
                     var date = new Date(milliseconds); 
                     html += date.toLocaleDateString();
                     html += " ";
                     html += date.toLocaleTimeString(); 
                 } 
                 if (summary==true) { 
                     html += "<br><i>" + feed.Entry[i].Summary + "</i>";
                 }
                 html += "</div>";
             }
         }
     _gel("content_div").innerHTML = html;

     // The rest of the function parameters, which are optional: the number
     // of entries to return, and whether to return summaries.
     }, entries, summary);
 
  </script>

  ]]> 
  </Content>
</Module>

Refreshing the Cache

If you are using _IG_FetchContent(), _IG_FetchXmlContent(), or the _IG_Get... functions to fetch content that is updated more than once an hour, such as feed data, you might not get the latest updates. This is because the server caches results to make your gadget run faster. If you want to be sure that your gadget has the latest data, you can use the refreshInterval parameter to bypass the cache and force a refresh to happen within the interval you specify. In other words, the cache is refreshed every X seconds, where X = refreshInterval.

This feature is not implemented for the _IG_FetchFeedAsJSON() function.

To make sure your gadget fetches fresh new content at least once per interval, simply specify a value (measured in seconds) for the refreshInterval parameter. For example:

// Fetch fresh content every half hour
_IG_FetchContent("http://news.google.com/?output=rss", callback, { refreshInterval: (60 * 30) });

// Fetch fresh content every 10 minutes
_IG_FetchContent("http://news.google.com/?output=rss", callback, { refreshInterval: (60 * 10) });

// Fetch fresh content every 30 seconds
_IG_FetchContent("http://news.google.com/?output=rss", callback, { refreshInterval: 30 });

// Disable caching completely and fetch fresh content every time --  !! Try to avoid using this !!
_IG_FetchContent("http://news.google.com/?output=rss", callback, { refreshInterval: 0 });

function callback(response) { ... } 

Caching serves a useful purpose, and you should be careful not to refresh the cache so often that you degrade performance. Caching makes fetching data faster. It also reduces the load on third-party servers hosting the remote content. You should try to avoid disabling the cache completely (which you would do by using refreshInterval: 0). If your gadget is getting millions of page views a day, sending out millions of requests to these servers, turning off caching could not only adversely affect your gadget's performance, but it could overload the servers that provide your gadget with data.

Since content is refreshed by default every hour, it only makes sense to specify an interval of less than an hour. The recommended range for refreshInterval is more than 60 and less than 3600 seconds.
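
To stay within the recommended range, a gadget could clamp the interval before building the options object. This is only a sketch: refreshOptions is a hypothetical helper, and treating 60 and 3600 as the inclusive bounds is an assumption of the sketch.

```javascript
// Hypothetical helper: builds the options object for the _IG_Fetch...
// functions, keeping refreshInterval between 60 and 3600 seconds.
function refreshOptions(seconds) {
  var MIN_SECONDS = 60;
  var MAX_SECONDS = 3600;
  var clamped = Math.min(MAX_SECONDS, Math.max(MIN_SECONDS, seconds));
  return { refreshInterval: clamped };
}

var halfHour = refreshOptions(60 * 30);  // { refreshInterval: 1800 }
var tooShort = refreshOptions(30);       // { refreshInterval: 60 }
var tooLong = refreshOptions(7200);      // { refreshInterval: 3600 }
```

The resulting object could then be passed as the third argument in the same position as the options in the examples above.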
