Google Documents List Data API

Developer's Guide: Python

The Google Documents List Data API allows client applications to view and update documents (such as spreadsheets and word processor documents) using a Google Data API feed. Your client application can request a list of a user's documents, query the content of a user's documents, and upload new documents.

In addition to providing some background on the capabilities of the Documents List Data API, this guide provides examples for interacting with the API using the Python client library. If you're interested in understanding more about the underlying protocol used by the Python client library to interact with the Documents List, please see the protocol tab.

Contents

  1. Audience
  2. Getting started
  3. Authenticating to the Documents service
    1. Single-user "installed" client authentication
    2. Multiple-user web application client authentication
    3. Upgrading to a session token
  4. Retrieving a list of documents
  5. Uploading documents
    1. Uploading a word processor document
    2. Uploading a spreadsheet
    3. Uploading a presentation
  6. Trashing a document
  7. Searching the documents feed
    1. Retrieving all word processor documents
    2. Retrieving all spreadsheets
    3. Retrieving all starred presentations
    4. Retrieving a document by an exact title match
    5. Retrieving all documents in a named folder
    6. Performing a text query

Audience

This document is intended for developers who want to write client applications using the Google Data Python client library that can interact with Google Documents.

Getting started

Google Documents uses Google Accounts for authentication, so if you have a Google account you are all set. Otherwise, you can create a new account.

To use the Python client library, you'll need Python 2.0+ and the modules listed on the DependencyModules wiki page. After downloading the client library, you'll find the sample explained in this guide in the samples/docs subdirectory of the distribution. For a general introduction to the Python Client Library, you can view the Python Getting Started article.

A full working copy of this sample is available in the Google Data Python Client Library project in the project hosting section of code.google.com. The sample is located at /trunk/samples/docs/docs_example.py in the SVN repository accessible from the Source tab.

Run the example as follows:

python docs_example.py

The program will prompt you for a username and password. These are the same credentials that you use to log in to Google Documents.

The sample allows the user to perform a number of operations which demonstrate how to use the Documents List feed.

To use the examples in this guide in your own code, you'll need the following import statements:

import gdata.docs.service
import gdata.docs

The DocsService class represents a client connection (with authentication) to the Documents service. The general procedure for sending a query to a service using the client library consists of the following steps:

  1. Create a new DocsService instance, setting your application's name (in the form companyName-applicationName-versionID).
  2. Set the appropriate credentials.
  3. Obtain or construct the appropriate URI.
  4. If you are uploading a document, create the appropriate object using the client library class MediaSource.
  5. Call a method to send the request and receive any results.
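
The application name in step 1 is a plain hyphen-joined string. As a minimal sketch (the helper name make_source is hypothetical and not part of the client library), it could be assembled like this:

```python
def make_source(company, application, version):
    """Join the three parts into companyName-applicationName-versionID."""
    parts = [str(company), str(application), str(version)]
    for part in parts:
        # An empty part would leave a dangling hyphen in the result.
        if not part:
            raise ValueError('every part of the source name must be non-empty')
    return '-'.join(parts)


print(make_source('exampleCo', 'exampleApp', 1))
```

The resulting string is what you would assign to the source attribute of the service object.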

Authenticating to the Documents service

The Python client library can be used to work with either public or private feeds. At this time, the Documents List Data API offers only a private feed, which requires authentication with the documents servers. You can authenticate using either ClientLogin username/password authentication or AuthSub proxy authentication.

Please see the authentication documentation for more information on AuthSub and ClientLogin.

Single-user "installed" client authentication

To use ClientLogin (also called "Authentication for Installed Applications"), invoke the ProgrammaticLogin method of DocsService inherited from Service, specifying the ID and password of the user on whose behalf your client is sending the query. For example:

gd_client = gdata.docs.service.DocsService()
gd_client.email = 'example@gmail.com'
gd_client.password = 'mypassword'
gd_client.source = 'exampleCo-exampleApp-1'
gd_client.ProgrammaticLogin()

For more information about authentication systems, see the Google Account Authentication documentation.

Multiple-user web application client authentication

AuthSub proxy authentication is used by web applications that need to authenticate their users to Google accounts. The operator does not need access to the username and password of the Documents user; only special AuthSub tokens are required.

When the user first visits your application, they have not yet been authenticated. In this case, you need to print some text and a link directing the user to Google to authenticate your request for access to their documents. The Python Google Data client library provides a function to generate this URL. The code below sets up a link to the AuthSubRequest page.

def GetAuthSubUrl():
  next = 'http://www.example.com/welcome.pyc'
  scope = 'http://docs.google.com/feeds/documents'
  secure = False
  session = True
  gd_client = gdata.docs.service.DocsService()
  return gd_client.GenerateAuthSubURL(next, scope, secure, session)

authSubUrl = GetAuthSubUrl()
print '<a href="%s">Login to your Google account</a>' % authSubUrl

Notice the parameters sent to the GenerateAuthSubURL method:

  • next, the URL of the page that Google should redirect the user to after authentication.
  • scope, indicating that the application is requesting access to the Documents feed.
  • secure, indicating that the token returned will not be a secure token.
  • session, indicating this token can be exchanged for a multi-use (session) token.

The URL looks something like this:

https://www.google.com/accounts/AuthSubRequest?scope=http%3A%2F%2Fdocs.google.com%2Ffeeds%2Fdocuments&session=1&secure=0&next=http%3A%2F%2Fwww.example.com%2Fwelcome.pyc
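
To see how the four parameters end up in that URL, here is a minimal sketch that assembles it with the standard library alone. It is written for Python 3 (urllib.parse), the helper name build_authsub_url is hypothetical, and it is not how the client library itself builds the link:

```python
from urllib.parse import urlencode

AUTHSUB_REQUEST_URL = 'https://www.google.com/accounts/AuthSubRequest'


def build_authsub_url(next_url, scope, secure=False, session=True):
    """Return an AuthSubRequest URL with the four parameters URL-encoded."""
    params = [
        ('scope', scope),
        ('session', int(session)),  # True becomes 1, False becomes 0
        ('secure', int(secure)),
        ('next', next_url),
    ]
    return AUTHSUB_REQUEST_URL + '?' + urlencode(params)


print(build_authsub_url('http://www.example.com/welcome.pyc',
                        'http://docs.google.com/feeds/documents'))
```

Each value is percent-encoded, which is why the colons and slashes of the scope and next URLs appear as %3A and %2F.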

The user can then follow the link to Google's site and authenticate to their Google account.

After the user authenticates, they will be redirected back to the next URL. The URL will have a single-use token value appended to it as a query parameter. The URL looks something like this:

http://www.example.com/welcome.pyc?token=14a87fe98219731acd516

Upgrading to a session token

For security, this token is single-use only, so now you need to exchange this single-use token for a session token. This process is described in the AuthSub documentation. The following code snippet shows how to upgrade the token.

gd_client = gdata.docs.service.DocsService()
gd_client.auth_token = authsub_token
gd_client.UpgradeToSessionToken()

In this snippet, the authsub_token variable contains the value from the token query parameter in the URL. There are several ways to retrieve this value, for example:

import cgi
parameters = cgi.FieldStorage()
# Indexing FieldStorage returns a field object, so ask for the string value.
authsub_token = parameters.getfirst('token')

This token value represents a single-use AuthSub token. Since session = True was specified above, this token can be exchanged for an AuthSub session token using the UpgradeToSessionToken method, which calls the AuthSubSessionToken service.
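
If your web framework hands you the full redirect URL rather than parsed form fields, the token can also be pulled out of the query string directly. The following is a minimal sketch using Python 3's urllib.parse; the helper name extract_token is hypothetical:

```python
from urllib.parse import urlparse, parse_qs


def extract_token(redirect_url):
    """Return the value of the token query parameter, or None if absent."""
    query = parse_qs(urlparse(redirect_url).query)
    values = query.get('token')
    return values[0] if values else None


print(extract_token('http://www.example.com/welcome.pyc?token=14a87fe98219731acd516'))
```

Returning None when the parameter is missing lets the caller decide whether to send the user back to the AuthSubRequest link.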

Retrieving a list of documents

You can get a feed containing a list of the currently authenticated user's documents by sending an authenticated GET request to the following URL:

http://docs.google.com/feeds/documents/private/full

The result is a "meta-feed," a feed that lists all of that user's documents; each entry in the feed represents a document (spreadsheet or word processor document) associated with the user. This feed is accessible only using an authentication token.

You can print out a list of the user's documents with the following two functions:

def ListAllDocuments():
  """Retrieves a list of all of a user's documents and displays them."""
  # gd_client is the authenticated DocsService instance created earlier.
  feed = gd_client.GetDocumentListFeed()
  PrintFeed(feed)

def PrintFeed(feed):
  """Prints out the contents of a feed to the console."""
  print '\n'
  if not feed.entry:
    print 'No entries in feed.\n'
  # Number the entries starting from 1.
  for i, entry in enumerate(feed.entry):
    print '%s %s\n' % (i + 1, entry.title.text.encode('UTF-8'))

The resulting DocumentListFeed object feed represents a response from the server. Among other things, this feed contains a list of DocumentListEntry objects (feed.entry), each of which represents a single document. DocumentListEntry encapsulates the information shown in the protocol document.
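
To experiment with the feed structure without a live connection, the sketch below parses a small Atom document with the Python 3 standard library and numbers the entry titles the way PrintFeed does. The sample XML is hand-written for illustration and is not real server output, and the function names are hypothetical:

```python
import xml.etree.ElementTree as ET

ATOM_NS = '{http://www.w3.org/2005/Atom}'

# Hand-written sample for illustration only; not real server output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample documents feed</title>
  <entry><title>Budget</title></entry>
  <entry><title>Meeting notes</title></entry>
</feed>"""


def list_titles(feed_xml):
    """Return the title text of every entry in an Atom feed document."""
    root = ET.fromstring(feed_xml)
    return [entry.findtext(ATOM_NS + 'title', default='')
            for entry in root.findall(ATOM_NS + 'entry')]


def format_feed(titles):
    """Number the titles starting from 1, one per line."""
    if not titles:
        return 'No entries in feed.'
    return '\n'.join('%s %s' % (i + 1, title) for i, title in enumerate(titles))


print(format_feed(list_titles(SAMPLE_FEED)))
```

Only the entry titles are read here; the feed's own title element is skipped because findall looks at entry elements alone.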

Uploading documents

Any document uploaded to the server is first wrapped in a MediaSource object. In the examples, the MediaSource constructor takes two arguments: file_path is the name of the file including the file system path, and content_type is the MIME type (e.g. text/plain) of the document being uploaded. For more information on the MediaSource class, please use the Python built-in documentation system:

import gdata
help(gdata.MediaSource)

For your convenience, there is a static dictionary member of the gdata.docs.service module named SUPPORTED_FILETYPES. It maps upper-case file extensions to their appropriate MIME types.
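
A lookup against such a dictionary might look like the sketch below. EXAMPLE_FILETYPES is a stand-in written for this illustration, so the real SUPPORTED_FILETYPES may list different extensions, and the helper name guess_content_type is hypothetical:

```python
import os

# Stand-in for illustration; the real SUPPORTED_FILETYPES may differ.
# The keys are upper-case file extensions, as described above.
EXAMPLE_FILETYPES = {
    'CSV': 'text/csv',
    'DOC': 'application/msword',
    'PPT': 'application/vnd.ms-powerpoint',
    'TXT': 'text/plain',
    'XLS': 'application/vnd.ms-excel',
}


def guess_content_type(file_path, filetypes=EXAMPLE_FILETYPES):
    """Look up the MIME type for file_path by its upper-cased extension."""
    extension = os.path.splitext(file_path)[1].lstrip('.').upper()
    try:
        return filetypes[extension]
    except KeyError:
        raise ValueError('unsupported file type: %r' % file_path)


print(guess_content_type('/tmp/report.doc'))
```

The value returned is what you would pass as content_type when constructing the MediaSource object.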

Uploading a word processor document

This example uploads a document, assuming file_path is the path to a word processor document of MIME type content_type. The entry variable is a DocumentListEntry object containing information about the document that was uploaded, including a direct link to the document.

  ms = gdata.MediaSource(file_path = file_path, content_type = content_type)
  entry = gd_client.UploadDocument(ms, title)
  print 'Document now accessible at:', entry.GetAlternateLink().href

Uploading a spreadsheet

This example uploads a spreadsheet, assuming file_path is the path to a spreadsheet of MIME type content_type. The entry variable is a DocumentListEntry object containing information about the spreadsheet that was uploaded, including a direct link to the spreadsheet.

  ms = gdata.MediaSource(file_path = file_path, content_type = content_type)
  entry = gd_client.UploadSpreadsheet(ms, title)
  print 'Spreadsheet now accessible at:', entry.GetAlternateLink().href

Uploading a presentation

This example uploads a presentation, assuming file_path is the path to a presentation of MIME type content_type. The entry variable is a DocumentListEntry object containing information about the presentation that was uploaded, including a direct link to the presentation.

  ms = gdata.MediaSource(file_path = file_path, content_type = content_type)
  entry = gd_client.UploadPresentation(ms, title)
  print 'Presentation now accessible at:', entry.GetAlternateLink().href

Trashing a document

To put a document in the trash, use the Delete method of the service object on the edit link of the Atom entry representing the document. To trash the new document from the upload examples above, you would execute the following.

gd_client.Delete(entry.GetEditLink().href)

Searching the documents feed

You can search the Documents List using some of the standard Google Data API query parameters. Categories are used to restrict the type of document (for example, word processor document or spreadsheet) returned. The full-text query string is used to search the content of all the documents. More detailed information on parameters specific to the Documents List can be found in the Documents List Data API Reference Guide.

In the Python client library, a DocumentQuery object can be used to construct queries for the Documents List feed. The following code is used in all of the examples below to print out the feed results to the command line.

def PrintFeed(feed):
  """Prints out the contents of a feed to the console."""
  print '\n'
  if not feed.entry:
    print 'No entries in feed.\n'
  # Number the entries starting from 1.
  for i, entry in enumerate(feed.entry):
    print '%s %s\n' % (i + 1, entry.title.text.encode('UTF-8'))

Retrieving all word processor documents

A list of only word processor documents can be retrieved by using the document category as follows:

  q = gdata.docs.service.DocumentQuery(categories=['document'])
  feed = gd_client.Query(q.ToUri())
  PrintFeed(feed)

Retrieving all spreadsheets

A list of only spreadsheets can be retrieved by using the spreadsheet category as follows:

  q = gdata.docs.service.DocumentQuery(categories=['spreadsheet'])
  feed = gd_client.Query(q.ToUri())
  PrintFeed(feed)

Retrieving all starred presentations

A list of only starred presentations can be retrieved by using the presentation and starred categories as follows:

  q = gdata.docs.service.DocumentQuery()
  q.categories.append('presentation')
  q.categories.append('starred')
  feed = gd_client.Query(q.ToUri())
  PrintFeed(feed)

Note that this can work for other document types as well, as the 'starred' category can be added to any DocumentQuery object.
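
Under the hood, a category query is just a feed URI with extra path segments. The sketch below assumes the Google Data convention of listing categories after a /-/ segment; the helper name category_query_uri is hypothetical, the code targets Python 3, and the exact string produced by ToUri may differ:

```python
from urllib.parse import quote

FEED_URI = 'http://docs.google.com/feeds/documents/private/full'


def category_query_uri(categories, feed_uri=FEED_URI):
    """Append /-/cat1/cat2/... to the feed URI, percent-encoding each name."""
    if not categories:
        return feed_uri
    encoded = [quote(category, safe='') for category in categories]
    return feed_uri + '/-/' + '/'.join(encoded)


print(category_query_uri(['presentation', 'starred']))
```

Every category in the list narrows the result, so the URI above asks for entries that are both presentations and starred.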

Retrieving a document by an exact title match

It is possible to retrieve documents by matching on their title instead of their entire contents. To do this, add the title parameter to the DocumentQuery object. To match a title exactly, add a title-exact parameter to indicate this is the full, explicit title (including capitalization) of the document you want returned. Of course, there could be multiple documents with the same name, so a feed is returned.

  q = gdata.docs.service.DocumentQuery()
  q['title'] = 'Test'
  q['title-exact'] = 'true'
  feed = gd_client.Query(q.ToUri())
  PrintFeed(feed)

This sample prints out only documents that have exactly the title "Test".
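
The title and title-exact values travel as ordinary query parameters on the feed URI. As a rough sketch for Python 3 (the helper name title_query_uri is hypothetical, and the parameter order emitted by ToUri may differ):

```python
from urllib.parse import urlencode

FEED_URI = 'http://docs.google.com/feeds/documents/private/full'


def title_query_uri(title, exact=False, feed_uri=FEED_URI):
    """Build a feed URI that filters on title, optionally as an exact match."""
    params = [('title', title)]
    if exact:
        # Without this flag the title is not required to match in full.
        params.append(('title-exact', 'true'))
    return feed_uri + '?' + urlencode(params)


print(title_query_uri('Test', exact=True))
```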

Retrieving all documents in a named folder

In most cases, a category query which includes the folder name will find the documents in that folder. However, you can also explicitly request documents in a named folder by using a schema-qualified query. The AddNamedFolder function lets you retrieve all documents in a specified folder belonging to a user with the specified email address:

  q = gdata.docs.service.DocumentQuery()
  q.AddNamedFolder(email, folder_name)

This style of query is useful when a folder name conflicts with a category that has a different meaning, such as "starred". For example, to query for all the documents in the "starred" folder belonging to user "jo@gmail.com", you could use the function as follows:

  q = gdata.docs.service.DocumentQuery()
  q.AddNamedFolder('jo@gmail.com', 'starred')
  feed = gd_client.Query(q.ToUri())
  PrintFeed(feed)

The important distinction is that simply appending the category "starred" would return a list of all starred documents, not the documents in the folder named "starred".
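
One way to picture a schema-qualified category is as the folder name prefixed by a scheme URI in braces. The sketch below is an illustration only: the scheme prefix shown is an assumption, so check the Reference Guide for the exact value, and the helper name named_folder_category is hypothetical:

```python
# Assumed scheme prefix for folder categories; verify against the
# Documents List Data API Reference Guide before relying on it.
FOLDERS_SCHEME_PREFIX = 'http://schemas.google.com/docs/2007/folders/'


def named_folder_category(email, folder_name,
                          scheme_prefix=FOLDERS_SCHEME_PREFIX):
    """Return a category of the form {scheme-prefix + email}folder_name."""
    return '{%s%s}%s' % (scheme_prefix, email, folder_name)


plain = 'starred'
qualified = named_folder_category('jo@gmail.com', 'starred')
print(plain)
print(qualified)
```

Because the qualified form carries the owner's scheme, it cannot be confused with the plain "starred" category even though both end in the same name.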

Performing a text query

You can search the contents of documents by using the text_query property of the DocumentQuery object.

  q = gdata.docs.service.DocumentQuery()
  q.text_query = 'test'
  feed = gd_client.Query(q.ToUri())
  PrintFeed(feed)

This searches the entire contents of every document for the string "test" and returns all documents where this string is found. This is different from searching just the title of every document, which can be done as described in the section Retrieving a document by an exact title match.