The Google Documents List Data API allows client applications to view and update documents (spreadsheets and word processor documents) using a Google Data API feed. Your client application can request a list of a user's documents, query the content of a user's documents, and upload new documents.
In addition to providing some background on the capabilities of the Documents List Data API, this guide provides examples for interacting with the API using the Python client library. If you're interested in understanding more about the underlying protocol the Python client library uses to interact with the Documents List, see the protocol version of this guide.
This document is intended for developers who want to write client applications using the Google Data Python client library that can interact with Google Documents.
Google Documents uses Google Accounts for authentication, so if you have a Google account you are all set. Otherwise, you can create a new account.
To use the Python client library, you'll need Python 2.0+ and the modules listed on the DependencyModules wiki page. After downloading the client library, you'll find the sample explained in this guide in the samples/docs subdirectory of the distribution. For a general introduction to the Python Client Library, you can view the Python Getting Started article.
A full working copy of this sample is available in the Google Data Python Client Library project in the project hosting section of code.google.com. The sample is located at /trunk/samples/docs/docs_example.py in the SVN repository accessible from the Source tab.
Run the example as follows:
python docs_example.py
The program will prompt you for a username and password. These are the same credentials that you use to log in to Google Documents.
The sample allows the user to perform a number of operations which demonstrate how to use the Documents List feed.
To use the examples in this guide in your own code, you'll need the following import statements:
import gdata.docs.service
import gdata.docs
The DocsService class represents a client connection (with authentication) to the Documents service. The general procedure for sending a query to a service using the client library consists of the following steps:
- Create a new DocsService instance, setting your application's name (in the form companyName-applicationName-versionID).
- If you are uploading a document, wrap it in a MediaSource.
The Python client library can be used to work with either public or private feeds. At this time, however, the Documents List Data API offers only private feeds, which require authentication with the Documents servers. You can authenticate with either ClientLogin username/password authentication or AuthSub proxy authentication.
Please see the authentication documentation for more information on AuthSub and ClientLogin.
To use ClientLogin (also called "Authentication for Installed Applications"), set the email and password attributes of your DocsService object to the credentials of the user on whose behalf your client is sending requests, then call the ProgrammaticLogin method, which DocsService inherits from Service. For example:
gd_client = gdata.docs.service.DocsService()
gd_client.email = 'example@gmail.com'
gd_client.password = 'mypassword'
gd_client.source = 'exampleCo-exampleApp-1'  # companyName-applicationName-versionID
gd_client.ProgrammaticLogin()  # authenticates using the email and password set above
For more information about authentication systems, see the Google Account Authentication documentation.
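ProgrammaticLogin performs the ClientLogin exchange for you. For readers curious about what travels over the wire, the following standard-library Python 3 sketch shows the general shape: a form-encoded request body, a plain-text reply made of Key=value lines (including Auth=), and the Authorization header built from that Auth value. It only builds and parses strings and makes no network call. The function names are hypothetical, and the service name "writely" and account type "HOSTED_OR_GOOGLE" are assumptions, not values stated in this guide.

```python
from urllib.parse import urlencode

# Assumed ClientLogin endpoint; the body would be POSTed here.
CLIENTLOGIN_URL = "https://www.google.com/accounts/ClientLogin"


def clientlogin_body(email, password, source, service="writely"):
    """Form-encode a ClientLogin request body (hypothetical helper).

    "writely" (Documents List service name) and "HOSTED_OR_GOOGLE"
    are assumptions used for illustration.
    """
    return urlencode([
        ("accountType", "HOSTED_OR_GOOGLE"),
        ("Email", email),
        ("Passwd", password),
        ("service", service),
        ("source", source),
    ])


def parse_clientlogin_response(text):
    """Split a reply made of 'Key=value' lines into a dict."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            values[key.strip()] = value.strip()
    return values


def clientlogin_auth_header(auth_token):
    """Build the Authorization header value for a ClientLogin Auth token."""
    return "GoogleLogin auth=" + auth_token
```

For example, parsing a reply of "SID=...\nLSID=...\nAuth=..." yields a dict whose "Auth" entry is the token to pass to clientlogin_auth_header.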
AuthSub proxy authentication is used by web applications that need to authenticate their users to Google Accounts. The operator of the web application does not need access to the Documents user's username and password; only special AuthSub tokens are required.
When the user first visits your application, they have not yet been authenticated. In this case, you need to print some text and a link directing the user to Google to authenticate your request for access to their documents. The Python Google Data client library provides a method to generate this URL. The code below sets up a link to the AuthSubRequest page.
def GetAuthSubUrl():
    next = 'http://www.example.com/welcome.pyc'
    scope = 'http://docs.google.com/feeds/documents'
    secure = False
    session = True
    gd_client = gdata.docs.service.DocsService()
    return gd_client.GenerateAuthSubURL(next, scope, secure, session)
authSubUrl = GetAuthSubUrl()
print '<a href="%s">Login to your Google account</a>' % authSubUrl
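Notice the parameters sent to the GenerateAuthSubURL method:
- next: the URL of the page that Google should redirect the user to after authentication.
- scope: the feed the token grants access to, here http://docs.google.com/feeds/documents.
- secure: False indicates that the token will not be a secure token.
- session: True indicates that the single-use token can later be exchanged for a multi-use session token.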
The URL looks something like this:
https://www.google.com/accounts/AuthSubRequest?scope=http%3A%2F%2Fdocs.google.com%2Ffeeds%2Fdocuments&session=1&secure=0&next=http%3A%2F%2Fwww.example.com%2Fwelcome.pyc
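To make the mapping between the GenerateAuthSubURL arguments and the query string concrete, here is a minimal standard-library Python 3 sketch that assembles the same URL by hand. It is not the client library's implementation; the function name is hypothetical, and it simply encodes the booleans as "1"/"0" the way the example URL above does.

```python
from urllib.parse import urlencode

AUTHSUB_REQUEST_URL = "https://www.google.com/accounts/AuthSubRequest"


def build_authsub_request_url(next_url, scope, secure=False, session=True):
    """Assemble an AuthSubRequest URL (hypothetical helper).

    Parameter order and the "1"/"0" boolean encoding mirror the
    example URL shown in this guide.
    """
    query = urlencode([
        ("scope", scope),
        ("session", "1" if session else "0"),
        ("secure", "1" if secure else "0"),
        ("next", next_url),
    ])
    return AUTHSUB_REQUEST_URL + "?" + query
```

Calling build_authsub_request_url('http://www.example.com/welcome.pyc', 'http://docs.google.com/feeds/documents') reproduces the URL above, with the scope and next values percent-encoded.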
The user can then follow the link to Google's site and authenticate to their Google account.
After the user authenticates, they will be redirected back to the next URL. The URL will have a single-use token value appended to it as a query parameter. The URL looks something like this:
http://www.example.com/welcome.pyc?token=14a87fe98219731acd516
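If your web framework hands you the full redirect URL rather than a CGI form object, the single-use token can be read straight from the query string. A minimal standard-library Python 3 sketch (the function name is hypothetical):

```python
from urllib.parse import parse_qs, urlsplit


def extract_authsub_token(redirect_url):
    """Return the 'token' query parameter of a redirect URL, or None if absent."""
    values = parse_qs(urlsplit(redirect_url).query).get("token")
    return values[0] if values else None
```

Applied to the example URL above, it returns 14a87fe98219731acd516.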
For security, this token is single-use only, so now you need to exchange this single-use token for a session token. This process is described in the AuthSub documentation. The following code snippet shows how to upgrade the token.
gd_client = gdata.docs.service.DocsService()
gd_client.auth_token = authsub_token
gd_client.UpgradeToSessionToken()
In this snippet, the authsub_token variable contains the value from the token query parameter in the URL. There are several ways to retrieve this value, for example:
import cgi
parameters = cgi.FieldStorage()
authsub_token = parameters.getvalue('token')  # the token string, or None if it is missing
This token value represents a single-use AuthSub token. Since session = True was specified above, this token can be exchanged for an AuthSub session token using the UpgradeToSessionToken method, which calls the AuthSubSessionToken service.
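UpgradeToSessionToken hides the details of the upgrade. As an illustration of the exchange, the standard-library Python 3 sketch below builds (but does not send) a GET to the AuthSubSessionToken service, carrying the single-use token in an Authorization header, and parses a reply containing a Token= line. The header format and reply format are assumptions based on the AuthSub protocol, and the function names are hypothetical.

```python
from urllib.request import Request

# Assumed endpoint for the AuthSubSessionToken service.
SESSION_TOKEN_URL = "https://www.google.com/accounts/AuthSubSessionToken"


def authsub_header(token):
    """Format an AuthSub Authorization header value (assumed format)."""
    return 'AuthSub token="%s"' % token


def session_token_request(single_use_token):
    """Build, without sending, the GET that upgrades a single-use token."""
    return Request(SESSION_TOKEN_URL,
                   headers={"Authorization": authsub_header(single_use_token)})


def parse_session_token(body):
    """Return the value of the 'Token=' line in a reply body, or None."""
    for line in body.splitlines():
        if line.startswith("Token="):
            return line[len("Token="):].strip()
    return None
```

Sending the request (for example with urllib.request.urlopen) and passing the decoded body to parse_session_token would yield the session token; that network step is deliberately left out here.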
You can get a feed containing a list of the currently authenticated user's documents by sending an authenticated GET request to the following URL:
http://docs.google.com/feeds/documents/private/full
The result is a "meta-feed," a feed that lists all of that user's documents; each entry in the feed represents a document (spreadsheet or word processor document) associated with the user. This feed is accessible only using an authentication token.
You can print out a list of the user's documents with the following two functions:
def ListAllDocuments():
    """Retrieves a list of all of a user's documents and displays them."""
    feed = gd_client.GetDocumentListFeed()
    PrintFeed(feed)
def PrintFeed(feed):
    """Prints out the contents of a feed to the console."""
    print '\n'
    if not feed.entry:
        print 'No entries in feed.\n'
    for i, entry in enumerate(feed.entry):
        print '%s %s\n' % (i+1, entry.title.text.encode('UTF-8'))
The resulting DocumentListFeed object, feed, represents a response from the server. Among other things, this feed contains a list of DocumentListEntry objects (feed.entry), each of which represents a single document. DocumentListEntry encapsulates the information shown in the protocol document.
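Underneath, the Documents List feed is an Atom feed, and DocumentListFeed and DocumentListEntry are wrappers around its feed and entry elements. As a rough, library-free illustration of what PrintFeed does, this standard-library Python 3 sketch parses a hypothetical, heavily trimmed Atom document with ElementTree and lists the entry titles. The sample XML and function names are invented for illustration; a real feed carries many more elements.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical, trimmed-down feed used only for illustration.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Quarterly report</title></entry>
  <entry><title>Budget</title></entry>
</feed>"""


def entry_titles(feed_xml):
    """Return the title text of every Atom <entry> in a feed string."""
    root = ET.fromstring(feed_xml)
    return [entry.findtext(ATOM_NS + "title", default="")
            for entry in root.findall(ATOM_NS + "entry")]


def print_feed(feed_xml):
    """Print numbered entry titles, mirroring the PrintFeed function above."""
    titles = entry_titles(feed_xml)
    if not titles:
        print("No entries in feed.")
    for i, title in enumerate(titles, start=1):
        print("%d %s" % (i, title))
```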
Any document uploaded to the server is first wrapped in a MediaSource object. In the examples below, the MediaSource constructor takes two arguments: file_path, the name of the file including its file system path, and content_type, the MIME type (e.g. text/plain) of the document being uploaded. For more information on the MediaSource class, use Python's built-in help system:
import gdata
help(gdata.MediaSource)
For convenience, the gdata.docs.service module defines a module-level dictionary named SUPPORTED_FILETYPES, which maps upper-case file extensions to their MIME types.
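A lookup table of this kind is typically used to derive content_type from file_path before building the MediaSource. The standard-library Python 3 sketch below shows that lookup with a small, illustrative table; EXAMPLE_FILETYPES is hypothetical and is not the contents of SUPPORTED_FILETYPES, although the real dictionary could be passed in as the second argument.

```python
import os

# Illustrative subset only; the real SUPPORTED_FILETYPES dictionary in
# gdata.docs.service may contain different keys and values.
EXAMPLE_FILETYPES = {
    "CSV": "text/csv",
    "DOC": "application/msword",
    "PPT": "application/vnd.ms-powerpoint",
    "TXT": "text/plain",
    "XLS": "application/vnd.ms-excel",
}


def content_type_for(file_path, filetypes=EXAMPLE_FILETYPES):
    """Map a file's extension (upper-cased, without the dot) to a MIME type."""
    extension = os.path.splitext(file_path)[1].lstrip(".").upper()
    try:
        return filetypes[extension]
    except KeyError:
        raise ValueError("unsupported file type: %r" % extension) from None
```

With the client library installed, the equivalent call would pass the module's dictionary: content_type_for(file_path, gdata.docs.service.SUPPORTED_FILETYPES).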
This example uploads a word processor document, assuming file_path is the path to a document of MIME type content_type and title is the title to give it. The entry variable is a DocumentListEntry object containing information about the document that was uploaded, including a direct link to the document.
ms = gdata.MediaSource(file_path=file_path, content_type=content_type)
entry = gd_client.UploadDocument(ms, title)
print 'Document now accessible at:', entry.GetAlternateLink().href
This example uploads a spreadsheet, assuming file_path is the path to a spreadsheet of MIME type content_type. The entry variable is a DocumentListEntry object containing information about the spreadsheet that was uploaded, including a direct link to the spreadsheet.
ms = gdata.MediaSource(file_path=file_path, content_type=content_type)
entry = gd_client.UploadSpreadsheet(ms, title)
print 'Spreadsheet now accessible at:', entry.GetAlternateLink().href
This example uploads a presentation, assuming file_path is the path to a presentation of MIME type content_type. The entry variable is a DocumentListEntry object containing information about the presentation that was uploaded, including a direct link to the presentation.
ms = gdata.MediaSource(file_path=file_path, content_type=content_type)
entry = gd_client.UploadPresentation(ms, title)
print 'Presentation now accessible at:', entry.GetAlternateLink().href
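At the protocol level, each of these upload calls sends the file's bytes to the Documents List feed in an authenticated POST. The standard-library Python 3 sketch below builds (but does not send) such a request. It assumes, following the Atom Publishing Protocol convention, that the title travels in a Slug header; that header, the helper name, and the use of the private feed URL as the upload target are assumptions rather than details confirmed by this guide.

```python
from urllib.request import Request

DOCS_FEED = "http://docs.google.com/feeds/documents/private/full"


def upload_request(data, content_type, title, auth_header):
    """Build, without sending, a media POST carrying raw file bytes.

    The Slug header for the title is an assumption (Atom Publishing
    Protocol convention), not taken from this guide.
    """
    return Request(
        DOCS_FEED,
        data=data,
        method="POST",
        headers={
            "Authorization": auth_header,
            "Content-Type": content_type,  # MIME type, as for MediaSource
            "Slug": title,
        },
    )
```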
To put a document in the trash, use the Delete method of the service object on the edit link of the Atom entry representing the document. To trash the new document from the upload examples above, you would execute the following.
gd_client.Delete(entry.GetEditLink().href)
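The Delete call amounts to an authenticated HTTP DELETE sent to the entry's edit link. The standard-library Python 3 sketch below only builds that request so its parts are visible; the edit link in the usage note is a hypothetical placeholder, and the helper name is invented.

```python
from urllib.request import Request


def trash_request(edit_link, auth_header):
    """Build, without sending, a DELETE request against an entry's edit link."""
    return Request(edit_link, method="DELETE",
                   headers={"Authorization": auth_header})
```

For example, trash_request('http://docs.google.com/feeds/documents/private/full/document%3AEXAMPLE_ID', 'GoogleLogin auth=...') targets a hypothetical document ID; in practice the URL would come from entry.GetEditLink().href.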
You can search the Documents List using some of the standard Google Data API query parameters. Categories restrict the type of document returned (for example, word processor documents, spreadsheets, or presentations). The full-text query string is used to search the content of all the documents. More detailed information on parameters specific to the Documents List can be found in the Documents List Data API Reference Guide.
In the Python client library, a DocumentQuery object can be used to construct queries for the Documents List feed. The examples below print their results to the command line with the PrintFeed function defined earlier.
A list of only word processor documents can be retrieved by using the document category as follows:
q = gdata.docs.service.DocumentQuery(categories=['document'])
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)
A list of only spreadsheets can be retrieved by using the spreadsheet category as follows:
q = gdata.docs.service.DocumentQuery(categories=['spreadsheet'])
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)
A list of only starred presentations can be retrieved by using the presentation and starred categories as follows:
q = gdata.docs.service.DocumentQuery()
q.categories.append('presentation')
q.categories.append('starred')
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)
Note that this can work for other document types as well, as the 'starred' category can be added to any DocumentQuery object.
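Category restrictions end up in the query URI that ToUri produces. Assuming the usual Google Data convention, in which categories appear as path segments after /-/ and several segments must all match, the standard-library Python 3 sketch below builds comparable URIs by hand. It is an illustration of that convention, not the library's code, and the helper name is hypothetical.

```python
from urllib.parse import quote

DOCS_FEED = "http://docs.google.com/feeds/documents/private/full"


def category_feed_uri(*categories):
    """Build a feed URI restricted to categories (assumed /-/ convention).

    Each category becomes one path segment; multiple segments are
    assumed to mean that an entry must match all of them.
    """
    if not categories:
        return DOCS_FEED
    return DOCS_FEED + "/-/" + "/".join(quote(c, safe="") for c in categories)
```

Under this assumption, category_feed_uri('presentation', 'starred') corresponds to the starred-presentations query above.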
It is possible to retrieve documents by matching on their title instead of their entire contents. To do this, add the title parameter to the DocumentQuery object. To match a title exactly, add a title-exact parameter to indicate this is the full, explicit title (including capitalization) of the document you want returned. Of course, there could be multiple documents with the same name, so a feed is returned.
q = gdata.docs.service.DocumentQuery()
q['title'] = 'Test'
q['title-exact'] = 'true'
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)
This sample prints out only documents that have exactly the title "Test".
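The title and title-exact values are ordinary query-string parameters on the feed URL. This standard-library Python 3 sketch shows how such a URI could be encoded by hand; the helper name is hypothetical, and it is meant only to make the parameters visible, not to replace DocumentQuery.

```python
from urllib.parse import urlencode

DOCS_FEED = "http://docs.google.com/feeds/documents/private/full"


def title_query_uri(title, exact=False):
    """Encode a title search; exact=True adds title-exact=true."""
    params = [("title", title)]
    if exact:
        params.append(("title-exact", "true"))
    return DOCS_FEED + "?" + urlencode(params)
```

For the example above, title_query_uri('Test', exact=True) ends in ?title=Test&title-exact=true; spaces in a title are encoded as +.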
In most cases, a category query that includes the folder name will find the documents in that folder. However, you can also explicitly request documents in a named folder by using a schema-qualified query. The AddNamedFolder method of DocumentQuery lets you retrieve all documents in a specified folder belonging to the user with the specified email address:
q = gdata.docs.service.DocumentQuery()
q.AddNamedFolder(email, folder_name)
This style of query is useful when a folder name conflicts with a category that has a different meaning, such as "starred". For example, to query for all the documents in the "starred" folder belonging to user "jo@gmail.com", you could use the function as follows:
q = gdata.docs.service.DocumentQuery()
q.AddNamedFolder('jo@gmail.com', 'starred')
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)
The important distinction is that if you had simply appended the "starred" category, you would get back a list of all starred documents, not the documents in the folder named "starred".
You can search the contents of documents by using the text_query property of the DocumentQuery object.
q = gdata.docs.service.DocumentQuery()
q.text_query = 'test'
feed = gd_client.Query(q.ToUri())
PrintFeed(feed)
This searches the entire contents of every document for the string "test" and returns all documents where this string is found. This is different from searching only the title of each document, which is described in the section Retrieving a document by an exact title match.
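For contrast with the title query, a full-text search is carried by the standard Google Data q parameter. The standard-library Python 3 sketch below encodes it by hand and, optionally, combines it with a category path segment; the helper name is hypothetical, and the /-/ category form is assumed from the general Google Data query convention rather than taken from this guide.

```python
from urllib.parse import quote, urlencode

DOCS_FEED = "http://docs.google.com/feeds/documents/private/full"


def text_query_uri(text, category=None):
    """Encode a full-text search (q=...), optionally limited to one category.

    The /-/category path form is an assumption based on the Google Data
    category-query convention.
    """
    base = DOCS_FEED
    if category:
        base += "/-/" + quote(category, safe="")
    return base + "?" + urlencode({"q": text})
```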