Updated Sep 24, 2009 by gk5...@kickstyle.net
Labels: Featured
UsersGuide  

Sites Import/Export Tool User's Guide

Overview

The Sites Liberation import/export tool uses the Sites GData API to allow users to export an entire Google Site as static html pages to a directory on their hard drive. The html is embedded with meta-data based on the hAtom microformats specification, to allow the re-import of html back into Google Sites. Applications of the tool include backing up a Google Site, switching to or from a different service, and editing a Site offline.

Simple Execution

The tool has been packaged as an executable jar. If Java is installed, just double-click on it.

Advanced Execution

The tool is written in Java and the source code is currently hosted at code.google.com. Once built, the tool has three main classes. The class com.google.sites.liberation.export.Main runs a Sites export from the command line. The class com.google.sites.liberation.imprt.Main is the equivalent for a Sites import (note that the package name is "imprt" because "import" is a Java keyword). The class com.google.sites.liberation.util.GuiMain provides a graphical user interface for launching both imports and exports. In all cases, the import/export takes the following arguments:

Name       Flag  Usage
Host       -h    If not sites.google.com, specifies the Site's host (optional). Used for debugging.
Domain     -d    If the site is a Google Apps site, specifies the domain, e.g. dataliberation.org (optional).
Webspace   -w    Specifies the webspace of the Site, e.g. "dataliberation" for a site located at http://sites.google.com/a/domain/dataliberation
Username   -u    Specifies the user name used to access the Site.
Password   -p    Specifies the password used to access the Site.
Directory  -f    Specifies the root directory to export to / import from.
Revisions  -r    If this flag is included, the revisions of all pages in the Site are exported/imported in addition to the current version of each page (optional).
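
As a rough illustration of how these flags combine, the sketch below assembles an argument list for com.google.sites.liberation.export.Main. The helper class ExportCommand, the jar name google-sites-liberation.jar, and the credentials are hypothetical placeholders, and the printed command assumes the built classes and their dependencies are all on that classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (not part of the tool): builds export arguments from the flag table. */
class ExportCommand {

    /**
     * Returns the flag/value pairs for an export run.
     * The -h flag is left out because the default host is sites.google.com.
     */
    static List<String> buildArgs(String domain, String webspace, String username,
                                  String password, String directory, boolean revisions) {
        List<String> args = new ArrayList<>();
        // -d is optional: only Google Apps sites have a domain.
        if (domain != null && !domain.isEmpty()) {
            args.add("-d");
            args.add(domain);
        }
        args.add("-w");
        args.add(webspace);
        args.add("-u");
        args.add(username);
        args.add("-p");
        args.add(password);
        args.add("-f");
        args.add(directory);
        // -r takes no value; its presence requests revisions.
        if (revisions) {
            args.add("-r");
        }
        return args;
    }

    public static void main(String[] unused) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-cp");
        command.add("google-sites-liberation.jar"); // hypothetical jar name
        command.add("com.google.sites.liberation.export.Main");
        // Placeholder site and credentials, for illustration only.
        command.addAll(buildArgs("dataliberation.org", "dataliberation",
                "user@dataliberation.org", "secret", "rootdirectory", true));
        System.out.println(String.join(" ", command));
    }
}
```

For an import, the same arguments would presumably be passed to com.google.sites.liberation.imprt.Main instead, since the guide states that both classes take the same flags.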

Structure

The folder structure of an exported site is meant to mimic the Sites UI as closely as possible. Thus, when exporting to a directory "rootdirectory," a top-level page normally located at webspace/pagename is written to a file named index.html in rootdirectory/pagename. A subpage of that page, normally located at webspace/pagename/subpage, is written to a file named index.html in rootdirectory/pagename/subpage. Attachments are downloaded to the same directory as the index.html page to which they belong. If revisions are exported, they are placed in a directory called "revisions" within the directory containing the index.html file, and each revision is in its own file named number.html. Additionally, if revisions are exported, a file named "history.html" is placed in the same directory as the index.html file, containing links to all of the revisions of the page. However, the history.html file is not used for import, and thus may be omitted even when importing revisions.
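
The layout described above can be sketched as a few path computations. The class ExportLayout and its method names are hypothetical and are not taken from the tool's source. The sketch also assumes that "number" in number.html stands for the revision number.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative sketch of where an export places the files for one page. */
class ExportLayout {

    /** Directory for a page; pagePath is relative to the webspace, e.g. "pagename/subpage". */
    static Path pageDirectory(Path root, String pagePath) {
        return root.resolve(pagePath);
    }

    /** Every page is stored as index.html inside its own directory. */
    static Path pageFile(Path root, String pagePath) {
        return pageDirectory(root, pagePath).resolve("index.html");
    }

    /** Attachments sit next to the index.html of the page they belong to. */
    static Path attachmentFile(Path root, String pagePath, String fileName) {
        return pageDirectory(root, pagePath).resolve(fileName);
    }

    /** Revisions go in a "revisions" subdirectory (assumes number.html means the revision number). */
    static Path revisionFile(Path root, String pagePath, int revisionNumber) {
        return pageDirectory(root, pagePath).resolve("revisions").resolve(revisionNumber + ".html");
    }

    /** history.html links to the revisions; it is not needed for import. */
    static Path historyFile(Path root, String pagePath) {
        return pageDirectory(root, pagePath).resolve("history.html");
    }

    public static void main(String[] args) {
        Path root = Paths.get("rootdirectory");
        System.out.println(pageFile(root, "pagename"));
        System.out.println(pageFile(root, "pagename/subpage"));
        System.out.println(revisionFile(root, "pagename", 1));
        System.out.println(historyFile(root, "pagename"));
    }
}
```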

Format

The exported html uses meta-data to include semantic information necessary for import. When possible, the format follows the hAtom microformats specification. However, since the hAtom specification is meant to encode a subset of the Atom syndication format, and GData is a superset of the Atom format, there are a number of differences/additions. The following is a list of Sites API elements and their html encodings. For more information on the meaning of each element, see the Sites API documentation.

GData Element   Microformat Class  Details
entry           hentry             As in the hAtom spec, entries are encoded by specifying the class of an html element to be "hentry." However, since all entries in the Sites API have exactly one kind (encoded as a category), the class must also contain the label for the entry's kind (e.g. "hentry webpage"). Additionally, an entry's id is encoded as the value of the id attribute in the hentry element.
author          author             As in the hAtom spec, the author of an entry is specified with the class "author". The author html element must contain an hCard, specified by the class "vcard." However, since all entries in a Site contain exactly one author with an email address, name, and nothing more, the entry should contain only one element with class "author," and the vcard can be encoded as "name," since this is the natural representation.
content         entry-content      As in the hAtom spec, the content of an entry is specified with an html element with class "entry-content." In the case of xhtml content, everything within the content element is taken as the content of the entry. However, since attachments contain out-of-line content, if the entry-content element contains an href attribute, then that value is taken as out-of-line content, and the element's inner html is not parsed as content.
summary         entry-summary      As in the hAtom spec, the summary of an entry is specified with the class "entry-summary," with the element's inner html taken as the value. The summary element is used for the description of an attachment in a file cabinet.
title           entry-title        As in the hAtom spec, the title of an entry is specified with an html element with class "entry-title." Since the title of a Sites entry can only contain plaintext, the title parsed from an "entry-title" is any plaintext within the element.
updated         updated            As in the hAtom spec, the updated time of an entry is specified with class "updated" and encoded using the datetime-design-pattern.
sites:revision  sites:revision     The revision number of an entry is encoded with the class "sites:revision", where the plaintext within the inner html is parsed as an integer.
gs:data         gs:data            List pages in the GData feeds contain a gs:data element which contains gs:column elements encoding the list's column headers. Likewise, the html list page entry must contain an element with class "gs:data" which itself contains the encoded gs:column elements.
gs:column       gs:column          The columns for a list page are encoded with class "gs:column" and must be embedded within an html element with class "gs:data." The gs:column index attribute is encoded as the title attribute in the html element, and the name attribute is encoded as the inner html.
gs:field        gs:field           List items in the GData feeds contain gs:field elements representing the index, name, and value of each of the list item's fields. The field is encoded in html as an element with class "gs:field", where the value is the element's inner html, and the index is the element's title attribute value. The name is not encoded in the html since it can be inferred from the index and the corresponding list page to which the list item belongs.
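
To make the encodings in the table more concrete, the following sketch emits html for a single webpage entry. It is only an approximation under stated assumptions: the class HAtomEntrySketch is hypothetical, the choice of html elements (div, h3, abbr, span, a) is a guess, the hCard markup inside the author element is one common form rather than the tool's confirmed output, and the sample values are made up. Only the class names and the use of the id and title attributes come from the table.

```java
/** Illustrative sketch: renders one webpage entry using the microformat classes from the table. */
class HAtomEntrySketch {

    /** Escapes characters that are unsafe in html text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** xhtmlContent is inserted as-is because it is already markup; all other values are escaped. */
    static String webpageEntry(String id, String title, String xhtmlContent, String updatedIso,
                               String authorName, String authorEmail, int revision) {
        StringBuilder sb = new StringBuilder();
        // The kind label ("webpage") shares the class attribute with "hentry"; the entry id is the id attribute.
        sb.append("<div class=\"hentry webpage\" id=\"").append(escape(id)).append("\">\n");
        sb.append("  <h3 class=\"entry-title\">").append(escape(title)).append("</h3>\n");
        sb.append("  <div class=\"entry-content\">").append(xhtmlContent).append("</div>\n");
        // datetime-design-pattern: the machine-readable timestamp goes in the title attribute.
        sb.append("  <abbr class=\"updated\" title=\"").append(escape(updatedIso)).append("\">")
          .append(escape(updatedIso)).append("</abbr>\n");
        // Assumed hCard form: a single author with a name and an email address.
        sb.append("  <span class=\"author\"><span class=\"vcard\"><a class=\"fn\" href=\"mailto:")
          .append(escape(authorEmail)).append("\">").append(escape(authorName))
          .append("</a></span></span>\n");
        sb.append("  <span class=\"sites:revision\">").append(revision).append("</span>\n");
        sb.append("</div>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // All values below are made-up sample data.
        System.out.print(webpageEntry("entry1", "Home", "<p>Welcome</p>",
                "2009-09-24T12:00:00Z", "Jane Doe", "jane@example.com", 2));
    }
}
```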

The parent link and pageName elements in the GData feeds are not embedded in the html, but are instead represented by the structure of the exported Site. Since each index.html file represents a page in a Site, exactly one entry with a page kind (announcementspage, announcement, filecabinet, listpage, webpage) should appear in the file. Any child entries of non-page kind (attachment, comment, listitem, webattachment) should appear in the same file and may be embedded within the page entry, but need not be. The parent link for subpages is represented by the folder structure as described in the Structure section. Finally, the pageName element is represented by the name of the directory in which the index.html file exists.
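
Reading the pageName and parent back from the folder structure might look like the sketch below. The class PageLocation and its methods are hypothetical, and the sketch assumes the root and the index.html path are expressed consistently (both relative or both absolute).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative sketch: derives pageName and the parent page from an index.html location. */
class PageLocation {

    /** The pageName is the name of the directory that contains index.html. */
    static String pageName(Path indexFile) {
        return indexFile.getParent().getFileName().toString();
    }

    /**
     * Returns the parent page's index.html, or null when the page is top-level
     * (its directory sits directly under the export root).
     */
    static Path parentPageFile(Path root, Path indexFile) {
        Path parentDir = indexFile.getParent().getParent();
        if (parentDir == null || parentDir.equals(root)) {
            return null;
        }
        return parentDir.resolve("index.html");
    }

    public static void main(String[] args) {
        Path root = Paths.get("rootdirectory");
        Path subpage = Paths.get("rootdirectory", "pagename", "subpage", "index.html");
        System.out.println(pageName(subpage));
        System.out.println(parentPageFile(root, subpage));
    }
}
```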

Known Issues/Limitations


Comment by laurent.duchateau, Sep 26, 2009

The JAR file seems to be corrupted.

I have the following error message :

java -jar google-sites-liberation-1.0.jar
Exception in thread "main" java.lang.UnsupportedClassVersionError: Bad version number in .class file
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:675)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:56)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:316)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:280)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:374)

Thank you,

Laurent

Comment by jsandoe, Sep 27, 2009

I see the same error. I am running Java jre 1.5.0_20 on a Mac. Perhaps the code was compiled for version 1.6?

Jonathan

Comment by zlouattara, Sep 27, 2009

Laurent, download Java 1.6. Same goes for you, jsandoe: yes, I can confirm that. You can simply get 1.6.x and set it as your default...

I'm improving it, making a web version for Appengine for my clients... Thanks Ben, Terrific! - Zie Lassina Ouattara

Comment by stephen.hind, Sep 28, 2009

I have just made 1.6 my default Java and it works (I'm on a Mac too)

Comment by tjfinneran, Sep 28, 2009

Great tool! working on Ubuntu 9.04 derivative using sites as a hobbyist ;-) <W THANKYOU

Comment by jamezpol...@gmail.com, Sep 28, 2009

I see the same; I've filed a bug (http://code.google.com/p/google-sites-liberation/issues/detail?id=13) about it.

Comment by michael.david.koch, Oct 08, 2009

I tried exporting our site, but only a small amount of the content was downloaded. Is this a common problem? We do have a large amount of data in the site, so maybe that is the issue?

Comment by love.townsend, Oct 11, 2009

oooo i cant thankyou enough for this, i had been using a fairly hokey method to back-my-junk-up (tm) which is over 400 pages all told. this worked like a champ on my Google apps standard account with over 300 pages. but it would be nice if you could add...

a. an option to download only pages and not the files/attachments (and vice versa)
b. an option to pipe the whole show to a compressed or tar file.

Love ya Baby - love.townsend

Comment by c...@camoma.com, Oct 13, 2009

I have a domain, camoma.com, and we use google apps with this domain. I have tried to get this java applet up and running, but I keep getting the following message:

--- Retrieving site data (this may take a few minutes). No data returned. You may have provided invalid Site information or credentials. ---

I use the following parameters:

Host: sites.google.com/a/camoma.com
Domain: blank or camoma.com (I have tried both)
Webspace: Testsite (a site I created at our sites.google.com)
User/pass: Same as the rest of our solution.

I am pretty sure the problem is not the credentials: if I try with fake credentials, I immediately get an error.

Does anyone have any suggestions?

Comment by michelmallejac, Oct 18, 2009

Same here: after a few seconds, I get the message "No data returned. You may have provided invalid Site information or credentials.", but the progress bar continues to move. Entering wrong credentials returns an error popup instead, then stops.

I've tried making my site public for testing purposes: same result.

Thanks for this software, it looks promising!

Comment by jlueck, Oct 20, 2009

For those having issues with "invalid credentials" errors, please use the following as a guide on how to fill in the appropriate values for your Sites domain:

Host: sites.google.com (you should never have to modify this)
Domain: your domain, or the word "site" if you don't have a domain
Webspace: the name of the site

Comment by quotationoftheday, Oct 26, 2009

1) I got the "invalid credentials" errors until I ensured the Domain: field was empty.

2) I am able to export my web site to a local directory, but when I try to import it back, I see errors on my console:

Oct 26, 2009 8:39:01 AM com.google.sites.liberation.imprt.EntryUpdaterImpl updateEntry
WARNING: Unable to update entry:{WebPageEntry {BasePageEntry {BaseContentEntry com.google.gdata.data.sites.WebPageEntry@1368c5d}}}
com.google.gdata.util.ServiceForbiddenException: Forbidden If-Match or If-None-Match header required
    at com.google.gdata.client.http.HttpGDataRequest.handleErrorResponse(HttpGDataRequest.java:561)
    at com.google.gdata.client.http.GoogleGDataRequest.handleErrorResponse(GoogleGDataRequest.java:543)
    at com.google.gdata.client.http.HttpGDataRequest.checkResponse(HttpGDataRequest.java:536)
    at com.google.gdata.client.http.HttpGDataRequest.execute(HttpGDataRequest.java:515)
    at com.google.gdata.client.http.GoogleGDataRequest.execute(GoogleGDataRequest.java:515)
    at com.google.gdata.client.Service.update(Service.java:1482)
    at com.google.gdata.client.Service.update(Service.java:1448)
    at com.google.gdata.client.GoogleService.update(GoogleService.java:583)
    at com.google.sites.liberation.imprt.EntryUpdaterImpl.updateEntry(EntryUpdaterImpl.java:47)
    at com.google.sites.liberation.imprt.EntryUploaderImpl.uploadEntry(EntryUploaderImpl.java:113)
    at com.google.sites.liberation.imprt.PageImporterImpl.importPage(PageImporterImpl.java:104)
    at com.google.sites.liberation.imprt.SiteImporterImpl.importPage(SiteImporterImpl.java:61)
    at com.google.sites.liberation.imprt.SiteImporterImpl.importSite(SiteImporterImpl.java:47)
    at com.google.sites.liberation.util.GuiMain$ImportExportRunnable.run(GuiMain.java:284)
    at java.lang.Thread.run(Thread.java:595)

I have tried running the jar file with both Java 1.5 and 1.6, with the same error occurring in both.

Comment by zuromin.powiat, Oct 27, 2009

When I export pages that have attachments, the program hangs. I can kill it, but then I only have the pages without attachments, and I want all my pages. The attachments are PDFs. When the program downloads these files, it stops working, so my destination folder only contains the pages that come before the first page with attachments.

Comment by Ephilei, Nov 05, 2009

I had issues on OS X - liberation got stuck on certain pages. I ran on Windows (same settings) and had no issues.

Comment by linch0...@gmail.com, Nov 09, 2009

I just tested it and it works well.

I got the same error message at the beginning.

"No data returned. You may have provided invalid Site information or credentials"

After reading this forum, I found the answer and want to share it with you. My PC runs Windows Vista and I already had the Java virtual machine installed, so I used "Simple Execution".

In the user interface of the application, fill in the fields as follows:

  • Host: sites.google.com
  • Domain: site/your domain name
  • Webspace: the name of the site that you applied to google.

It works! I hope it works for you, too!

Thanks to the people who wrote this application. This is a very useful tool. Thank you.

Ben

Comment by gustavo.flouret, Nov 14, 2009

First of all, the solution works. Having said that, I have a problem with accented letters. My site is written in Spanish and all the accented letters (á, é, í, etc.) in the backup files are wrong.

Any suggestions?

Thanks and regards.

Gustavo

Comment by simbeckhampson, Nov 18 (5 days ago)

Super app. Well done, it works very well on Vista. Thanks again. Paul

