getgrfeed

Fetch feeds stored in the Google Reader database

Every feed that has been subscribed to in Google Reader is stored in its database, even if the feed itself no longer exists. This simple script will print the entire stored contents of a given feed.

Since Google will be retiring its Reader product on July 1, 2013, I thought there might be a few people who would find this useful.

Google Reader has now been discontinued, so this script will no longer work. However, you may still find atom2html.py (http://getgrfeed.googlecode.com/git/atom2html.py) useful (see below).

Getting started

This script requires Python to be installed on your system and assumes rudimentary familiarity with the command line.

Assuming Python has been properly installed, all you need is to download the script (right-click, save link as) to your working directory.

Configure the script by entering your Google email address and password in the "Personalize This" block.

Usage

At the command line in the directory containing the script, run it as follows: python getgrfeed.py feed-URL

where feed-URL is the URL of the feed you wish to dump. If there are no errors, the feed will be printed to standard output. To save it to a file, add a redirect to the end of the command (e.g. python getgrfeed.py http://www.w3.org/WAI/highlights/rssfeed.rss > output.xml).

If you're having trouble finding a specific feed, in Google Reader hover over the subscription you're interested in to view the link, and copy the part that follows /reader/view/feed/. E.g., http://www.google.com/reader/view/feed/http%3A%2F%2Fwww.w3.org%2FWAI%2Fhighlights%2Frssfeed.rss
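The portion copied from such a link is percent-encoded. To recover the plain form used in the usage example, it can be decoded with the standard library. This is a minimal sketch written for Python 3; the helper name feed_url_from_reader_link is made up for illustration and is not part of getgrfeed.py.

```python
from urllib.parse import unquote

# Path segment that precedes the encoded feed URL in a Reader subscription link.
MARKER = "/reader/view/feed/"


def feed_url_from_reader_link(link):
    """Return the decoded feed URL embedded in a Reader subscription link.

    Hypothetical helper, for illustration only.
    """
    _, found, encoded = link.partition(MARKER)
    if not found:
        raise ValueError("not a Reader feed link: %r" % link)
    return unquote(encoded)


link = ("http://www.google.com/reader/view/feed/"
        "http%3A%2F%2Fwww.w3.org%2FWAI%2Fhighlights%2Frssfeed.rss")
print(feed_url_from_reader_link(link))
# prints http://www.w3.org/WAI/highlights/rssfeed.rss
```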

Be aware that a very active, long-running feed may take a long time to download.

Converting to HTML

If you wish to transform the XML file created by getgrfeed.py into a more readable format, one option is atom2html.py. This is a "quick and dirty" script with little error checking, but it should do the trick. It requires Python 2.7 or better.

There are three ways to run the script.

1. To read from standard input and write to standard output, run it with no arguments.
2. To read from a file and write to standard output, run: python atom2html.py input.xml
3. To read from a file and write to HTML broken into files of manageable size, run: python atom2html.py input.xml basename

Here basename is the base or root name of your output files. The script will append part numbers and the file extension (e.g., basename.html, basename2.html, basename3.html, ...). Modify the variable chunk_size to change the number of feed entries per file.
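The naming and chunking scheme described above could be sketched roughly as follows. This is not the code from atom2html.py; the function names chunk_entries and part_filename and the placeholder entries are assumptions made for illustration, and only the chunk_size name and the output file names come from the description.

```python
def chunk_entries(entries, chunk_size):
    """Split a list of entries into consecutive lists of at most chunk_size items."""
    return [entries[i:i + chunk_size] for i in range(0, len(entries), chunk_size)]


def part_filename(basename, part):
    """Name part 1 'basename.html' and later parts 'basename2.html', 'basename3.html', ..."""
    suffix = "" if part == 1 else str(part)
    return "%s%s.html" % (basename, suffix)


# Placeholder data standing in for parsed feed entries.
entries = ["entry %d" % n for n in range(1, 8)]
chunk_size = 3

for part, chunk in enumerate(chunk_entries(entries, chunk_size), start=1):
    print(part_filename("basename", part), len(chunk))
# prints:
# basename.html 3
# basename2.html 3
# basename3.html 1
```

A smaller chunk_size gives more, shorter files; a larger one gives fewer, longer files.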

Related projects

Archive Team is coordinating a distributed effort to download as much Google Reader data as possible. Check it out if you're interested in participating, or to submit a list of feed URLs for them to crawl.

Project Information

The project was created on Jun 5, 2013.

License: Apache License 2.0
git-based source control