
This script demonstrates a method for gradually indexing a site by incrementing a unique identifier over time. There are better ways to do this (e.g. sitemaps and fast-scaling architectures), but these are not always possible (e.g. on legacy systems).

The example given is the USPTO's Trademark Electronic Search System (TESS), which is indexable, but a given trademark is not indexed until someone links to it. Usually this doesn't happen until it's too late (e.g. Dell's 'cloud computing' trademark, which surfaced only after various articles linked to it), so this script seeks to expose this hidden information. Running on App Engine, it will reach its target by the end of 2009 and then slow to 20k serials per month (a bit more than the anticipated average growth rate).
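The schedule described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual script: the start date, the serial numbers, the 30-day month, and the example.com URL are all hypothetical placeholders. Only the end-of-2009 target and the 20k-per-month rate come from the description.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical placeholders: the real start date and serial range are not given.
START_DATE = datetime(2009, 1, 1, tzinfo=timezone.utc)
TARGET_DATE = datetime(2010, 1, 1, tzinfo=timezone.utc)   # "end of 2009"
START_SERIAL = 77_000_000
TARGET_SERIAL = 77_900_000
MONTHLY_GROWTH = 20_000            # 20k serials per month after the target
MONTH = timedelta(days=30)         # assumption: a month is treated as 30 days

# Hypothetical URL pattern; substitute the real per-record URL of the site.
URL_TEMPLATE = "https://example.com/record?serial={serial}"


def max_serial(now):
    """Return the highest serial number that should be exposed at time `now`."""
    if now <= START_DATE:
        return START_SERIAL
    if now < TARGET_DATE:
        # Catch-up phase: ramp linearly from the start serial to the target.
        fraction = (now - START_DATE) / (TARGET_DATE - START_DATE)
        return START_SERIAL + int(fraction * (TARGET_SERIAL - START_SERIAL))
    # Steady phase: advance at the fixed monthly rate.
    months = (now - TARGET_DATE) / MONTH
    return TARGET_SERIAL + int(months * MONTHLY_GROWTH)


def latest_links(now, count=100):
    """Return URLs for the `count` most recently exposed serials."""
    top = max_serial(now)
    return [URL_TEMPLATE.format(serial=s) for s in range(top - count + 1, top + 1)]
```

A page that prints `latest_links(datetime.now(timezone.utc))` as plain hyperlinks would give a crawler a slowly advancing window of records to follow each time it revisits.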

It's unlikely to be useful in its current form but could serve as a skeleton for other applications. Simply copy the adapted script to a web-accessible location and link to it.
