crawler-commons


Shared Java components for web crawlers

Overview

crawler-commons is a set of reusable Java components that implement functionality common to any web crawler. These components benefit from collaboration among existing web crawler projects and reduce duplication of effort.

Crawler-Commons News

22nd April 2015 - crawler-commons has moved

The crawler-commons project is now hosted on GitHub, following the shutdown of Google Code project hosting. Please go to https://github.com/crawler-commons/crawler-commons for the latest news, issues, documentation, and code.

Project Information