Gungho - Extensible, High Performance Web Crawler
Updated Feb 4, 2010 by lestr...@gmail.com

Introduction

Gungho is an Extensible, High-Performance Web Crawler written in Perl. You can go gungho downloading web content with this framework.

History and Design

Gungho is partially based on Xango. Xango was designed for a particular purpose, and while it worked well, it wasn't very flexible, nor was it fun to debug. With these lessons learned, Gungho aims to be an easy-to-build crawler with the power of a top-notch web crawler.

For really simple uses, all you need is a config file:

  ---
  debug: 1
  provider:
    module: Simple
    config:
      url:
        - "http://search.cpan.org"
        - "http://www.perl.org"

and then just invoke the gungho script that comes with the distribution:

  gungho -c config.yml

Voilà! You just fetched some pages with Gungho. The beauty of this is that you can load any number of more complex components simply by listing them in your config. (More examples are coming later.)
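To make the config above concrete, here is a hypothetical Python sketch of what it expresses: take the seed URLs listed under `provider.config.url` and fetch each one once. This is not Gungho's implementation or API (Gungho is written in Perl and reads the YAML file itself); the dict `CONFIG` simply mirrors the YAML, and the names `seed_urls`, `fetch_url` and `crawl` are made up for illustration.

```python
import urllib.request
from typing import Callable, Dict, List

# Hypothetical Python mirror of the YAML config above; Gungho itself parses config.yml.
CONFIG = {
    "debug": 1,
    "provider": {
        "module": "Simple",
        "config": {"url": ["http://search.cpan.org", "http://www.perl.org"]},
    },
}


def seed_urls(config: Dict) -> List[str]:
    """Return the start URLs listed under provider.config.url."""
    provider = config["provider"]
    if provider.get("module") != "Simple":
        raise ValueError("this sketch only models the Simple provider")
    return list(provider.get("config", {}).get("url", []))


def fetch_url(url: str) -> bytes:
    """Default fetcher: a plain HTTP GET using the standard library."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def crawl(config: Dict, fetch: Callable[[str], bytes] = fetch_url) -> Dict[str, bytes]:
    """Fetch every seed URL once, in config order, and return {url: body}."""
    pages = {}
    for url in seed_urls(config):
        if config.get("debug"):
            # Mimics the idea of the `debug: 1` switch: report each fetch.
            print(f"fetching {url}")
        pages[url] = fetch(url)
    return pages
```

Calling `crawl(CONFIG)` performs real HTTP requests; passing a stub as `fetch` (for example `lambda u: b""`) lets you exercise the flow offline. Keeping the fetcher pluggable loosely echoes the point above that components are swapped in through configuration rather than code changes.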

If you are interested in participating, please send an email to daisuke-at-endeworks.jp! Commit bits are given out fairly easily.

Downloads

You can always get the latest SVN snapshot from:

http://gungho-crawler.googlecode.com/svn/trunk/Gungho

Or, if you prefer, check out a CPAN near you.

http://search.cpan.org/search?query=Gungho
Comment by mitchell...@gmail.com, Jul 14, 2007

Great! Another spider to block!
