gungho-crawler - Index.wiki


Introduction

Gungho is an extensible, high-performance web crawler framework written in Perl. You can go gungho downloading web content with it.

History and Design

Gungho is partially based on Xango. Xango was designed for a particular purpose, and while it was nice, it wasn't very flexible, nor was it fun to debug. With these lessons learned, Gungho aims to be an easy-to-build crawler with the power of a top-notch web crawler.

For really simple uses, all you need is a config file:

```
debug: 1
provider:
  module: Simple
  config:
    url:
      - "http://search.cpan.org"
      - "http://www.perl.org"
```

and then just invoke the gungho script that comes with the distribution:

gungho -c config.yml

Voila! You just fetched some pages with Gungho. The beauty of this is that you can load any number of more complex components by simply writing them down in your config. (more examples coming later)
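The two steps above can be combined into a small shell sketch. It writes the sample configuration to config.yml and then runs the gungho script. The YAML nesting shown here is one reading of the sample config, and the check that gungho is on your PATH is an added assumption, not something the distribution requires.

```shell
#!/bin/sh
# Write the sample configuration to config.yml.
# Assumption: the nesting below reflects the intended structure of the sample config.
cat > config.yml <<'EOF'
debug: 1
provider:
  module: Simple
  config:
    url:
      - "http://search.cpan.org"
      - "http://www.perl.org"
EOF

# Run the crawler only if the gungho script is available on PATH.
# (This guard is an assumption for the sketch; adjust it to your install.)
if command -v gungho >/dev/null 2>&1; then
    gungho -c config.yml
else
    echo "gungho not found on PATH; install the distribution first" >&2
fi
```

To crawl other sites, add more entries to the url list and run the script again.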

If you are interested in participating, please send an email to daisuke-at-endeworks.jp! Commit bits are given fairly easily.

Downloads

You can always get the latest svn snapshot via

http://gungho-crawler.googlecode.com/svn/trunk/Gungho

Or, if you prefer, check out a CPAN near you.

http://search.cpan.org/search?query=Gungho