CrawlerTemplate  
A basic crawler template
Phase-Implementation
Updated Oct 6, 2014 by Robby@Zeitfuchs.org

All crawler plugins live in the /crawler/ folder, and new plugins should be placed there as well.

A basic crawler plugin could look like this:

    import logging

    from core.abstracts import Crawler

    try:
        from yapsy.IPlugin import IPlugin
    except ImportError:
        raise ImportError('Yapsy is required')

    log = logging.getLogger("TestCrawler")


    class Test(IPlugin, Crawler):

        def run(self):
            log.info("Fetching from localhost")
            self.mapURL = {}
            # The 'url' option holds a comma-separated list of URLs.
            urls = self.options.get('url').split(',')
            for url in urls:
                log.info('URL=' + url)
                self.storeURL(url)
            return self.mapURL
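
To try the run() logic of the template without the framework or Yapsy installed, a minimal sketch along the following lines may help. The Crawler class here is a hypothetical stand-in, not the real core.abstracts.Crawler. It assumes that options arrive as a dict and that storeURL() simply records each URL as a key in self.mapURL. The real base class may behave differently. The run_demo() helper and the localhost URLs are likewise made up for illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("TestCrawler")


class Crawler(object):
    """Hypothetical stand-in for core.abstracts.Crawler (not the real class)."""

    def __init__(self, options=None):
        # Assumption: the framework passes plugin options as a dict.
        self.options = options or {}
        self.mapURL = {}

    def storeURL(self, url):
        # Assumption: the real method records the URL in self.mapURL.
        # Here the stripped URL is the key and the value is a placeholder.
        self.mapURL[url.strip()] = None


class Test(Crawler):
    """Same run() logic as the template, minus the Yapsy IPlugin base."""

    def run(self):
        log.info("Fetching from localhost")
        self.mapURL = {}
        # The 'url' option holds a comma-separated list of URLs.
        urls = self.options.get('url').split(',')
        for url in urls:
            log.info('URL=' + url)
            self.storeURL(url)
        return self.mapURL


def run_demo():
    # Hypothetical helper: builds the crawler with two example URLs.
    crawler = Test({'url': 'http://localhost/a,http://localhost/b'})
    return crawler.run()


if __name__ == "__main__":
    print(sorted(run_demo()))
```

Because run() resets self.mapURL before looping, every call starts from an empty map and returns only the URLs from the current 'url' option.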
