
abot - issue #51

Add config value for MaxPagesToCrawlPerDomain


Posted on Dec 5, 2012 by Helpful Dog

Consider adding a config value for MaxPagesToCrawlPerDomain.

Comment #1

Posted on Dec 5, 2012 by Helpful Dog

1) Create a config value MaxPagesToCrawlPerDomain in the CrawlConfiguration.cs file and have .NET populate it from the config section (like the other properties in that class).
2) Extend CrawlDecisionMaker.cs.
3) Add a ConcurrentDictionary that tracks the domains that have been crawled and the current page count for each domain.
4) Override the ShouldCrawlPage method and have it check and update the dictionary so that no more than MaxPagesToCrawlPerDomain pages are crawled for any one domain.
5) Pass in your implementation:

WebCrawler crawler = new WebCrawler( null, null, null, null, null, new YourCrawlDecisionMaker(), null);
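The per-domain counting described in the steps above could look roughly like the sketch below. It is only an illustration, not Abot code: PerDomainPageLimiter, TryAdmit, and GetCount are hypothetical names, and it assumes that "domain" means the host part of the page URI. It deliberately does not reference any Abot types, since the exact ShouldCrawlPage signature is not shown in this thread.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper (not part of Abot): admits at most N pages per domain.
public class PerDomainPageLimiter
{
    private readonly int _maxPagesPerDomain;

    // Maps a domain to the number of pages admitted for it so far.
    private readonly ConcurrentDictionary<string, int> _pagesPerDomain =
        new ConcurrentDictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    public PerDomainPageLimiter(int maxPagesPerDomain)
    {
        if (maxPagesPerDomain < 0)
            throw new ArgumentOutOfRangeException("maxPagesPerDomain");
        _maxPagesPerDomain = maxPagesPerDomain;
    }

    // Returns true and counts the page if its domain is still under the limit;
    // returns false without counting once the limit has been reached.
    public bool TryAdmit(Uri pageUri)
    {
        // Assumption: the URI host is what counts as the "domain".
        string domain = pageUri.Host;

        while (true)
        {
            int current = _pagesPerDomain.GetOrAdd(domain, 0);
            if (current >= _maxPagesPerDomain)
                return false;

            // Compare-and-swap: succeeds only if no other thread changed the
            // count since it was read, so concurrent callers cannot overshoot.
            if (_pagesPerDomain.TryUpdate(domain, current + 1, current))
                return true;
        }
    }

    // Returns how many pages have been admitted for the given domain.
    public int GetCount(string domain)
    {
        int count;
        return _pagesPerDomain.TryGetValue(domain, out count) ? count : 0;
    }
}
```

An overridden ShouldCrawlPage in YourCrawlDecisionMaker would then call TryAdmit with the page's URI and reject the page when it returns false. The check and the increment happen in one retry loop rather than as separate read and write calls, because the crawler may evaluate pages on several threads at once.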

Comment #2

Posted on Dec 5, 2012 by Helpful Dog

Be sure to update the forum at https://groups.google.com/forum/#!topic/abot-web-crawler/HFu0DUGN9eU

Comment #3

Posted on Dec 10, 2012 by Helpful Dog

(No comment was entered for this change.)

Status: fixed

Labels:
Type-Feature Priority-Medium Milestone-Release1.1