Consider adding a config value for MaxPagesToCrawlPerDomain.
Comment #1
Posted on Dec 5, 2012 by Helpful Dog
1) Create a config value MaxPagesToCrawlPerDomain in the CrawlConfiguration.cs file and have .NET fill it from the config section (like the other properties in that class).
2) Extend the CrawlDecisionMaker class (CrawlDecisionMaker.cs).
3) Add a ConcurrentDictionary that keeps track of the domains that have been crawled and the current page count for each domain.
4) Override the ShouldCrawlPage method and have it add to and check the dictionary, so that no more than MaxPagesToCrawlPerDomain pages are crawled for any one domain.
5) Pass in your implementation:
WebCrawler crawler = new WebCrawler( null, null, null, null, null, new YourCrawlDecisionMaker(), null);
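The per-domain counting described in comment #1 could be sketched roughly as follows. This is only an illustration under stated assumptions: DomainPageCounter and TryAdmit are hypothetical names, not part of Abot, and the class deliberately avoids Abot types because the exact ShouldCrawlPage signature depends on the Abot version in use. An overridden ShouldCrawlPage would call TryAdmit with the page's URI and disallow the page when it returns false.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper (not part of Abot): counts pages per domain and
// reports whether another page from that domain is still within the limit.
public sealed class DomainPageCounter
{
    // Key: host name of the page; value: number of pages seen for that host.
    private readonly ConcurrentDictionary<string, int> _pagesPerDomain =
        new ConcurrentDictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    private readonly int _maxPagesPerDomain;

    // maxPagesPerDomain would come from the MaxPagesToCrawlPerDomain config value.
    public DomainPageCounter(int maxPagesPerDomain)
    {
        if (maxPagesPerDomain < 1)
            throw new ArgumentOutOfRangeException("maxPagesPerDomain");
        _maxPagesPerDomain = maxPagesPerDomain;
    }

    // Increments the count for the page's host and returns true while the
    // count is still at or below the configured limit.
    public bool TryAdmit(Uri pageUri)
    {
        if (pageUri == null)
            throw new ArgumentNullException("pageUri");

        // AddOrUpdate returns the new count for this host; the stored value
        // stays consistent even when several crawl threads call it at once.
        int count = _pagesPerDomain.AddOrUpdate(
            pageUri.Host, 1, (host, current) => current + 1);

        return count <= _maxPagesPerDomain;
    }
}
```

Keying on Uri.Host treats each subdomain as its own domain; if subdomains should share one limit, the key would need to be reduced to the registered domain first.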
Comment #2
Posted on Dec 5, 2012 by Helpful Dog
Be sure to update the forum at https://groups.google.com/forum/#!topic/abot-web-crawler/HFu0DUGN9eU
Comment #3
Posted on Dec 10, 2012 by Helpful Dog
(No comment was entered for this change.)
Status: fixed
Labels:
Type-Feature
Priority-Medium
Milestone-Release1.1