Issue 54: Webharvester seems to download/crawl duplicate pages
Status:  Fixed
Owner:  blake.ol...@gmail.com
Closed:  Mar 2010


 
Project Member Reported by blake.ol...@gmail.com, Mar 2, 2010
Webharvester seems to download/crawl duplicate pages.  It happens when the
prefix of the same URL differs slightly, for example "www." vs. "http://www."
vs. no prefix at all.

Mar 2, 2010
Project Member #1 blake.ol...@gmail.com
Improved URL comparison function.
Status: Fixed
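
For illustration only, here is a minimal Python sketch of one way to compare URLs so that the prefix variants from the report count as the same page. It is not the actual Webharvester fix, whose code is not shown in this issue. The names normalize_url and same_page are hypothetical, and the sketch assumes that a host with and without "www." serves the same content, which is not true for every site.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Return a comparison key for a URL (hypothetical helper, not Webharvester code)."""
    url = url.strip()
    # Give scheme-less URLs a scheme so urlsplit parses the host, not a path.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    # Host names are case-insensitive; urlsplit's hostname is already lowercased.
    host = parts.hostname or ""
    # Assumption: "www.example.com" and "example.com" serve the same pages.
    if host.startswith("www."):
        host = host[len("www."):]
    # Treat an empty path as the site root.
    path = parts.path or "/"
    key = host + path
    if parts.query:
        key += "?" + parts.query
    # Scheme, port, and fragment are deliberately ignored in this sketch.
    return key


def same_page(a: str, b: str) -> bool:
    """True if two URLs normalize to the same comparison key."""
    return normalize_url(a) == normalize_url(b)
```

A crawler could keep a set of normalize_url(...) keys for pages it has already fetched and skip any URL whose key is already in the set.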
