ID   Type         Status    Priority  Milestone  Owner  Summary
26   Defect       New       Medium    ----       ----   How to force crawler4j to stay within the initial domain
30   Defect       New       Medium    ----       ----   How to get the original links in HTML
48   Enhancement  Accepted  Medium    ----       ----   Make cookie policy configurable
51   Defect       New       Medium    ----       ----   Multiple domains crawl without politeness interval
55   Defect       New       Medium    ----       ----   Getting information from root folder
58   Defect       New       Medium    ----       ----   Crawler ignores Crawl-delay from the host's robots.txt
59   Defect       New       Medium    ----       ----   Crawler ignores robots meta tag from the page
60   Defect       New       Medium    ----       ----   Where is the manual? Please write some simple steps to follow
61   Defect       New       Medium    ----       ----   Make CrawlerJ distributed
69   Enhancement  New       Medium    ----       ----   Requests per second per host
74   Defect       New       Medium    ----       ----   Accept-Language header
75   Enhancement  New       Medium    ----       ----   Exchangeable robots.txt stores
86   Defect       New       Medium    ----       ----   Crawler does not find window.location URLs
88   Defect       New       Medium    ----       ----   How to crawl a password-protected website? Can you provide some samples for cases where authentication is involved?
94   Enhancement  Accepted  Medium    ----       ----   shouldVisit: list of domains to crawl
95   Defect       New       Medium    ----       ----   Resuming enabled: large seed list takes forever
105  Enhancement  Accepted  Medium    ----       ----   Stat for rel="nofollow" attribute in anchor (<a>) tag
109  Enhancement  Accepted  Medium    ----       ----   Configuration to set which types of links to crawl (SCRIPT, LINK, IMG, etc.)
110  Enhancement  Accepted  Medium    ----       ----   Fetching file URLs
112  Defect       New       Medium    ----       ----   visit method for each domain crawl
116  Defect       Started   Medium    ----       ----   Cannot handle page with 207 status code
121  Defect       New       Medium    ----       ----   fetcher.PageFetcher: Failed: HTTP/1.1 400 Bad Request
122  Defect       New       Medium    ----       ----   Crawl never starts final cleanup
123  Other        New       Medium    ----       ----   Crawler storage data size keeps increasing
126  Defect       New       Medium    ----       ----   Crawling of sites within mailto:
127  Defect       New       Medium    ----       ----   Port of robots.txt
131  Defect       Started   Medium    ----       ----   Internal error in WebURL
133  Enhancement  Accepted  Medium    ----       ----   How to get the content type and prevent crawling of, for example, feeds?
135  Defect       New       Medium    ----       ----   How can I download the JavaScript files?
136  Defect       New       Medium    ----       ----   JVM crash when running crawler on CentOS 6.2
138  Defect       New       Medium    ----       ----   URLCanonicalizer parameter normalization is buggy
139  Defect       New       Medium    ----       ----   URL contains '\'
140  Defect       New       Medium    ----       ----   Different domains for different threads
141  Defect       New       Medium    ----       ----   Give developers the option of getting the URLs on a page themselves
142  Defect       New       Medium    ----       ----   The Crawler thread appends with Crawler.setMaxPages(int)
143  Defect       New       Medium    ----       ----   Impossible to get anchor text in visit(Page page)
144  Defect       New       Medium    ----       ----   Add the possibility to use a factory for instantiating new WebCrawlers instead of hardcoded class.newInstance()
145  Defect       New       Medium    ----       ----   charsetName NullPointerException
146  Defect       New       Medium    ----       ----   HTML content is incomplete
147  Defect       New       Medium    ----       ----   Class not found exception
148  Defect       New       Medium    ----       ----   Incompatible argument to function exception
149  Defect       New       Medium    ----       ----   Proper compression support in the PageFetcher
150  Defect       New       Medium    ----       ----   Unexpected behavior of URLCanonicalizer.getCanonicalURL(href, context)
151  Defect       New       Medium    ----       ----   Making a focused crawler based on the page content?
152  Defect       New       Medium    ----       ----   The crawler stops if the start URL returns a 302 redirect
153  Defect       New       Medium    ----       ----   How to crawl web pages like *.do?
154  Defect       New       Medium    ----       ----   sleepycat "75 min" IllegalArgumentException
155  Defect       New       Medium    ----       ----   Where is crawled data stored after crawling ends?
Powered by Google Project Hosting