| Projects on Google Code | Results 1 - 7 of 7 |
Analyse Heritrix data
A search engine.
Scope: Tsinghua University
Uses Lucene 2.4.0 to handle restaurant information queries, achieving the precision and speed characteristic of vertical search.
==What is HBase-Writer?==
HBase-Writer is an extension to the Heritrix open source crawler written by the Internet Archive (http://crawler.archive.org/) that enables it to store crawled content directly into HBase tables (http://hbase.org/) running on the Hadoop Distributed FileSystem (http://hadoo...
The main goal of WARC Tools is to facilitate and promote the adoption of the [http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml WARC file format] for storing web archives by the mainstream web development community by providing an open source software library, a set of command line tool...
Use Heritrix as a crawler
Khojo is a technology demonstration project, started as a weekend project. Most of the time we end up with irrelevant results after googling. Suppose we want to search for papers on thermodynamics as a topic of physics, but we end up getting lots of unneeded things and get lost in a sea of information. End resu...