Code license: Apache License 2.0
Labels: java, lucene, nutch, asf, tika, apache

The Tika project aims to build a generic content analysis toolkit based on ideas and code from Apache Nutch and related projects. The goal is a reusable core of content analysis functionality, including features such as:

  1. MimeType Repository
  2. Language Identifier
  3. Content Signature
  4. Generic Meta Data Infrastructure
  5. Charset Detector
  6. Parse Plugins Framework
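
To make the first feature in the list above more concrete, here is a minimal sketch of MIME type detection by magic bytes, one common way a MIME type repository can be used. It is not Tika's API: the class name MimeSketch, the detect method, and the three-entry table are hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; this is not part of Tika. */
public class MimeSketch {
    // A tiny, assumed table of magic-byte prefixes and their MIME types.
    // The map is only iterated, never looked up by key, so byte[] keys are fine.
    private static final Map<byte[], String> MAGIC = new LinkedHashMap<>();
    static {
        MAGIC.put("%PDF-".getBytes(StandardCharsets.US_ASCII), "application/pdf");
        MAGIC.put(new byte[] {(byte) 0x89, 'P', 'N', 'G'}, "image/png");
        MAGIC.put(new byte[] {'P', 'K', 0x03, 0x04}, "application/zip");
    }

    /** Returns the MIME type whose prefix matches the leading bytes, or a generic fallback. */
    static String detect(byte[] head) {
        for (Map.Entry<byte[], String> entry : MAGIC.entrySet()) {
            if (startsWith(head, entry.getKey())) {
                return entry.getValue();
            }
        }
        return "application/octet-stream";
    }

    // True when data begins with every byte of prefix, in order.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(sample)); // prints "application/pdf"
    }
}
```

A real repository would likely also consult file names and declared content types; this sketch covers only the byte-prefix case.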

The eventual goal of the project is to become an Apache Lucene subproject.

Hosted by Google Code