The Tika project is an attempt to build a generic content analysis toolkit based on ideas and code from Apache Nutch and other related projects. The goal is to create a reusable core of content analysis functionality, including:

  1. MimeType Repository
  2. Language Identifier
  3. Content Signature
  4. Generic Metadata Infrastructure
  5. Charset Detector
  6. Parse Plugins Framework

The eventual goal of the project is to become an Apache Lucene subproject.
