
ClearTK

ClearTK provides a framework for developing statistical natural language processing (NLP) components in Java and is built on top of Apache UIMA. It is developed by the Center for Computational Language and Education Research (CLEAR) at the University of Colorado at Boulder. Please see the conceptual overview for a broad introduction to ClearTK.

Features

  • A common interface and wrappers for popular machine learning libraries such as SVMlight, LIBSVM, OpenNLP MaxEnt, and Mallet.
  • A rich feature extraction library that can be used with any of the machine learning classifiers. Behind the scenes, ClearTK knows the input format of each supported machine learning library and translates your features into the format expected by whatever model you're using.
  • Infrastructure for creating NLP components for specific tasks such as part-of-speech tagging, BIO-style chunking, named entity recognition, semantic role labeling, temporal relation tagging, etc.
  • Wrappers for common NLP tools such as the Snowball stemmer, the OpenNLP tools, the MaltParser dependency parser, and the Stanford CoreNLP tools.
  • Corpus readers for collections like the Penn Treebank, ACE 2005, CoNLL 2003, Genia, TimeBank and TempEval.
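To make the feature-translation idea above concrete, here is a minimal Java sketch. It is not ClearTK's API. The class `SvmLightEncoderSketch` and the `Feature` record are hypothetical names used only for illustration. The sketch assumes features arrive as name/value pairs. It assigns each new feature name a 1-based integer index and writes one instance in SVMlight's sparse `label index:value ...` line format, with indices in increasing order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch (NOT ClearTK's API): translates named features into
 * SVMlight's sparse "label index:value ..." line format.
 */
public class SvmLightEncoderSketch {

    /** A named feature with a numeric value (hypothetical type). */
    public record Feature(String name, double value) {}

    // Maps each distinct feature name to a stable 1-based index
    // (SVMlight feature indices must be >= 1).
    private final Map<String, Integer> featureIndex = new HashMap<>();

    private int indexOf(String name) {
        Integer idx = featureIndex.get(name);
        if (idx == null) {
            idx = featureIndex.size() + 1;
            featureIndex.put(name, idx);
        }
        return idx;
    }

    /** Encodes one binary-labeled instance as a single SVMlight line. */
    public String encode(boolean positive, List<Feature> features) {
        // SVMlight expects indices in increasing order, so collect into a TreeMap.
        // Assumption: duplicate feature names have their values summed.
        TreeMap<Integer, Double> byIndex = new TreeMap<>();
        for (Feature f : features) {
            byIndex.merge(indexOf(f.name()), f.value(), Double::sum);
        }
        StringBuilder sb = new StringBuilder(positive ? "+1" : "-1");
        for (Map.Entry<Integer, Double> e : byIndex.entrySet()) {
            sb.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SvmLightEncoderSketch enc = new SvmLightEncoderSketch();
        System.out.println(enc.encode(true, List.of(
                new Feature("word=bank", 1.0),
                new Feature("suffix=nk", 1.0),
                new Feature("length", 4.0))));
        // prints "+1 1:1.0 2:1.0 3:4.0"
        System.out.println(enc.encode(false, List.of(
                new Feature("length", 2.0),
                new Feature("word=river", 1.0))));
        // prints "-1 3:2.0 4:1.0" (indices reused and sorted)
    }
}
```

In this design the feature extractors only produce name/value pairs. A wrapper for a different library would keep the same feature list and swap in a different encoder. Summing duplicate feature names is an arbitrary choice made for this sketch.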

Getting Started

For the latest released version, see the user setup.

For the latest and greatest from the repository, see the developer setup.

See the tutorial for an example of how to build a simple statistical machine learning component with ClearTK, and the module listing for an overview of the modules contained in ClearTK.

License

Most of ClearTK is distributed under the BSD license. However, a couple of sub-projects are licensed under the GPL because they depend on GPL-licensed third-party libraries. To comply with the GPL, we have isolated the code with GPL dependencies into separate GPL-licensed sub-projects, and these sub-projects are excluded from the main release on our downloads page. For additional details about software licensing, please see the following resources:

Questions?

If you have questions about ClearTK please post them to cleartk-users@googlegroups.com.

Cite ClearTK

If you use ClearTK to support academic research, then please cite the following paper as appropriate:

@inproceedings{ogren_cleartk:uima_2008,
  title     = {{ClearTK}: A {UIMA} toolkit for statistical natural language processing},
  booktitle = {Towards Enhanced Interoperability for Large {HLT} Systems: {UIMA} for {NLP} workshop at Language Resources and Evaluation Conference ({LREC})},
  author    = {Philip V. Ogren and Philipp G. Wetzler and Steven Bethard},
  year      = {2008}
}