The MIT Language Modeling (MITLM) toolkit is a set of tools designed for the efficient estimation of statistical n-gram language models involving iterative parameter estimation. It achieves much of its efficiency through the use of a compact vector representation of n-grams. Details of the data structure and associated algorithms can be found in the accompanying paper.
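
To give a rough feel for what a vector representation of n-grams can look like, the sketch below stores each order as parallel lists: the index of the n-gram's history in the next-lower order, the id of its final word, and its count. This is an illustration written for this page under that assumption, not MITLM's actual code; the function name `build_ngram_vectors` and the layout details are hypothetical.

```python
def build_ngram_vectors(sentences, max_order):
    """Index n-grams per order as parallel (history index, word id, count) lists.

    Hypothetical sketch: an n-gram of order n is identified by the position of
    its (n-1)-word history in the order n-1 lists plus the id of its last word.
    """
    vocab = {}
    # Order 0 holds a single empty n-gram that serves as the history of unigrams.
    hists = [[-1]] + [[] for _ in range(max_order)]
    words = [[-1]] + [[] for _ in range(max_order)]
    counts = [[0]] + [[] for _ in range(max_order)]
    # Temporary lookup tables used only while building the vectors.
    index = [{} for _ in range(max_order + 1)]

    for sentence in sentences:
        ids = [vocab.setdefault(w, len(vocab)) for w in sentence]
        for start in range(len(ids)):
            hist = 0  # position of the empty history in order 0
            for n in range(1, max_order + 1):
                if start + n > len(ids):
                    break
                word = ids[start + n - 1]
                key = (hist, word)
                pos = index[n].get(key)
                if pos is None:
                    # First time this n-gram is seen: append it to the vectors.
                    pos = len(words[n])
                    index[n][key] = pos
                    hists[n].append(hist)
                    words[n].append(word)
                    counts[n].append(0)
                counts[n][pos] += 1
                hist = pos  # this n-gram is the history of the next order
    return vocab, hists, words, counts


vocab, hists, words, counts = build_ngram_vectors([["a", "b", "a", "b"]], 2)
```

Because every n-gram is a pair of small integers rather than a string or a tree node, per-order quantities such as counts, probabilities, and backoff weights can live in flat arrays that are updated in bulk, which is the kind of access pattern iterative parameter estimation benefits from.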

Currently, MITLM supports the following features:

  • Smoothing: Modified Kneser-Ney, Kneser-Ney, maximum likelihood
  • Interpolation: Linear interpolation, count merging, generalized linear interpolation
  • Evaluation: Perplexity
  • File formats: ARPA, binary, gzip, bz2
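
As a small illustration of two of the features listed above, the following sketch shows the textbook definitions of linear interpolation between two component models and of perplexity over a sequence of per-word probabilities. The function names are hypothetical and are not part of the MITLM command-line tools or API.

```python
import math


def interpolate(prob_a, prob_b, weight_a):
    """Linearly interpolate two model probabilities for the same word and history."""
    return weight_a * prob_a + (1.0 - weight_a) * prob_b


def perplexity(word_probs):
    """Perplexity: exp of the negative average log probability per word."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Mix two hypothetical models with equal weight, then score a 4-word test set.
mixed = [interpolate(a, b, 0.5) for a, b in [(0.2, 0.3), (0.1, 0.4), (0.3, 0.2), (0.25, 0.25)]]
print(perplexity(mixed))
```

A model that assigns every word probability 0.25 has perplexity 4, which matches the intuition of choosing uniformly among four words at each step.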

MITLM is available for download under the MIT License. It has been built and tested on 32-bit and 64-bit Intel CPUs running Debian Linux 4.0. It currently requires the following:

Acknowledgments

The design and implementation of this toolkit benefited significantly from the SRI Language Modeling Toolkit by Andreas Stolcke. The project is supported in part by the T-Party Project, a joint research program between MIT CSAIL and Quanta Computer Inc.

©2009 Bo-June (Paul) Hsu, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology.
