This project provides a library for estimating and storing large n-gram language models in memory and for accessing them efficiently. Its data structures are smaller than those of SRILM, a commonly used language model library, and just as efficient.

News

June 24, 2011: version 1.0b2 has been released, with bug fixes from Kenneth Heafield and some performance improvements.

August 14, 2011: version 1.0b3 has been released. This version can handle ARPA LM files in which some n-gram prefixes and suffixes are missing. We have also released pre-built binaries for the Google N-Gram corpora. These can be downloaded here.

January 20, 2012: version 1.0.0 has been released. It fixes a bug in the estimation of Kneser-Ney probabilities for n-grams starting with the <s> tag. This release also includes several performance improvements, particularly in Kneser-Ney estimation. Note that binary compatibility was broken, so you will need to re-download all Google n-gram binaries.

April 9, 2012: version 1.0.1 has been released. Fixes an occasional crash in estimation of Kneser-Ney models.

Powered by Google Project Hosting