This project provides a library for estimating large n-gram language models, storing them in memory, and accessing them efficiently. Its data structures are smaller than those of SRILM, a commonly used language model library, and just as efficient.

News

June 24, 2011: version 1.0b2 has been released, with bug fixes from Kenneth Heafield and some performance improvements.

August 14, 2011: version 1.0b3 has been released. This version can handle ARPA LM files that contain missing suffixes and prefixes. We have also released pre-built binaries for the Google N-Gram corpora. These can be downloaded here.

January 20, 2012: version 1.0.0 has been released. It fixes a bug in the estimation of Kneser-Ney probabilities for n-grams starting with the <s> tag, and includes several performance improvements, particularly in estimating Kneser-Ney probabilities. Note that binary compatibility was broken, so you will need to re-download all Google n-gram binaries.

April 9, 2012: version 1.0.1 has been released. It fixes an occasional crash in the estimation of Kneser-Ney models.