Toolkit for statistical learning of vocabularies.
Models implemented:
- N-grams ** Combined n-grams with Good-Turing smoothing
- Probabilistic suffix tree ** Suffix-based variable-order Markov model
- Variable-length Hidden Markov Model ** New HMM-based approach, built from the substrings of all words
- Dynamic Markov Model ** Prefix-based variable-order Markov model
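
As a rough illustration of the first model in the list, the sketch below counts character bigrams over a toy vocabulary and applies the basic Good-Turing adjustment r* = (r + 1) * N_{r+1} / N_r, where N_r is the number of distinct n-grams seen exactly r times. It is not taken from this toolkit: the function names (char_ngrams, good_turing_counts), the boundary markers "^" and "$", and the fallback to the raw count when N_{r+1} is zero are all assumptions made for the example, and the combination of several n-gram orders is not shown.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return the character n-grams of a word padded with boundary markers.

    The "^" and "$" markers are an assumption of this sketch.
    """
    padded = "^" * (n - 1) + word + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def good_turing_counts(counts):
    """Map each observed count r to r* = (r + 1) * N_{r+1} / N_r.

    Falls back to the raw count r when N_{r+1} is zero. This is a
    simplification; practical implementations usually smooth the N_r
    values first.
    """
    freq_of_freq = Counter(counts.values())
    adjusted = {}
    for r, n_r in freq_of_freq.items():
        n_next = freq_of_freq.get(r + 1, 0)
        adjusted[r] = (r + 1) * n_next / n_r if n_next else float(r)
    return adjusted


# Toy vocabulary, purely illustrative.
words = ["banana", "bandana", "cabana"]
counts = Counter(g for w in words for g in char_ngrams(w, 2))
total = sum(counts.values())

adjusted = good_turing_counts(counts)

# Probability mass reserved for bigrams never seen in training: N_1 / N.
unseen_mass = Counter(counts.values())[1] / total

for gram, r in counts.most_common():
    print(gram, r, adjusted[r])
print("unseen mass:", unseen_mass)
```

The adjusted counts shift probability away from rare bigrams and toward unseen ones, which is what lets the model score words containing bigrams absent from the training vocabulary.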