Introduction

Phonetisaurus is a WFST-driven grapheme-to-phoneme (g2p) framework suitable for rapid development of high-quality g2p or p2g systems. At present it includes a fast, EM-driven, WFST-based multiple-to-multiple alignment program, model conversion tools, a fast WFST-based decoder, and a Lattice Minimum Bayes-Risk (LMBR) decoder implementing a novel length-normalized loss function for computing N-gram factors. A specialized test distribution implementing N-best rescoring with Recurrent Neural Network Language Models via RNNLM is also included.

The project embodies a straightforward ensemble approach to the g2p problem, and adopts a modular architecture that reflects the alignment, model-training, and decoding steps common to most g2p approaches in the related literature. It produces g2p and p2g results that are competitive with the state of the art in this area. In addition to a fast C++ decoder that can handle word lists, isolated words, and n-best results, the project also includes training scripts.

See the ReadMe for a range of examples, and the FAQ page for solutions to common problems and issues. There is also a series of slides describing the LMBR decoder: phonetisaurus-lmbr-g2p.pdf. Several deprecated tutorials discuss other aspects of the system; see the wiki list for details.
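To make the idea of N-best rescoring concrete, here is a minimal sketch in Python. It is not the project's implementation: the function `rescore_nbest`, the `lm_weight` parameter, the linear interpolation of costs, and the toy `toy_lm_cost` scorer are all assumptions chosen for illustration. In a real setup the language-model cost would come from an external model such as one trained with RNNLM.

```python
from typing import Callable, List, Sequence, Tuple

# A hypothesis is (phoneme sequence, g2p cost), where the cost is assumed
# to be a negative log-probability, so lower is better.
Hypothesis = Tuple[Sequence[str], float]


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_cost: Callable[[Sequence[str]], float],
    lm_weight: float = 0.5,
) -> List[Hypothesis]:
    """Re-rank an N-best list by interpolating g2p and LM costs.

    Hypothetical helper: the interpolation scheme is an assumption,
    not the scheme used by Phonetisaurus.
    """
    rescored = [
        (phones, (1.0 - lm_weight) * g2p_cost + lm_weight * lm_cost(phones))
        for phones, g2p_cost in nbest
    ]
    # Sort ascending: the lowest combined cost is the best hypothesis.
    return sorted(rescored, key=lambda item: item[1])


def toy_lm_cost(phones: Sequence[str]) -> float:
    """Stand-in for an external language model (illustrative only)."""
    return 2.0 if "IY" in phones else 0.5


# Two made-up candidate pronunciations for the word "test".
nbest = [
    (("T", "IY", "S", "T"), 1.0),  # best according to the g2p cost alone
    (("T", "EH", "S", "T"), 1.2),
]

rescored = rescore_nbest(nbest, toy_lm_cost, lm_weight=0.5)
best_phones, best_cost = rescored[0]
print(" ".join(best_phones), round(best_cost, 2))
```

In this toy case the second candidate overtakes the first after rescoring, because the stand-in language model penalizes the first one more heavily than the g2p decoder favours it.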
Coming Soon

Integrated LM training using lattice-based partial counts and fractional Kneser-Ney smoothing.
LMBR decoding for multiple alignment.
Full integration and automation for RNNLM-based N-best rescoring.
Language-specific phonotactic template constraints.

Dependencies

Phonetisaurus depends on several other excellent projects, which are listed below.

OpenFst: All low-level FST manipulation is handled with the OpenFst library.
Language Model training toolkit: You may use your favorite toolkit to train an ARPA-format LM. My personal favorite is mitlm, but NGramLibrary, SRILM, CMU-Cambridge SLM, or anything else that is capable of outputting a text-based, ARPA-format LM should also work just fine.

Acknowledgments

Work on this project was partially funded by the National Institute of Information and Communications Technology (NICT), Japan.