
Introduction

Phonetisaurus is a WFST-driven grapheme-to-phoneme (g2p) framework suitable for rapid development of high-quality g2p or p2g systems. At present it includes a fast, EM-driven, WFST-based multiple-to-multiple alignment program, model conversion tools, a fast WFST-based decoder, and a Lattice Minimum Bayes-Risk (LMBR) decoder that implements a novel length-normalized loss function for computing N-gram factors. A specialized test distribution that implements N-best rescoring with Recurrent Neural Network Language Models via RNNLM is also included.

The project embodies a straightforward ensemble approach to the g2p problem and adopts a modular architecture that reflects the alignment, model-training, and decoding steps common to most g2p approaches in the related literature. It produces high-quality g2p and p2g results that are competitive with the state of the art in this area. In addition to a fast C++ decoder that can handle word lists, isolated words, and n-best results, the project also includes training scripts.
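
To make the idea of N-best rescoring concrete, here is a minimal sketch in Python. It is not the Phonetisaurus or RNNLM implementation: the function names, the linear interpolation, and the toy second-pass scorer are all assumptions made for illustration. It only shows the general pattern of taking an n-best list from a first-pass decoder, adding a score from a second model, and re-ranking.

```python
from typing import Callable, List, Tuple

# A hypothesis is (phoneme sequence, first-pass cost); lower cost is better.
Hypothesis = Tuple[List[str], float]


def rescore_nbest(
    nbest: List[Hypothesis],
    second_pass_cost: Callable[[List[str]], float],
    weight: float = 0.5,
) -> List[Hypothesis]:
    """Re-rank an n-best list by interpolating two costs.

    `weight` is a hypothetical interpolation parameter: 0.0 keeps the
    first-pass ranking, 1.0 trusts only the second-pass model.
    """
    rescored = []
    for phonemes, decoder_cost in nbest:
        combined = (1.0 - weight) * decoder_cost + weight * second_pass_cost(phonemes)
        rescored.append((phonemes, combined))
    # Python's sort is stable, so ties keep their first-pass order.
    return sorted(rescored, key=lambda item: item[1])


def toy_length_cost(phonemes: List[str]) -> float:
    """Hypothetical stand-in for a real language model: cost = sequence length."""
    return float(len(phonemes))


if __name__ == "__main__":
    # Made-up hypotheses and costs, for illustration only.
    nbest = [
        (["r", "ae", "d", "ih", "k", "ah", "l"], 10.0),
        (["r", "ae", "d", "ih", "k", "l"], 10.5),
    ]
    for phonemes, cost in rescore_nbest(nbest, toy_length_cost):
        print(" ".join(phonemes), cost)
```

In a real setup the second-pass scorer would be a trained language model rather than a length penalty, and the interpolation weight would be tuned on held-out data.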

See the ReadMe for usage examples and the FAQ page for solutions to common problems. A series of slides describes the LMBR decoder: phonetisaurus-lmbr-g2p.pdf. Several deprecated tutorials discuss other aspects of the system; see the wiki list for details.

WFST representation of the awesome input word "radical", when using the multiple-to-multiple alignment algorithm.
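
One way to read this caption: with multiple-to-multiple alignment, adjacent letters may be grouped into a single unit, so the input word is represented not as one letter sequence but as a set of alternative segmentations, each corresponding to a path through the WFST. The sketch below enumerates those segmentations in plain Python. It does not use OpenFst, and the maximum chunk length of 2 is an assumption chosen for illustration, not a documented Phonetisaurus default.

```python
from typing import Iterator, List


def segmentations(word: str, max_chunk: int = 2) -> Iterator[List[str]]:
    """Yield every way to split `word` into contiguous chunks of
    length 1..max_chunk. Each result corresponds to one path through
    an acceptor whose arcs are labelled with letter chunks.
    """
    if not word:
        yield []
        return
    for size in range(1, min(max_chunk, len(word)) + 1):
        head = word[:size]
        for rest in segmentations(word[size:], max_chunk):
            yield [head] + rest


if __name__ == "__main__":
    paths = list(segmentations("radical", max_chunk=2))
    print(len(paths))  # 21 segmentations for a 7-letter word
    for path in paths[:3]:
        print("|".join(path))
```

The count grows like the Fibonacci sequence when chunks are limited to one or two letters, which is why a compact WFST is a more practical representation than an explicit list for longer words or larger chunk sizes.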

Coming Soon

Integrated LM training using lattice-based partial counts and fractional Kneser-Ney smoothing.

LMBR decoding for multiple alignment.

Full integration and automation for RNNLM-based N-best rescoring.

Language-specific phonotactic template constraints.

Dependencies

Phonetisaurus depends on several other excellent projects, which are listed below.

OpenFst: All low-level FST manipulation is handled with the OpenFst library.

Language Model training toolkit: You may use your favorite toolkit to train an ARPA-format LM. My personal favorite is mitlm, but NGramLibrary, SRILM, CMU-Cambridge SLM or anything else that is capable of outputting a text-based, ARPA-format LM should also work just fine.
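
For readers unfamiliar with the text-based ARPA format, the following sketch parses a tiny hand-written example in Python. The sample model, its tokens (`a`, `b`) and its probabilities are made up for illustration, and `parse_arpa` is a hypothetical helper, not part of Phonetisaurus or of any of the toolkits named above. It assumes the usual layout: a `\data\` header with n-gram counts, one `\N-grams:` section per order with lines of the form log10 probability, N tokens, and an optional backoff weight, and a closing `\end\` marker.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[float, Tuple[str, ...], Optional[float]]

# Hand-written toy model; values are illustrative only.
SAMPLE = r"""
\data\
ngram 1=3
ngram 2=2

\1-grams:
-0.5 a -0.3
-0.7 b -0.2
-1.0 </s>

\2-grams:
-0.2 a b
-0.4 b </s>

\end\
"""


def parse_arpa(text: str) -> Tuple[Dict[int, int], Dict[int, List[Entry]]]:
    """Return (declared counts per order, parsed entries per order)."""
    counts: Dict[int, int] = {}
    ngrams: Dict[int, List[Entry]] = {}
    order: Optional[int] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "\\data\\":
            continue
        if line == "\\end\\":
            break
        if line.startswith("ngram "):
            n, count = line[len("ngram "):].split("=")
            counts[int(n)] = int(count)
        elif line.startswith("\\") and line.endswith("-grams:"):
            # Section header such as "\2-grams:" sets the current order.
            order = int(line[1:line.index("-")])
            ngrams[order] = []
        elif order is not None:
            fields = line.split()
            logprob = float(fields[0])
            words = tuple(fields[1:1 + order])
            # The backoff weight is optional and comes after the tokens.
            backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
            ngrams[order].append((logprob, words, backoff))
    return counts, ngrams


if __name__ == "__main__":
    counts, ngrams = parse_arpa(SAMPLE)
    print(counts)
    print(ngrams[2])
```

A quick sanity check when switching toolkits is to confirm that the number of parsed entries per order matches the counts declared in the header.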

Acknowledgments

Work on this project was partially funded by the National Institute of Information and Communications Technology (NICT), Japan.
