transducersaurus


Tools for generating WFST-based ASR cascades.

RAAAAR!

NEWS

I'm currently in the process of refactoring (rewriting?) most of the parser core to make it more flexible. The new goal is to provide a generic FST-based DSL that can be used to combine and optimize an arbitrary collection of components. The ASR WFST classes will of course remain, but as a default supported subset rather than the be-all, end-all of the distribution. The idea is then to make it straightforward to add new FST generation classes, or for the user to apply the combination and optimization algorithms to an existing mixture of text-format, generated, and binary-format WFSTs. The beginnings of this expanded scope can be seen in

http://code.google.com/p/transducersaurus/source/browse/python/NewParser.py

which has been completely decoupled from the default ASR component transducers. The set of available OpenFst operations is also being greatly expanded, but remember - with greater power comes greater responsibility! That is, it will probably be easier to crash your machine with the new version.

If you have a question you're just dying to ask, feel free to write me directly.

Introduction

Transducersaurus provides a suite of tools for generating the component transducers typically employed in the construction of Automatic Speech Recognition (ASR) cascades using Weighted Finite-State Transducers (WFSTs). In particular, it provides classes for generating the following component transducers (a small illustrative sketch follows the list):

  • Silence class (T) transducers from word lists.
  • Grammar (G) transducers from ARPA format language models, and simple regular-expression based expert grammars.
  • Lexicon (L) transducers from pronunciation dictionaries.
  • Context-Dependency (C) transducers from lists of monophones and auxiliary symbols.
  • HMM (H) transducers from an input Sphinx format mdef file or an hmm.hmm file**
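
To make the list above concrete, here is a minimal sketch of how a lexicon (L) transducer can be written out in AT&T text format from a pronunciation dictionary. This is a hypothetical illustration, not the toolkit's actual lexicon class; a production L transducer also needs symbol tables for fstcompile and auxiliary disambiguation symbols for homophones, both omitted here.

```python
# Minimal sketch: emit a lexicon (L) transducer in AT&T text format.
# Hypothetical example, not the toolkit's actual lexicon class: the
# symbol tables needed by fstcompile and the auxiliary disambiguation
# symbols needed for homophones are omitted.

def write_lexicon_fst(dict_file, out_file, eps="<eps>"):
    """Turn each pronunciation into one path: phones in, word out."""
    out = open(out_file, "w")
    next_state = 1  # state 0 serves as both the start and final state
    for line in open(dict_file):
        parts = line.split()
        if len(parts) < 2:
            continue
        word, phones = parts[0], parts[1:]
        src = 0
        for i, phone in enumerate(phones):
            # Emit the word on the first arc and epsilon afterwards;
            # the last arc loops back to state 0 so that word
            # sequences are accepted.
            olabel = word if i == 0 else eps
            dst = 0 if i == len(phones) - 1 else next_state
            if dst != 0:
                next_state += 1
            out.write("%d %d %s %s\n" % (src, dst, phone, olabel))
            src = dst
    out.write("0\n")  # mark state 0 final
    out.close()

write_lexicon_fst("lexicon.dic", "L.fst.txt")
```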

Most of the transducer classes should be suitable for use in an arbitrary WFST decoder, but testing has focused on the T³ and Juicer WFST decoders.

Transducersaurus also includes a build program that supports a simple cascade generation grammar. You can specify your build chain as a simple script, for example,

'min(det(C*det(L*(G*T))))'

and the program will work out the necessary series of OpenFst commands automagically. The build program supports both HTK and Sphinx format acoustic models, and the Sphinx format cascades will run in TCubed out of the box. The HTK format cascades will run in both Juicer and TCubed.
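
To give a feel for what "working out the necessary series of OpenFst commands" involves, here is a small, hypothetical sketch of such a translation for fully parenthesized expressions. The real transducersaurus.py parser is considerably more general (options, semirings, lookahead composition), and the file names here are illustrative; real pipelines also insert fstarcsort before each composition.

```python
# Hypothetical sketch: translate a cascade expression such as
# 'min(det(C*det(L*(G*T))))' into a series of OpenFst commands.
# Not the actual transducersaurus.py parser.
import re

TOKEN = re.compile(r"min|det|[A-Z]|[()*]")

def parse(expr):
    tokens = TOKEN.findall(expr)
    pos = [0]      # current token index
    cmds = []      # accumulated shell commands
    counter = [0]  # temp-file counter

    def tmp():
        counter[0] += 1
        return "tmp%d.fst" % counter[0]

    def term():
        tok = tokens[pos[0]]; pos[0] += 1
        if tok in ("det", "min"):
            assert tokens[pos[0]] == "("; pos[0] += 1
            arg = composition()
            assert tokens[pos[0]] == ")"; pos[0] += 1
            out = tmp()
            op = "fstdeterminize" if tok == "det" else "fstminimize"
            cmds.append("%s %s %s" % (op, arg, out))
            return out
        if tok == "(":
            arg = composition()
            assert tokens[pos[0]] == ")"; pos[0] += 1
            return arg
        return tok + ".fst"  # an atom: C, L, G or T

    def composition():
        # '*' is left-associative composition.
        left = term()
        while pos[0] < len(tokens) and tokens[pos[0]] == "*":
            pos[0] += 1
            right = term()
            out = tmp()
            cmds.append("fstcompose %s %s %s" % (left, right, out))
            left = out
        return left

    return cmds, composition()

cmds, final = parse("min(det(C*det(L*(G*T))))")
print("\n".join(cmds) + "\n# final cascade: " + final)
```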

The build tool grammar currently supports the following OpenFst operations,

  • fstdeterminize
  • fstminimize
  • fstpush
  • fstrmepsilon
  • fstcompose (standard and static lookahead composition)

and the build tool's options can be used to modify the behavior of the above operations, for example by changing the semiring used for individual operations or by encoding weights or labels prior to minimization (see the sketch below).
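
As an example of the last point, weighted transducer minimization in OpenFst is typically done by encoding labels and weights, minimizing the resulting acceptor, and then decoding. Here is a rough sketch of the command sequence such an option presumably automates; file names are illustrative, and fstencode's flags should be checked against your OpenFst version.

```python
# Sketch of the standard OpenFst encode-minimize-decode recipe that a
# weight/label encoding option presumably automates. File names are
# illustrative; check fstencode's flags for your OpenFst version.
import subprocess

def encoded_minimize(fst_in, fst_out, codex="min.codex"):
    # Encode labels and weights into single arc symbols so the result
    # is an unweighted acceptor, which fstminimize handles directly.
    subprocess.check_call(["fstencode", "--encode_labels",
                           "--encode_weights", fst_in, codex, "enc.fst"])
    subprocess.check_call(["fstminimize", "enc.fst", "min.fst"])
    # Decode to restore the original labels and weights.
    subprocess.check_call(["fstencode", "--decode", "min.fst",
                           codex, fst_out])

# e.g. minimize the determinized stage of a cascade:
encoded_minimize("detLG.fst", "minLG.fst")
```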

The Python scripts themselves only require Python >= 2.5. However, OpenFst is required in order to build the cascades, and OpenFst >= 1.2.6 is further required to run the static lookahead composition build scripts.

These Python programs are intended primarily for study/learning purposes, but they should be suitable for building arbitrarily large and complicated models (if a bit slow). I've used the toolkit to successfully build a 3.0GB+ (C*det(L)).(G.T) cascade with a 64k vocabulary and a fairly large 3-gram LM: ngram 1=64000; ngram 2=13628086; ngram 3=8811112. Using static lookahead composition throughout, the compilation routine required about 20 minutes, and the CL.GT composition maxed out at roughly 15GB of memory.
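
For reference, the static lookahead composition step mentioned above looks something like the sequence below with the stock OpenFst (>= 1.2.6) command-line tools, assuming OpenFst was configured with --enable-lookahead-fsts. The flag names follow the OpenFst lookahead documentation but should be verified against your installation; the file names are illustrative.

```python
# Rough sketch of static lookahead composition with stock OpenFst
# command-line tools. Flag names are assumptions based on the OpenFst
# lookahead documentation; verify them against your installation.
import subprocess

steps = [
    # Convert the left-hand cascade to a lookahead FST type, saving
    # the relabeling pairs the right-hand side must be mapped through.
    "fstconvert --fst_type=olabel_lookahead"
    " --save_relabel_opairs=relabel.txt CL.fst CL_look.fst",
    # Relabel and arc-sort the right-hand side to match.
    "fstrelabel --relabel_ipairs=relabel.txt GT.fst GT_rl.fst",
    "fstarcsort --sort_type=ilabel GT_rl.fst GT_srt.fst",
    # Composition now uses the lookahead matcher automatically.
    "fstcompose CL_look.fst GT_srt.fst CLGT.fst",
]
for cmd in steps:
    subprocess.check_call(cmd.split())
```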

See the QuickStartGuide wiki for a couple of recipes to get started with the toolkit and the Juicer decoder, and the CascadeTutorial wiki for a more detailed explanation of the underlying process. Finally, see the CascadeCompilationGrammar wiki for details on the grammar supported by the transducersaurus.py compilation tool.

Feel free to use this software for anything you like, but please let me know if you do.

Finally, this is part of a slowly growing suite of related tools, which are listed on my school website:

http://www.gavo.t.u-tokyo.ac.jp/~novakj/software.html

**Such as that generated by http://code.google.com/p/sphinx-am2wfst/

GRRRR!
