SFST Finite State Tools
Introduction
SFST is a toolkit for the implementation of morphological analysers and other tools which are based on finite state transducer technology.
Details
The SFST tools comprise
- a programming language for finite state transducers (FST)
- a compiler which translates the FST programs into minimised transducers
- interactive and batch-mode programs for analysis and generation
- tools for comparing and printing transducers
- an efficient C++ transducer library for the implementation of new FST tools
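To make "analysis and generation" concrete, here is a minimal Python sketch of how a single transducer can be run in both directions. It is only an illustration: it uses neither SFST's programming language nor its C++ library, and the toy lexicon, the tags `<N>`, `<sg>` and `<pl>`, and the function names are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Each transition is (analysis_symbol, surface_symbol, target_state).
# An empty string stands for epsilon (no symbol on that side).
Transition = Tuple[str, str, int]

# Toy transducer for "cat" / "cats" (hypothetical, not taken from SFST).
TRANSITIONS: Dict[int, List[Transition]] = {
    0: [("c", "c", 1)],
    1: [("a", "a", 2)],
    2: [("t", "t", 3)],
    3: [("<N>", "", 4)],
    4: [("<sg>", "", 5), ("<pl>", "s", 5)],
}
FINAL_STATES = {5}


def traverse(symbols: Sequence[str], match_side: int) -> List[List[str]]:
    """Collect the outputs of all accepting paths whose symbols on
    `match_side` (0 = analysis, 1 = surface) spell out `symbols`."""
    results: List[List[str]] = []

    def walk(state: int, pos: int, output: List[str]) -> None:
        if pos == len(symbols) and state in FINAL_STATES:
            results.append(output)
        for transition in TRANSITIONS.get(state, []):
            inp = transition[match_side]
            out = transition[1 - match_side]
            target = transition[2]
            if inp == "":
                new_pos = pos          # epsilon: consume no input
            elif pos < len(symbols) and symbols[pos] == inp:
                new_pos = pos + 1      # consume one matching symbol
            else:
                continue               # transition does not apply
            walk(target, new_pos, output + ([out] if out else []))

    walk(0, 0, [])
    return results


def analyse(word: str) -> List[str]:
    """Map a surface form to its analysis strings."""
    return ["".join(path) for path in traverse(list(word), match_side=1)]


def generate(analysis_symbols: Sequence[str]) -> List[str]:
    """Map a sequence of analysis symbols to surface forms."""
    return ["".join(path) for path in traverse(list(analysis_symbols), match_side=0)]


print(analyse("cats"))                            # ['cat<N><pl>']
print(generate(["c", "a", "t", "<N>", "<sg>"]))   # ['cat']
```

The same transition table serves both directions; only the side that is matched against the input changes. A real toolkit would compile and minimise the transducer ahead of time and would also have to handle epsilon cycles, which this sketch ignores.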
SFST
- is freely available under the GNU General Public License
- is easy to learn for users who are familiar with grep, sed, or Perl
- is implemented efficiently in C++
- supports
  - a wide range of transducer operations
  - UTF-8 character coding
  - weighted transducers (basic functionality only)
Downloads
Source code of the SFST tools
- SFST version 1.4.6i
Precompiled morphological transducers
- SMOR, a German finite-state morphology which is based on SFST
- EMOR, an English finite-state morphology using SFST
Documentation
- short manual (included in the source code package)
- tutorial on the implementation of computational morphologies (included in the source code package)
Packages (not necessarily up to date)
- Debian package for SFST (created by Francis Tyers)
Publications
Please cite the following publication if you want to refer to the SFST tools:
Helmut Schmid, A Programming Language for Finite State Transducers, Proceedings of the 5th International Workshop on Finite State Methods in Natural Language Processing (FSMNLP 2005), Helsinki, Finland.
Relationship to other FST Toolkits
There are two projects which extend the functionality of SFST in various ways:
- Anssi Yli-Jyrä's AFST toolkit is based on SFST.
- The HFST toolkit developed by Krister Lindén, Kimmo Koskenniemi, and colleagues was implemented on top of the three alternative FST libraries SFST, OpenFST, and foma.
Contributions by other authors
- Alex Linke provided an interface to the Graphviz tool for the graphical output of transducers.
- Sebastian Nagel wrote an Emacs mode for editing transducer files and a Perl program which converts SFST transducers to the Graphviz format (similar to that of Alex Linke).
- Stefan Evert also sent me a Graphviz converter.
- Matthias Kistler provided a highlighting mode for the VIM editor.
- Toni Arnold developed a Python interface for the SFST library and Emores, an Empirical MOrphological REaSoning engine for the automatic acquisition of lemmas from a word list.
- Marius L. Jøhndal created a Ruby interface for the SFST library.
- The UKP Lab developed a UIMA wrapper for SFST.
Please send comments, suggestions and bug reports to Helmut Schmid at FirstName.LastName@ims.uni-stuttgart.de. (Insert the name into the email address.)