
This site offers a set of Bash scripts and Windows executable add-ins that, together, create a basic translation chain prototype capable of processing very large corpora. It uses Moses, a widely known statistical machine translation system.

The idea is to help build a translation chain for the real world, but it should also enable a quick evaluation of Moses for actual translation work and guide users through their first steps with Moses.

A Help/Short Tutorial (http://moses-for-mere-mortals.googlecode.com/files/Help-Short-Tutorial.doc) and a demonstration corpus are available. The corpus is too small to do justice to the quality of the results that can be achieved with Moses, but it gives a realistic view of the relative duration of the steps involved.

Two Windows add-ins allow the creation of Moses input files from *.TMX translation memories (Extract_TMX_Corpus.exe), as well as the creation of *.TMX files from Moses output files (Moses2TMX.exe). This creates a synergy between machine translation and translation memories: each can feed the other.
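
To make the TMX-to-Moses direction more concrete, here is a minimal Python sketch of the general idea: read the translation units of a TMX document and return source/target segment pairs, which could then be written one segment per line to two parallel files. This is not the code of Extract_TMX_Corpus.exe; the function name `tmx_to_pairs` and its parameters are hypothetical, and it assumes a simple TMX layout (`tu` elements containing `tuv` elements with an `xml:lang` or `lang` attribute and one `seg` each).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_pairs(tmx_text, source_lang, target_lang):
    """Return a list of (source, target) segment pairs found in a TMX string.

    Hypothetical helper for illustration only. Language codes are compared
    case-insensitively and must match exactly (for example "en" does not
    match "en-GB").
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for unit in root.iter("tu"):
        segments = {}
        for variant in unit.iter("tuv"):
            lang = (variant.get(XML_LANG) or variant.get("lang") or "").lower()
            seg = variant.find("seg")
            if seg is None:
                continue
            # Text inside inline elements is kept as-is; whitespace is
            # collapsed so that each segment fits on a single line.
            segments[lang] = " ".join("".join(seg.itertext()).split())
        source = segments.get(source_lang.lower())
        target = segments.get(target_lang.lower())
        # Skip units that lack either language or contain an empty segment.
        if source and target:
            pairs.append((source, target))
    return pairs
```

Writing the first element of every pair to one file and the second to another, in the same order, would give two files whose lines correspond one to one.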

The scripts were tested in Ubuntu 10.04 LTS. Documents used for corpus training should be perfectly aligned and saved in UTF-8 character encoding. Documents to be translated should also be in UTF-8 format. After perhaps trying the provided demonstration corpus, users should be able to move straight to the real corpora they are interested in and get results with them.
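
The two input requirements above (UTF-8 encoding and aligned files) can be partly checked before training. The following Python sketch is only an illustration and is not part of the provided scripts; the function names `scan_file` and `check_parallel_corpus` are hypothetical. It reads each file line by line, so it should cope with large corpora, and it reports invalid UTF-8 and differing line counts. Equal line counts are a necessary condition for alignment, not proof of it.

```python
def scan_file(path):
    """Return (line_count, first_bad_line) for a file.

    first_bad_line is the 1-based number of the first line that is not
    valid UTF-8, or None when every line decodes cleanly.
    """
    count = 0
    first_bad = None
    with open(path, "rb") as handle:
        # Iterating a binary file yields one chunk per b"\n"-terminated line.
        for count, raw in enumerate(handle, start=1):
            if first_bad is None:
                try:
                    raw.decode("utf-8")
                except UnicodeDecodeError:
                    first_bad = count
    return count, first_bad


def check_parallel_corpus(source_path, target_path):
    """Return a list of problem descriptions; an empty list means both checks passed."""
    problems = []
    source_lines, source_bad = scan_file(source_path)
    target_lines, target_bad = scan_file(target_path)
    if source_bad is not None:
        problems.append(f"{source_path}: line {source_bad} is not valid UTF-8")
    if target_bad is not None:
        problems.append(f"{target_path}: line {target_bad} is not valid UTF-8")
    if source_lines != target_lines:
        problems.append(
            f"line counts differ: {source_lines} (source) vs {target_lines} (target)"
        )
    return problems
```

For example, `check_parallel_corpus("corpus.en", "corpus.pt")` (file names chosen here only for illustration) returns an empty list when both files are valid UTF-8 and have the same number of lines.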

Though already tested and used in actual work, this should be considered a work in progress.

The latest version of the program, though labelled "beta", is the most extensively tested and used.
