Updated May 11, 2008 by tmbdev
GettingStartedWithBleedingEdge  
Getting Started with the Subversion Version of OCRopus

Prerequisites

We recommend building on Ubuntu 7.10 or 8.04, since that's what we build and test on. Other recent versions of Linux will probably work.

People have built OCRopus on OS X, with Visual Studio, and on other platforms. We appreciate feedback on how to improve portability, build files, and so on, and will try to make the resulting improvements available to others as much as possible. For official support of another platform, we would need volunteers who can perform nightly builds and submit patches as soon as anything breaks.

Tesseract (optional)

Tesseract is a fast and fairly accurate text recognition engine from HP and Google. It is optional, but we recommend that you include it for now if you want to perform OCR. If you mostly need OCRopus for other document analysis tasks, you need not include it.

To install it, check out the current Subversion version of Tesseract from the Tesseract repository:

svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr-read-only

(If you have trouble with that version, download the latest tarball instead.)

Follow its build instructions, then install it in /usr/local (the default location when you run ./configure; make; make install).
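The configure, make, make install sequence mentioned above could be wrapped roughly as follows. This is a minimal sketch, not part of Tesseract: the function name build_tesseract and the DRY_RUN variable are illustrative names introduced here, and the default directory is simply the checkout name used in the svn command above.

```shell
#!/bin/sh
# Sketch: build and install a checked-out Tesseract tree with the default
# /usr/local prefix. build_tesseract and DRY_RUN are illustrative names.

# Print the command when DRY_RUN is set; otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run the three standard steps in the source tree, stopping at the first failure.
build_tesseract() {
    src=${1:-tesseract-ocr-read-only}
    (
        cd "$src" || exit 1
        run ./configure &&
        run make &&
        run make install   # writing to /usr/local usually requires root
    )
}
```

Calling DRY_RUN=1 build_tesseract first prints the steps without running them; calling build_tesseract alone performs the build.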

OpenFST (optional)

OpenFST is used for building statistical language models. It is optional, and you may not need it.

Download the latest OpenFST distribution from http://openfst.org/

Follow its instructions for building the distribution. Note that OpenFST does not use a standard directory structure; you have to cd two levels down.

After everything has built, install the files in the right place:

mkdir -p /usr/local/include/fst/lib
cp -v fst/lib/*.h /usr/local/include/fst/lib
cp fst/lib/*.a /usr/local/lib
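If you prefer not to write into /usr/local directly, the same copy steps can take the install prefix as a parameter. The sketch below assumes you run it from the directory that contains fst/lib after the build has finished; install_openfst_files is a hypothetical helper name, not something OpenFST provides.

```shell
#!/bin/sh
# Sketch: the copy steps above with a configurable prefix.
# install_openfst_files is a hypothetical helper; the default prefix
# matches the /usr/local location used in this guide.
install_openfst_files() {
    prefix=${1:-/usr/local}
    # Create both target directories before copying anything.
    mkdir -p "$prefix/include/fst/lib" "$prefix/lib" || return 1
    # Headers go under include/fst/lib, static libraries under lib.
    cp -v fst/lib/*.h "$prefix/include/fst/lib" || return 1
    cp -v fst/lib/*.a "$prefix/lib"
}
```

For example, install_openfst_files "$HOME/local" would stage the files under your home directory instead of /usr/local.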

OCRopus

Check out the current Subversion version of OCRopus from the OCRopus repository:

svn checkout http://ocropus.googlecode.com/svn/trunk/ ocropus

To build OCRopus, just run ./configure, then jam, then jam install.
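Before starting the build, it can help to confirm that the tools this guide relies on are on your PATH. The following is a small sketch; missing_tools is a hypothetical helper, and the tool list in the usage comment (svn, jam, make) is an assumption drawn from the commands named in this guide rather than a complete list of build dependencies.

```shell
#!/bin/sh
# Sketch: print the name of every listed tool that is not found on PATH.
# missing_tools is a hypothetical helper introduced for illustration.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Possible usage before building:
#   missing=$(missing_tools svn jam make)
#   [ -z "$missing" ] || echo "install these first: $missing"
```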

Unit Tests

After the build completes, run the unit tests.

Command Line

The OCRopus command line program is called ocroscript, and it is installed in /usr/local/bin.

It takes either scripts or subcommands on the command line. Subcommands are scripts that are installed in /usr/local/lib/ocropus/... or somewhere along the path defined by the OCROSCRIPTS environment variable. The ocroscripts/scripts directory contains the available top-level commands.
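To see which files sit along the OCROSCRIPTS path, something like the sketch below may be useful. It assumes OCROSCRIPTS is a colon-separated list of directories, like PATH; this page does not state the separator, so treat that as an assumption. The helper name list_ocroscripts is hypothetical, and the sketch only lists files; it does not reproduce how ocroscript itself resolves a subcommand.

```shell
#!/bin/sh
# Sketch: list regular files in each directory named by OCROSCRIPTS.
# Assumption: OCROSCRIPTS is colon-separated, like PATH.
# list_ocroscripts is a hypothetical helper. The body runs in a subshell
# so that changing IFS does not affect the caller.
list_ocroscripts() (
    IFS=:
    for dir in ${OCROSCRIPTS:-}; do
        # Skip entries that are not existing directories.
        [ -d "$dir" ] || continue
        for f in "$dir"/*; do
            if [ -f "$f" ]; then echo "$f"; fi
        done
    done
)
```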

Here is an example:

ocroscript rec-tess file.png
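To process several images, the single-file command can be wrapped in a loop. This is a sketch under two assumptions that this page does not confirm: that rec-tess writes the recognized text to standard output, and that one .txt file per image is the output you want. The names OCROSCRIPT and recognize_dir are introduced here for illustration only.

```shell
#!/bin/sh
# Sketch: run the rec-tess subcommand on every PNG file in a directory.
# Assumption: rec-tess prints recognized text on standard output.
# OCROSCRIPT and recognize_dir are illustrative names, not part of OCRopus.
OCROSCRIPT=${OCROSCRIPT:-ocroscript}

recognize_dir() {
    dir=${1:-.}
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue    # the pattern matched no files
        # Write page.png results to page.txt; report failures and keep going.
        "$OCROSCRIPT" rec-tess "$img" > "${img%.png}.txt" ||
            echo "recognition failed: $img" >&2
    done
}
```

Calling recognize_dir scans would then leave one text file next to each PNG in the scans directory.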
