Updated Jul 20, 2009 by tmb...@gmail.com
CxxProgramming  
How to extend OCRopus in C++

Getting Started

Often, the easiest way of customizing OCRopus is not by writing new C++ tools, but by reconfiguring it via parameter settings or choosing different components, by shell scripting, or by scripting in Python or Lua. You need to write C++ code if you want to implement compute-intensive new processing steps, or if you really need a self-contained executable.

Using OCRopus Components

Let's start by writing a small program that illustrates the major aspects of OCRopus C++ programming.

// ocrobin.cc -- binarize an input file
// usage: ocrobin input.png output.png

// use the standard include files for colib, iulib, and ocropus
// (colib ships as part of iulib)

#include <stdio.h>     // fprintf
#include <stdlib.h>    // exit
#include <colib/colib.h>
#include <iulib/iulib.h>
#include <ocropus/ocropus.h>

using namespace colib;
using namespace iulib;
using namespace ocropus;

// get a parameter from the environment, with a default value

param_string method("method","BinarizeByOtsu","binarization method");

int main(int argc,char **argv) {
    try {
        if(argc!=3) throw "wrong # arguments";

        // register all the internal OCRopus components so that make_component works
        init_ocropus_components();

        // instantiate the binarization component
        autodel<IBinarize> binarizer;
        make_component(method,binarizer);

        // read an input image
        bytearray image;
        read_image_gray(image,argv[1]);

        // apply the binarizer
        bytearray output;
        binarizer->binarize(output,image);

        // write the result
        write_image_gray(argv[2],output);
    } catch(const char *message) {
        fprintf(stderr,"error: %s\n",message);
        exit(1);
    }
}

Put this into a file called ocrobin.cc, then compile it with the command:

g++ ocrobin.cc -locropus -liulib -llept -lpng -ljpeg -lgif -ltiff -fopenmp

The ocropus and iulib libraries are part of the OCRopus project. The lept library is the Leptonica image processing library. The png, jpeg, gif, and tiff libraries handle image I/O. The -fopenmp flag enables OpenMP, so the compiler can generate multicore code and link against the OpenMP runtime.
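The param_string declaration in ocrobin takes its value from the environment and falls back to a default ("BinarizeByOtsu"), which is what lets you switch binarization methods without recompiling. As a standalone illustration of that idea, here is a minimal sketch using std::getenv. It assumes the environment variable carries the parameter's name (method); EnvParam is a hypothetical class and is not colib's actual implementation of param_string.

```cpp
// Sketch only: a string parameter read from an environment variable, with a
// default. EnvParam is a hypothetical class, not colib's param_string.
#include <cassert>
#include <cstdlib>
#include <string>

class EnvParam {
public:
    EnvParam(const char *name, const char *default_value, const char *doc)
        : name_(name), default_(default_value), doc_(doc) {}

    // Looked up on every call, so later changes to the environment are seen.
    // An unset or empty variable falls back to the default value.
    std::string value() const {
        assert(!name_.empty());
        const char *v = std::getenv(name_.c_str());
        return (v != nullptr && *v != '\0') ? std::string(v) : default_;
    }
    const std::string &doc() const { return doc_; }

private:
    std::string name_, default_, doc_;
};
```

Reading the variable on each access rather than once at construction is a design choice of this sketch; a real implementation might cache the value.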

Note the following points:

Defining New OCRopus Components

Here is an example of code that defines a (not very good) image binarization component and makes it available to the rest of OCRopus (this example is in extras/sample-extension):

#include <colib/colib.h>
#include <iulib/iulib.h>
#include <ocropus/ocropus.h>

using namespace colib;
using namespace iulib;
using namespace ocropus;

namespace ocropus { int main_ocropus(int,char **); }

struct MyThresholder : IBinarize {
    const char *name() { return "mythresholder"; }
    const char *description() { return "performs thresholding based on the mean"; }
    MyThresholder() {
        pdef("factor",1.0,"threshold is factor * mean");
    }
    void binarize(bytearray &out,floatarray &in) {
        float factor = pgetf("factor");
        float mean = sum(in)/in.length();
        debugf("info","threshold=%g\n",factor*mean);
        int n = in.length();
        out.makelike(in);
        for(int i=0;i<n;i++)
            out[i] = 255 * (in[i]>=factor*mean);
    }
};

extern "C" {
    void ocropus_init_dl();
}

void ocropus_init_dl() {
    component_register<MyThresholder>("MyThresholder");
}

int main(int argc,char **argv) {
    component_register<MyThresholder>("MyThresholder");
    return main_ocropus(argc,argv);
}

Save this as ocrothresh.cc and compile it with:

g++ ocrothresh.cc -locropus -liulib -ljpeg -lpng -lgif -ltiff -fopenmp -llept -lSDL -lSDL_gfx -lgsl -lblas

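The resulting a.out behaves like the ocropus command, but with MyThresholder registered as an additional component, so you can use it anywhere in OCRopus that a binarization component can be selected. For example, you can list its parameters from the command line and then select it as the binarizer: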

$ ./a.out params MyThresholder
param default mythresholder_factor=2 1 threshold is factor * mean

name=MyThresholder
description=performs thresholding based on the mean
$ binarizer=MyThresholder ./a.out threshold test.jpg out.png
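Selecting a component by name at run time, as the binarizer=MyThresholder setting does, requires a table that maps names to constructors; this is the role component_register plays. The sketch below shows one way such a registry can work in standard C++. Registry, Component, and MyThresholderStub are hypothetical names for illustration, and OCRopus's own implementation may differ.

```cpp
// Sketch only: a name -> factory registry in the spirit of component_register
// and make_component. All names are hypothetical, not the OCRopus API.
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Stand-in for a component interface such as IBinarize.
struct Component {
    virtual ~Component() {}
    virtual const char *name() const = 0;
};

class Registry {
public:
    using Factory = std::function<std::unique_ptr<Component>()>;

    static Registry &instance() {        // one global table per program
        static Registry registry;
        return registry;
    }
    template <class T> void add(const std::string &key) {
        assert(!key.empty());
        factories_[key] = [] { return std::unique_ptr<Component>(new T()); };
    }
    std::unique_ptr<Component> make(const std::string &key) const {
        auto it = factories_.find(key);
        if (it == factories_.end())
            throw std::runtime_error("unknown component: " + key);
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Example component, mirroring the name() of MyThresholder in the text.
struct MyThresholderStub : Component {
    const char *name() const override { return "mythresholder"; }
};
```

In this sketch an unknown name throws an exception, so a misspelled component name fails loudly instead of silently falling back to some default.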

You can also compile the component into a shared library and load it into the regular ocropus command at run time (if your build supports dynamic loading):

$ g++ -fPIC -shared -g -o ocrothresh.so ocrothresh.cc -locropus -liulib -ljpeg -lpng -lgif -ltiff -fopenmp -llept -lSDL -lSDL_gfx -lgsl -lblas
$ extension=./ocrothresh.so binarizer=MyThresholder ocropus threshold test.jpg out.png
[info] using mythresholder
[info] threshold=234.447
$ 
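To check the thresholding rule without building OCRopus, the computation in MyThresholder::binarize can be written against std::vector. This is a standalone sketch, not OCRopus code; mean_threshold is a hypothetical helper, and like the OCRopus code it takes the output array as its first argument.

```cpp
// Sketch only: MyThresholder's rule (pixel >= factor * mean -> 255, else 0)
// on std::vector data. mean_threshold is a hypothetical helper.
#include <cassert>
#include <cstddef>
#include <vector>

// Output argument first, as in the OCRopus coding conventions.
void mean_threshold(std::vector<unsigned char> &out,
                    const std::vector<float> &in, float factor = 1.0f) {
    assert(!in.empty());                  // the mean of an empty image is undefined
    double sum = 0.0;                     // accumulate in double to limit rounding error
    for (float v : in) sum += v;
    const float threshold = factor * static_cast<float>(sum / in.size());
    out.assign(in.size(), 0);
    for (std::size_t i = 0; i < in.size(); i++)
        if (in[i] >= threshold) out[i] = 255;
}
```

For the input {0, 100, 200, 300} the mean is 150, so with factor 1 the last two pixels become 255; raising factor to 2 moves the threshold to 300 and leaves only the last pixel set.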

Coding Conventions

Please read the coding conventions (http://docs.google.com/Doc?id=dfxcv4vc_508vv9g6khd); all contributions should follow them.

The most important parts of the conventions are:

General programming principles are:

In terms of formatting, please observe:

If you see significant violations of these coding conventions that don't come with a justification, please submit an issue report.

FIXME describe scripts in utilities/ that check for violations of some of these.

The Array Data Type

The most important compound data type in OCRopus is narray<T>, an array class that can represent arrays of rank 1 to 4, as well as stacks and lists.

The constructors look like this:

narray<T>();
narray<T>(int d0);
narray<T>(int d0,int d1);
narray<T>(int d0,int d1,int d2);
narray<T>(int d0,int d1,int d2,int d3);

Rather than writing out all these overloads, we abbreviate them as narray<T>(int d0,...) from here on.

The copy constructor and assignment operator are intentionally disabled, so you cannot pass an array by value or return one from a function or method. Such copies are easy to make by accident and would carry an unacceptable performance penalty for large arrays. Instead of returning an array, follow the coding conventions and pass the result array by reference as the first argument. That is, instead of:

floatarray f(double x); // DO NOT DO THIS
floatarray a = f(x);

write

void f(floatarray &a,double x);
floatarray a;
f(a,x);

This is a little more tedious, but it avoids a whole range of memory management issues and makes the code easy to bind to other programming languages.
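In C++11 and later, the same policy can be stated with deleted copy operations, which turns an accidental copy into a compile error. The sketch below shows the pattern together with the output-first calling convention; FloatArray and linspace are hypothetical names, and this is not colib's narray.

```cpp
// Sketch only: a non-copyable array whose results are returned through a
// reference argument. FloatArray and linspace are hypothetical, not colib.
#include <cassert>
#include <cstddef>
#include <vector>

class FloatArray {
public:
    FloatArray() {}
    FloatArray(const FloatArray &) = delete;             // no pass/return by value
    FloatArray &operator=(const FloatArray &) = delete;  // no accidental copies
    void resize(std::size_t n) { data_.assign(n, 0.0f); }
    std::size_t length() const { return data_.size(); }
    float &operator[](std::size_t i) {
        assert(i < data_.size());
        return data_[i];
    }

private:
    std::vector<float> data_;
};

// Output argument first, inputs after it: fills a with n evenly spaced values.
void linspace(FloatArray &a, double lo, double hi, int n) {
    assert(n >= 2);
    a.resize(n);
    for (int i = 0; i < n; i++)
        a[i] = static_cast<float>(lo + (hi - lo) * i / (n - 1));
}
```

With this class, writing FloatArray b = a; does not compile, which is exactly the safeguard described above.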

Memory management for arrays is handled by these methods:

void resize(int d0,...);
void renew(int d0,...);
void reshape(int d0,...);
void dealloc();

The differences between them are as follows: resize may destroy any data previously stored in the array; renew guarantees that it allocates and initializes new underlying storage; reshape never allocates new storage and must keep the same total number of elements. dealloc simply releases all storage associated with the array, returning it to the state it was in right after being declared (so a.dealloc(); a.resize(10,10); a(0,0) = 99; is valid and common).
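To make these rules concrete, here is a minimal rank-2 sketch built on std::vector. Array2 is hypothetical and only mirrors the behavior described above; in particular, the real resize is merely allowed, not required, to discard old contents, so callers should never rely on either outcome.

```cpp
// Sketch only: resize / renew / reshape / dealloc semantics for a rank-2
// float array. Array2 is a hypothetical illustration, not colib's narray.
#include <cassert>
#include <cstddef>
#include <vector>

class Array2 {
public:
    // May discard old contents; this sketch happens to reuse storage.
    void resize(int d0, int d1) {
        assert(d0 >= 0 && d1 >= 0);
        data_.resize(static_cast<std::size_t>(d0) * d1);
        d0_ = d0; d1_ = d1;
    }
    // Always allocates fresh, zero-initialized storage.
    void renew(int d0, int d1) {
        assert(d0 >= 0 && d1 >= 0);
        std::vector<float>(static_cast<std::size_t>(d0) * d1, 0.0f).swap(data_);
        d0_ = d0; d1_ = d1;
    }
    // Reinterprets the same elements with new dimensions; never allocates.
    void reshape(int d0, int d1) {
        assert(d0 >= 0 && d1 >= 0);
        assert(static_cast<std::size_t>(d0) * d1 == data_.size());
        d0_ = d0; d1_ = d1;
    }
    // Releases all storage, returning to the freshly declared state.
    void dealloc() {
        std::vector<float>().swap(data_);
        d0_ = d1_ = 0;
    }
    int dim(int i) const { return i == 0 ? d0_ : d1_; }
    float &operator()(int i0, int i1) {
        assert(i0 >= 0 && i0 < d0_ && i1 >= 0 && i1 < d1_);
        return data_[static_cast<std::size_t>(i0) * d1_ + i1];
    }

private:
    std::vector<float> data_;
    int d0_ = 0, d1_ = 0;
};
```

Because reshape only changes the dimensions and leaves the storage alone, a 10x10 array can become 20x5 but not 20x10.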

Accessing the properties and individual elements is handled using these methods:

int rank() const;
int dim(int i) const;
T &at(int i0,...);
T &operator()(int i0,...);

Arrays of any rank can also be treated as rank-1 arrays, with the elements laid out in C (row-major) order:

int length1d() const;
T &at1d(int i) const;
T &operator[](int i) const;
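C order means that the last index varies fastest, so for an array with dimensions d0 x d1 x d2 x d3 the element at (i0,i1,i2,i3) sits at rank-1 position ((i0*d1+i1)*d2+i2)*d3+i3. The hypothetical helper below spells this out; lower-rank arrays correspond to trailing dimensions of size 1. It is an illustration of the layout, not a colib function.

```cpp
// Sketch only: maps a rank-4 index to its rank-1 (C order) position, the
// position at1d() and operator[] refer to. flat_index is hypothetical.
#include <cassert>

// d holds the four dimensions; give unused trailing dimensions as 1.
int flat_index(const int d[4], int i0, int i1, int i2, int i3) {
    assert(i0 >= 0 && i0 < d[0] && i1 >= 0 && i1 < d[1]);
    assert(i2 >= 0 && i2 < d[2] && i3 >= 0 && i3 < d[3]);
    return ((i0 * d[1] + i1) * d[2] + i2) * d[3] + i3;
}
```

For a 2 x 3 x 4 array (d3 = 1), stepping the last index moves one position, stepping the middle index moves four, and the final element (1,2,3) lands at position 23.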

1D arrays can also be treated as stacks (similar to Python lists); the accessors are the same. The following methods implement the remaining stack/list operations:

int length();
void push(T &value);
T &pop();
T &last();
void clear();
void reserve(int n);
void grow_to(int n);
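For readers who know the C++ standard library, the same stack-style usage can be written with std::vector. The correspondence in the comments below is an assumption based on the method names, not taken from the colib sources; grow_to is left out because its exact semantics are not described here. nonzero_indices and pop_last are hypothetical helpers.

```cpp
// Sketch only: the stack/list pattern written with std::vector. The mapping
// to narray methods in the comments is an assumption, not colib behavior.
#include <cassert>
#include <vector>

// Collects the indices of nonzero entries, using the vector as a list.
void nonzero_indices(std::vector<int> &out, const std::vector<float> &in) {
    out.clear();                                 // like clear()
    out.reserve(in.size());                      // like reserve(n): capacity only (assumed)
    for (int i = 0; i < static_cast<int>(in.size()); i++)
        if (in[i] != 0.0f) out.push_back(i);     // like push(value)
}

// Removes and returns the last element; the stack must not be empty.
// Returns by value because pop_back() destroys the element.
int pop_last(std::vector<int> &stack) {
    assert(!stack.empty());
    int value = stack.back();                    // like last()
    stack.pop_back();
    return value;
}
```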

The OCR Interfaces

The following are OCR interfaces that the rest of the system understands. If you write to these interfaces, chances are that your algorithm can be used as a drop-in replacement in the system:

Invoking the Line Recognizer

Here is a longer example showing how to invoke the line recognizer on a single text-line image. Usage: a.out character.model image.png

#include <stdio.h>
#include "colib/colib.h"
#include "iulib/iulib.h"
#include "ocropus/ocropus.h"
#include "ocropus/glinerec.h"

using namespace iulib;
using namespace colib;
using namespace ocropus;
using namespace narray_ops;
using namespace glinerec;

int main(int argc,char **argv) {
    if(argc!=3) {
        fprintf(stderr,"usage: %s character.model image.png\n",argv[0]);
        return 1;
    }
    init_ocropus_components();
    init_glclass();
    init_glfmaps();
    init_linerec();

    autodel<IRecognizeLine> linerec;
    make_component(linerec,"linerec");
    stdio model(argv[1],"r");
    linerec->load(model);

    bytearray image;
    read_image_gray(image,argv[2]);

    autodel<IGenericFst> result;
    make_component(result,"OcroFST");

    linerec->recognizeLine(*result,image);

    nustring str;
    str.clear();
    // should be using a language model here
    result->bestpath(str);

    narray<char> s;
    str.utf8Encode(s);
    s.push(0);

    printf("%s\n",&s[0]);
    return 0;
}
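In the program above, bestpath extracts the lowest-cost label sequence from the recognition result. Conceptually, for an acyclic lattice whose arcs carry a cost and an output character, this is a shortest-path computation done in topological order. The sketch below illustrates that idea on a small hand-built lattice; Arc and best_path are hypothetical, and this is not the OcroFST implementation, which may handle more (for example, combining costs with a language model).

```cpp
// Sketch only: lowest-cost path through an acyclic lattice whose states are
// numbered in topological order (every arc goes to a higher-numbered state).
// Arc and best_path are hypothetical; this is not the OcroFST code.
#include <cassert>
#include <limits>
#include <string>
#include <vector>

struct Arc { int from, to; float cost; char label; };

// Returns the labels along the cheapest path from state 0 to state n-1,
// or an empty string if state n-1 cannot be reached.
std::string best_path(int n, const std::vector<Arc> &arcs) {
    assert(n >= 1);
    const float inf = std::numeric_limits<float>::infinity();
    std::vector<float> dist(n, inf);
    std::vector<int> back(n, -1);          // arc used to reach each state
    dist[0] = 0.0f;
    for (int s = 0; s < n; s++) {          // relax arcs leaving states in order
        if (dist[s] == inf) continue;
        for (int a = 0; a < static_cast<int>(arcs.size()); a++) {
            const Arc &arc = arcs[a];
            if (arc.from != s) continue;
            assert(arc.to > arc.from && arc.to < n);
            if (dist[s] + arc.cost < dist[arc.to]) {
                dist[arc.to] = dist[s] + arc.cost;
                back[arc.to] = a;
            }
        }
    }
    std::string result;
    if (dist[n - 1] == inf) return result;
    for (int s = n - 1; s != 0; s = arcs[back[s]].from)   // walk back to the start
        result.insert(result.begin(), arcs[back[s]].label);
    return result;
}
```

Scanning every arc for every state keeps the sketch short at O(states x arcs) cost; a real lattice would store outgoing arcs per state.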
