tesseract-polish - issue #1

Need training data made in 600 DPI


Posted on Jan 18, 2011 by Happy Panda

As can be seen in SVN (http://tesseract-polish.googlecode.com/svn/trunk/src/training_images/boxtiff.pol/), currently all training data is based only on 300 DPI scans.

The reason for this is simple: when I was working on the training, I only had a 300 DPI scanner at my disposal.

This results in somewhat suboptimal recognition of texts printed/scanned at 600 DPI or higher.
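
To give a rough sense of the scale difference involved, here is a small sketch of the arithmetic. It is only an illustration, not a measurement from this project: the 10 pt text size and the function name `glyph_height_px` are assumptions picked for the example. It uses the standard definition of a typographic point as 1/72 inch.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 of an inch


def glyph_height_px(point_size: float, dpi: int) -> float:
    """Nominal pixel height of text set at `point_size` when scanned at `dpi`.

    Hypothetical helper for illustration; real glyph heights also depend
    on the font and on the individual character.
    """
    return point_size / POINTS_PER_INCH * dpi


if __name__ == "__main__":
    # 10 pt is an assumed, typical body-text size.
    for dpi in (300, 600, 1200):
        print(f"{dpi} DPI: about {glyph_height_px(10, dpi):.1f} px")
```

So the same printed text comes out twice as tall in pixels in a 600 DPI scan as in the 300 DPI training images, and four times as tall at 1200 DPI.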

I now have access to a 600 and 1200 DPI scanner, but have no time whatsoever to perform additional training with it.

If anyone could step up to do this work, it would be much appreciated.

Status: Accepted

Labels:
Type-Defect Priority-Medium