Updated Jun 30 by theraysmith
Release Notes.

Introduction

This page keeps the most up-to-date release notes.

Tesseract release notes June 30 2009 - V2.04

Tesseract release notes April 22 2008 - V2.03

Version 2.02 was unrunnable due to a last-minute "simple" change. Version 2.03 fixes the problem and also adds an include check for Leptonica to make it more usable.

Tesseract release notes April 21 2008 - V2.02

Tesseract release notes Aug 30, 2007 - V2.01.

(See also release notes for 2.00 below for usage information)

No major functionality change. Just a bunch of bug fixes.

There are no new data files for the original 6 languages; use the files from v2.00. There are new data files for German Fraktur (deu-f) and Brazilian Portuguese (por).

STOP PRESS: There is a minor bug in unicharset_extractor. Since this tool is only used for training, the main tarball is fine unless you need to run training. In that case, overwrite your unicharset_extractor.cpp and unicharset_extractor.exe with the ones in tesseract-2.01.patch1.tar.gz.

Tesseract release notes Jul 18, 2007 - V2.00.

(See also release notes for 1.04 below for additional usage information)

First release of the International version. This version recognizes the following languages:

The language codes follow ISO 639-2. The default language is English. To recognize another language:

tesseract inputimage outputbase -l langcode

To train on a new language, see TrainingTesseract. More languages will be appearing over time.
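As a minimal sketch (plain Python, not part of the Tesseract distribution), the command line above can be assembled and run from a script. The helper names build_tesseract_command and run_tesseract are hypothetical, and the returned file name assumes the usual behaviour of writing the recognized text to outputbase.txt.

```python
import shlex
import subprocess


def build_tesseract_command(image_path, output_base, lang=None):
    """Assemble the arguments for: tesseract inputimage outputbase -l langcode.

    lang is a language code such as "por"; when it is None, -l is left out
    and Tesseract uses its default language (English).
    """
    args = ["tesseract", image_path, output_base]
    if lang is not None:
        args += ["-l", lang]
    return args


def run_tesseract(image_path, output_base, lang=None):
    """Run tesseract; assumes the text ends up in <output_base>.txt."""
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return output_base + ".txt"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only; nothing is executed here.
    print(shlex.join(build_tesseract_command("scan.tif", "scan", lang="por")))
    # prints: tesseract scan.tif scan -l por
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with image paths that contain spaces.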

List of changes in this release:

  • Converted internal character handling to UTF-8.
  • Trained with 6 languages.
  • Added unicharset_extractor and wordlist2dawg.
  • Added a box file creation mode.
  • Added UNLV regression test capability.
  • Fixed problems with the copyright and registered symbols.
  • Fixed a problem with extern "C" declarations.
  • Made some improvements to the consistency of accuracy across platforms.
  • Added Visual C++ Express support.
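Moving to UTF-8 means a character is no longer the same thing as a byte. As a general illustration of UTF-8 itself (plain Python, not Tesseract code), the copyright and registered symbols mentioned above each occupy two bytes, and cutting the byte string in the wrong place splits a character. The function names are hypothetical.

```python
def utf8_lengths(text):
    """Return (character, byte count) pairs for text encoded as UTF-8."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


def is_char_boundary(data, index):
    """True if splitting the UTF-8 bytes at index does not cut a character.

    UTF-8 continuation bytes have the bit pattern 10xxxxxx, so a cut is
    only safe at the end of the data or before a non-continuation byte.
    """
    return index >= len(data) or (data[index] & 0xC0) != 0x80


if __name__ == "__main__":
    print(utf8_lengths("A\u00a9\u00ae"))  # [('A', 1), ('©', 2), ('®', 2)]
    copyright_bytes = "\u00a9".encode("utf-8")
    print(copyright_bytes.hex())  # c2a9
    print(is_char_boundary(copyright_bytes, 1))  # False: inside the character
```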

xx.00 Version Warning

Tesseract 2.00 has undergone more compatibility testing than any previous version. There have even been fixes to make the accuracy more consistent across platforms. Having said that, there have been many changes to the code and portability may have been broken, so 64-bit and Mac platforms may not build or work as well as before.

Tesseract release notes May 15, 2007 - V1.04.

Windows users only

Added a DLL interface for Windows. Thanks to Glen at Jetsoft for contributing this. To use the DLL, include tessdll.h, link against tessdll.lib and put tessdll.dll somewhere the system can find it. There is also a small dlltest program to test the DLL. Run it with:

dlltest phototest.tif phototest.txt

It will output the text from phototest.tif with bounding box information.

New for Windows

The distribution now includes tesseract.exe and tessdll.dll, which might work out of the box! There are no guarantees, as you need (at least) the VC++ 6 versions of MFC and the CRT for them to work. (Batteries not included, and certainly no InstallShield.)

Important note for anyone building with make (i.e. anyone except DevStudio users)

This release includes new standardization for the data directory. To enable Tesseract to find its data files, you must either:

./configure
make
make install

to move the data files to the standard place, or:

export TESSDATA_PREFIX="directory in which your tessdata resides/"

(or the equivalent) in your .profile or similar startup file, or use setenv in csh-style shells, to set the environment variable. Note that the directory must end in a /.

HAVING tesseract and tessdata IN THE SAME DIRECTORY DOES NOT WORK ANY MORE.
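A plausible reason for the trailing-slash rule is that the prefix is joined to "tessdata/" by plain string concatenation. The sketch below is a hypothetical Python illustration of that assumption, not Tesseract's own code; the D:/OCR/tesseract-2.03/ prefix and the eng.unicharset file name come from the reader comment further down this page, and data_file_path and prefix_from_env are made-up names.

```python
import os


def data_file_path(prefix, lang, kind):
    """Build <prefix>tessdata/<lang>.<kind> by plain string concatenation.

    Assumption: nothing inserts a "/" between the prefix and "tessdata",
    so the prefix itself must end in one.
    """
    return prefix + "tessdata/" + lang + "." + kind


def prefix_from_env():
    """Read TESSDATA_PREFIX and reject values that do not end in '/'."""
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix is None:
        return None  # not set: assume the standard (make install) location is used
    if not prefix.endswith("/"):
        raise ValueError("TESSDATA_PREFIX must end in '/', got %r" % prefix)
    return prefix


if __name__ == "__main__":
    print(data_file_path("D:/OCR/tesseract-2.03/", "eng", "unicharset"))
    # D:/OCR/tesseract-2.03/tessdata/eng.unicharset
    print(data_file_path("D:/OCR/tesseract-2.03", "eng", "unicharset"))
    # D:/OCR/tesseract-2.03tessdata/eng.unicharset  (missing slash)
```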

All users

Fixed a number of name collisions, mostly with the STL. Made some preliminary changes for Unicode compatibility. This includes a new data file (unicharset) and renames the other data files with an eng. prefix to support different languages. There are also several other minor bug fixes and portability improvements for 64-bit platforms, the latest Visual Studio compiler, etc. Thanks to all who have contributed these fixes.

NOTE: This is likely to be the last English-only release! Apologies in advance to non-Windows users for bloating the distribution with Windows executables. This will probably get fixed in the next release with the multi-language capability, since that will also bloat the distribution.


Comment by jens...@iname.com, Aug 06, 2007

Shouldn't the Visual C++ Express version be usable with just the vcredist_x86.exe redistributable, rather than requiring users to install VC++ Express and the Platform SDK?

Comment by b...@liddicott.com, Oct 08, 2008

OK, well, I had some comments about how to improve the DLL API, but it looks like that just can't be done, since the Tesseract code intrinsically supports only one engine!

Comment by b...@liddicott.com, Oct 08, 2008

I'm working now on a DLL API callable from VB or .Net.

Comment by lbaggini, Jan 19, 2009

The .exe in Windows doesn't find a file: Unable to load unicharset file D:/OCR/tesseract-2.03/tessdata/eng.unicharset

Thank you

Comment by elsenwong80, Feb 27, 2009

Why is Chinese not available?

