Introduction
This page keeps the most up-to-date release notes.
Tesseract release notes Aug 30, 2007 - V2.01.
(See also release notes for 2.00 below for usage information)
No major functionality change. Just a bunch of bug fixes.
- Fixed UTF8 input problems with box file reader.
- Fixed various infinite loops and crashes in dawg code.
- Removed include of config_auto.h from host.h.
- Added automatic wctype encoding to unicharset_extractor.
- Fixed dawg table too full error.
- Removed svn files from tarball.
- Added new functions to tessdll.
- Increased maximum utf8 string in a classification result to 8.
- Added new functionality to TessBaseAPI for Ocropus.
No new data files for the original 6 languages. Use the files from v2.00. There are new data files for German Fraktur (deu-f) and Brazilian Portuguese (por).
STOP PRESS There is a minor bug in unicharset_extractor. Since this is only applicable to training, the main tarball is fine unless you need to run training, in which case, overwrite your unicharset_extractor.cpp and unicharset_extractor.exe with the ones in tesseract-2.01.patch1.tar.gz.
Tesseract release notes Jul 18, 2007 - V2.00.
(See also release notes for 1.04 below for additional usage information)
First release of the International version. This version recognizes the following languages:
- English - eng
- French - fra
- Italian - ita
- German - deu
- Spanish - spa
- Dutch - nld
tesseract inputimage outputbase -l langcode
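As a rough illustration of the command line above, the following Python sketch assembles the same arguments for one of the six language codes listed for 2.00. The helper name `build_tesseract_command` and the example file names are hypothetical; only the `tesseract inputimage outputbase -l langcode` form comes from these notes.

```python
# Language codes shipped with 2.00, as listed in these notes.
SUPPORTED_LANGS = ("eng", "fra", "ita", "deu", "spa", "nld")


def build_tesseract_command(input_image, output_base, lang="eng"):
    """Return the argument list for: tesseract inputimage outputbase -l langcode.

    Hypothetical helper: it only assembles the arguments, it does not run anything.
    """
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"no 2.00 data files listed for language {lang!r}")
    return ["tesseract", input_image, output_base, "-l", lang]


if __name__ == "__main__":
    # Example file names are placeholders.
    print(" ".join(build_tesseract_command("page.tif", "page", "fra")))
```

If the tesseract executable is on your PATH, the returned list can be handed to `subprocess.run` unchanged.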
To train on a new language, see TrainingTesseract. More languages will be appearing over time.
List of changes in this release:
- Converted internal character handling to UTF8.
- Trained with 6 languages.
- Added unicharset_extractor, wordlist2dawg.
- Added boxfile creation mode.
- Added UNLV regression test capability.
- Fixed problems with copyright and registered symbols.
- Fixed extern "C" declarations problem.
- Made some improvements to consistency of accuracy across platforms.
- Added vc++ express support.
Instructions for downloading and building version 2.00.
Things have changed quite a bit since the previous versions so please read carefully.
All users
The tarballs are split into pieces.
tesseract-2.00.tar.gz contains all the source code.
tesseract-2.00.<lang>.tar.gz contains the data files for <lang>. You need at least one of these or tesseract will not work.
Note that tesseract-2.00.tar.gz unpacks to the tesseract-2.00 directory. tesseract-2.00.<lang>.tar.gz unpacks to the tessdata directory which belongs inside your tesseract-2.00 directory. It is therefore best to download them into your tesseract-2.00 directory, so you can use unpack here or equivalent. You can unpack as many of the language packs as you care to, as they all contain different files.
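The layout described above can be sketched in Python for readers who prefer scripting the unpack step. This is a minimal sketch, assuming the language tarballs have already been downloaded into the tesseract-2.00 directory; the function name `unpack_language_packs` is hypothetical, and only the archive naming and the tessdata location come from these notes.

```python
import tarfile
from pathlib import Path


def unpack_language_packs(source_dir, langs):
    """Unpack tesseract-2.00.<lang>.tar.gz archives found in source_dir, in place.

    Each language tarball unpacks to a tessdata/ directory, so extracting it
    inside the tesseract-2.00 directory puts the data files where they belong.
    Returns the path of the resulting tessdata directory.
    """
    source_dir = Path(source_dir)
    for lang in langs:
        archive = source_dir / f"tesseract-2.00.{lang}.tar.gz"
        if not archive.is_file():
            raise FileNotFoundError(f"missing language pack: {archive}")
        with tarfile.open(archive, "r:gz") as tar:
            # Only unpack archives you trust: extractall writes whatever
            # paths the archive contains.
            tar.extractall(path=source_dir)
    return source_dir / "tessdata"
```

For example, `unpack_language_packs("tesseract-2.00", ["eng", "fra"])` would extract both packs into the same tessdata directory, which is safe because the packs contain different files.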
Non-windows users
As with 1.04, this version works with make install.
New: there is a tesseract.spec for making rpms. (Thanks to Andrew Ziem for the help.) It might work with your OS if you know how to do that.
If you are linking to the libraries, as Ocropus does, there is now a single master library called libtesseract_full.a.
Libtiff support should now be properly working via configure.
Windows users
tesseract-2.00.exe.tar.gz is not for the 'exe' language. It contains Windows executables, which are built with VC++ Express and come with absolutely no warranty. If they work for you then great; otherwise you probably don't have the necessary dlls to go with them. To solve this, you can get Visual C++ Express (and the platform sdk) from Microsoft and build from the source. Alternatively, non-techies might prefer to try tesseract-2.00.exe6.tar.gz, which was built with Visual C++ 6. Most Windows machines will have all the necessary dlls for these exes to work, but note that the executables built with the newer compiler are smaller, faster, and, believe it or not, more accurate. (See TestingTesseract.)
If you are building from the sources, there are still .dsw and .dsp files for vc++6 and also .sln and .vcproj files for VC++ Express.
The dll has been updated to allow input of non-binary images. (Thanks to Glen of Jetsoft.)
Libtiff support can be added in either VC++6 or VC++Express with the following:
- Go to http://gnuwin32.sourceforge.net/packages/tiff.htm, then download and run the setup program.
- Add the paths for include and library files in tools/options/directories.
- Add HAVE_LIBTIFF to the preprocessor definitions.
- Add libtiff.lib to the list of libraries.
- Rebuild.
- Make sure libtiff3.dll is somewhere in your path. This is done via control panel/system/advanced/environment variables, by adding c:/program files/gnuwin32/bin to PATH.
- Keep your fingers crossed...
xx.00 Version Warning
Tesseract 2.00 has undergone more compatibility testing than any previous version. There have even been fixes to make the accuracy more consistent across platforms. Having said that, there have been many changes to the code, and portability may have been broken, so 64 bit and Mac platforms may not work or even build as well as before.
Tesseract release notes May 15, 2007 - V1.04.
Windows users only
Added a dll interface for windows. Thanks to Glen at Jetsoft for contributing this. To use the dll, include tessdll.h, import tessdll.lib and put tessdll.dll somewhere where the system can find it. There is also a small dlltest program to test the dll. Run with:
dlltest phototest.tif phototest.txt
It will output the text from phototest.tif with bounding box information.
New for Windows
The distribution now includes tesseract.exe and tessdll.dll which might work out of the box! There are no guarantees as you need VC++6 versions of mfc and crt (at least) for it to work. (Batteries not included, and certainly no installshield.)
Important note for anyone building with make: i.e. anyone except devstudio users
This release includes new standardization for the data directory. To enable Tesseract to find its data files, you must either:
./configure
make
make install
to move the data files to the standard place, or:
export TESSDATA_PREFIX="directory in which your tessdata resides/"
(or the equivalent) in your .profile or similar startup file, or use setenv to set the environment variable. Note that the directory must end in a /.
HAVING tesseract and tessdata IN THE SAME DIRECTORY DOES NOT WORK ANY MORE.
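The trailing-slash requirement is easy to miss when Tesseract is launched from a script. Below is a minimal Python sketch that builds an environment with TESSDATA_PREFIX set; the helper name `tessdata_env` and the example path are hypothetical, and the only behaviour taken from these notes is that the variable names the directory containing tessdata and must end in a /.

```python
import os


def tessdata_env(tessdata_parent, base_env=None):
    """Return a copy of the environment with TESSDATA_PREFIX set.

    tessdata_parent is the directory that contains tessdata/ (not tessdata
    itself). The notes require the value to end in a slash, so one is
    appended if it is missing.
    """
    env = dict(os.environ if base_env is None else base_env)
    prefix = str(tessdata_parent)
    if not prefix.endswith("/"):
        prefix += "/"
    env["TESSDATA_PREFIX"] = prefix
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when invoking tesseract, for example `tessdata_env("/opt/tesseract")`.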
All users
Fixed a bunch of name collisions, mostly with stl. Made some preliminary changes for unicode compatibility. This release includes a new data file (unicharset) and renames the other data files with an eng. prefix to support different languages. There are also several other minor bug fixes and portability improvements for 64 bit, the latest visual studio compiler, etc. Thanks to all who have contributed these fixes.
NOTE: This is likely to be the last English-only release! Apologies in advance to non-windows users for bloating the distribution with windows executables. This will probably get fixed in the next release with the multi-language capability, since that will also bloat the distribution.

Shouldn't the visual c++ express version be usable with just the vcredist_x86.exe redistributable rather than requiring users to install vc++ express and the platform sdk?