ReadMe  
Important information all Tesseract users need to know.
Updated Dec 9, 2011 by zde...@gmail.com

Introduction

This package contains the Tesseract Open Source OCR Engine. Originally developed at Hewlett Packard Laboratories Bristol and at Hewlett Packard Co, Greeley Colorado, all the code in this distribution is now licensed under the Apache License:

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Other Dependencies

  • Leptonica is required and provides image I/O and processing.
  • Other image libraries are required by Leptonica for image I/O (png, tiff, jpg etc.)

Installing and Running Tesseract

Distribution packages

Tesseract is split into several packages:

  • tesseract-x.xx.tar.gz contains all the source code.
  • tesseract-2.xx.<lang>.tar.gz contains the Tesseract 2 language data files for <lang>. You need at least one of these or tesseract 2 will not work.
  • <lang>.traineddata.gz contains the Tesseract 3 language data file for <lang>. You need at least one of these or tesseract 3 will not work.

Note that tesseract-x.xx.tar.gz unpacks to the tesseract-x.xx directory. tesseract-x.xx.<lang>.tar.gz unpacks to the tessdata directory which belongs inside your tesseract-x.xx directory. It is therefore best to download them into your tesseract-x.xx directory, so you can use unpack here or equivalent. You can unpack as many of the language packs as you care to, as they all contain different files. If you unpack them as root to the destination directory of make install, then the user ids and access permissions might be messed up.

Similarly, <lang>.traineddata.gz must be unpacked to the tessdata directory of your tesseract-x.xx installation.

boxtiff-2.01.<lang>.tar.gz contains data that was used in training for those that want to do their own training. Most users should NOT download these files.

Instructions for using the training tools are documented separately at TrainingTesseract3 and for testing at TestingTesseract.

Installation Notes - Tesseract 3.01

General

IMPORTANT: 3.01 is not backwards compatible with 2.04. The data files are different. (Single file per language among other things.) You therefore need to make sure you connect your new executable with the new data files.

Another important change is that you should really be using TessBaseAPI if you are linking with another program. In Linux (non-Windows) the main library is now libtesseract_api.a instead of the old libtesseract_full.a.

The command line is:

tesseract <image> <outputbasename> [-l lang] [configs]

In the executable, page layout analysis is enabled by default. You may need to turn it off to process small images. No command-line control for this yet. Sorry. See tesseractmain.cpp.
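As a rough illustration of the command line above, the following shell sketch wraps a single tesseract call and prints the name of the text file it expects to find afterwards. The function name ocr_image and the TESSERACT_BIN variable are inventions for this example only, and the OUTPUTBASE.txt naming is an assumption rather than something this page specifies.

```shell
# Run OCR on one image and print the expected output file name.
# Usage: ocr_image IMAGE OUTPUTBASE [LANG]
# TESSERACT_BIN is a hypothetical override that lets the sketch point at a
# different executable; it is not a Tesseract setting.
ocr_image() (
    image=$1
    outbase=$2
    lang=${3:-eng}
    tess=${TESSERACT_BIN:-tesseract}

    # The function body runs in a subshell, so "exit" only leaves the function.
    if [ ! -r "$image" ]; then
        echo "ocr_image: cannot read $image" >&2
        exit 2
    fi

    # The option is the letter "l" (language), not the digit "1".
    "$tess" "$image" "$outbase" -l "$lang" || exit 1

    # Assumption: the recognized text is written to OUTPUTBASE.txt.
    echo "$outbase.txt"
)
```

A call such as `ocr_image page.tif page eng` would then print `page.txt` if tesseract exits successfully.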

The training process is described on a separate wiki page.

Use the most recently available language files for the languages that you want.

Linux

If they are not already installed, you need the following libraries (Ubuntu):

sudo apt-get install autoconf automake libtool
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg62-dev
sudo apt-get install libtiff4-dev
sudo apt-get install zlib1g-dev

You also need to install Leptonica. There is an apt-get package libleptonica-dev, but if you are using an oldish version of Linux, the Leptonica version may be too old, so you will need to build from source. 3.01 requires at least v1.67 of Leptonica. The sources are at http://www.leptonica.org/. The instructions at Leptonica README are clear, but basically it is the usual

./autogen.sh
./configure
make
sudo make install
sudo ldconfig
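Before building Tesseract it may help to compare the installed Leptonica version with the 1.67 minimum mentioned above. The sketch below only compares a version string passed as an argument; how you obtain that string (for example from your package manager) is left open, and the function name leptonica_new_enough is hypothetical.

```shell
# Succeed if a Leptonica version string is at least 1.67.
# Usage: leptonica_new_enough 1.68
leptonica_new_enough() (
    major=${1%%.*}        # text before the first dot
    rest=${1#*.}          # text after the first dot
    minor=${rest%%.*}     # second component only, ignoring any patch level
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 67 ]; }
)
```

For example, `leptonica_new_enough 1.66 || echo "build Leptonica from source"` prints the hint for a version that is too old.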

Now back to Tesseract. Download the source from svn:

svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr-read-only

or get the tesseract-3.01.tar.gz package from the download page. The usual build process applies:

./autogen.sh
./configure
make
sudo make install
sudo ldconfig

On some systems autotools does not create the m4 directory automatically (you get the error "configure: error: cannot find macro directory 'm4'"). In this case you must create the m4 directory yourself before running ./configure:

mkdir -p m4

Between configure and make, you can check that everything has worked by looking at config_auto.h. It should contain #define HAVE_LIBLEPT 1.
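The check on config_auto.h can also be scripted. This is a minimal sketch, assuming the header is in the current directory unless a path is given; the function name have_leptonica is made up for the example.

```shell
# Succeed if the given config header enables Leptonica.
# Usage: have_leptonica [PATH_TO_config_auto.h]
have_leptonica() {
    # Match "#define HAVE_LIBLEPT 1", tolerating extra whitespace.
    grep -Eq '^[[:space:]]*#[[:space:]]*define[[:space:]]+HAVE_LIBLEPT[[:space:]]+1([[:space:]]|$)' \
        "${1:-config_auto.h}"
}
```

One way to use it is `./configure && have_leptonica && make`, so that make only starts when Leptonica was detected.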

You can also set the TESSDATA_PREFIX environment variable:

export TESSDATA_PREFIX=/some/path/to/tessdata

to point to your tessdata directory. TESSDATA_PREFIX must name the parent of tessdata: for example, if your tessdata path is '/usr/local/share/tessdata', use 'export TESSDATA_PREFIX=/usr/local/share/'. The command line for running tesseract is:

tesseract <image> <outputbasename> [-l lang] [configs]
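Because TESSDATA_PREFIX has to name the directory that contains tessdata, not tessdata itself, a small helper can derive the value from the tessdata path. This is only a sketch; tessdata_prefix_for is a hypothetical name.

```shell
# Print the TESSDATA_PREFIX value for a tessdata directory: its parent
# directory, with a trailing slash.
# Usage: export TESSDATA_PREFIX="$(tessdata_prefix_for /usr/local/share/tessdata)"
tessdata_prefix_for() (
    dir=${1%/}                      # drop one trailing slash, if present
    case $dir in
        */tessdata)
            printf '%s/\n' "${dir%/tessdata}"
            ;;
        *)
            echo "tessdata_prefix_for: $1 is not a tessdata directory" >&2
            exit 1
            ;;
    esac
)
```

With the example path from above, `tessdata_prefix_for /usr/local/share/tessdata` prints `/usr/local/share/`.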

Install language data:

  1. Download the language data file (e.g. 'wget http://tesseract-ocr.googlecode.com/files/eng.traineddata.gz')
  2. Decompress it ('gzip -d eng.traineddata.gz')
  3. Move it to the tessdata directory of your installation (e.g. 'mv eng.traineddata $TESSDATA_PREFIX/tessdata' if TESSDATA_PREFIX is defined)
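Steps 2 and 3 can be combined into one function, sketched here under the assumption that the archive has already been downloaded and that the tessdata directory exists and is writable. The name install_traineddata is hypothetical.

```shell
# Decompress LANG.traineddata.gz into a tessdata directory.
# Usage: install_traineddata eng.traineddata.gz /usr/local/share/tessdata
install_traineddata() (
    archive=$1
    tessdata=$2
    name=$(basename "$archive" .gz)      # e.g. eng.traineddata

    if [ ! -d "$tessdata" ]; then
        echo "install_traineddata: no such directory: $tessdata" >&2
        exit 1
    fi

    # Write to a temporary name first so that a failed gzip never leaves a
    # truncated language file behind.
    if ! gzip -dc "$archive" > "$tessdata/$name.tmp"; then
        rm -f "$tessdata/$name.tmp"
        exit 1
    fi
    mv "$tessdata/$name.tmp" "$tessdata/$name"
)
```

Unlike 'gzip -d', this keeps the downloaded archive in place, which can be convenient when installing to several machines.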

Windows

There is a Windows installer for Tesseract-OCR 3.01 that includes the English language data. Other language data can be downloaded and installed from the installer. The installer adds Tesseract to the PATH of the current user (i.e. the user who installed Tesseract) and sets the TESSDATA_PREFIX environment variable for that user.

If you have problems running it, please check that the Microsoft Visual C++ 2008 SP1 Redistributable Package (x86) is installed.

The dll isn't supported in Tesseract-OCR 3.00/3.01.

Installation from source

Download the source from svn:

svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr-read-only

Windows-relevant files are located in the vs2008 directory. The usual build process applies: open tesseract.sln with VC++ Express 2008 and build all (or just Tesseract). It should compile (in at least release mode) without having to install anything further. The dll dependencies and Leptonica are included. With the full svn download, it should just run immediately after building.

tesseract <image> <outputbasename> [-l lang] [configs]

Support

If you need support, please search the tesseract user forum or tesseract developer forum first. It is a good idea to read the wiki pages before posting on the forum.

Installation Notes - Tesseract 2.04

Linux

The installation process is the same as for version 3.00; just use the correct source and language data for Tesseract 2.0x.

Windows

There is no windows installer! There are windows executables: tesseract-2.04.exe.tar.gz (It is not for the 'exe' language.) They are built with VC++ express 2008 and come with absolutely no warranty. If they work for you then great, otherwise get Visual C++ Express 2008 with service pack 1 and build from the source. You can also try tesseract-2.01.exe.tar.gz, which is built with VC++6, and may work better if your windows is old, but note that this is an older version of Tesseract.

If you are building from the sources, there are still (up to v2.04) .dsw and .dsp files for vc++6, but the recommended build platform is now VC++ Express 2008. There are also .sln and .vcproj files for VC++ Express 2008, but these files are not backward compatible with any previous version - not even VC++ Express 2005. Note that the executables produced with the newer compiler are smaller, faster, and, believe it or not, more accurate. (See TestingTesseract.)

New with 2.04: the executables are built with static linking, so they stand more chance of working out of the box on more windows systems.

The executable must reside in the same directory as the tessdata directory. (The Visual Studio projects build the release executable directly to the correct place!)

The command line is:

tesseract <image.tif> <output> [-l <langid>]

For interfacing to other applications, there is a DLL included with the executables, but you may be better off building it yourself. The DLL is NOT built for static C-Runtime, so you will probably need VC++ Express 2008 to run it.

The dll has been updated to allow input of non-binary images. (Thanks to Glen of Jetsoft.)

Non-Windows (or Cygwin)

You have to tell Tesseract through a standard unix mechanism where to find its data directory. You must either:

./configure
make
make install

to move the data files to the standard place, or:

export TESSDATA_PREFIX="directory in which your tessdata resides/"

In either case the command line is:

tesseract <image.tif> <output> [-l <langid>]

New: there is a tesseract.spec for making rpms. (Thanks to Andrew Ziem for the help.) It might work with your OS if you know how to do that.

If you are linking to the libraries, as Ocropus does, there is now a single master library called libtesseract_full.a.

Libtiff support should now be properly working via configure, but note that you need libtiff-dev, as that contains the header files required to compile the code that uses it.

History:

The engine was developed at Hewlett Packard Laboratories Bristol and at Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some more changes made in 1996 to port to Windows, and some C++izing in 1998. A lot of the code was written in C, and then some more was written in C++. Since then all the code has been converted to at least compile with a C++ compiler. Currently it builds under Linux with gcc4.0, gcc4.1 and under Windows with VC++2008 Express. The C++ code makes heavy use of a list system using macros. This predates stl, was portable before stl, and is more efficient than stl lists, but has the big negative that if you do get a segmentation violation, it is hard to debug. Another "feature" of the C/C++ split is that the C++ data structures get converted to C data structures to call the low-level C code. This is ugly, and the C++izing of the C code is a step towards eliminating the conversion, but it has not happened yet.

The most recent change is that Tesseract can now recognize 33 languages, is fully UTF8 capable, and is fully trainable. See TrainingTesseract for more information on training.

Tesseract was included in UNLV's Fourth Annual Test of OCR Accuracy. See http://www.isri.unlv.edu/downloads/AT-1995.pdf. With Tesseract 2.00, scripts are now included to allow anyone to reproduce some of these tests. See TestingTesseract for more details.

About the Engine

This code is a raw OCR engine. It has NO OUTPUT FORMATTING and NO UI. It can detect fixed pitch vs proportional text. Having said that, in 1995 this engine was in the top 3 in terms of character accuracy, and it compiles and runs on both Linux and Windows. Training code IS included in the open source release, however, for those willing to try.

Comment by mcour...@mindspring.com, Aug 27, 2007

i get an error message: Could not open file, -1 my command is: tesseract test1.gif output -1 i have the eng.<files> in a folder called tesserdata of the folder that contains the exe.

when i run the command in the tesserdata folder i get an error: Unable to load unicharset file C:/contract/visumatic/FreeOCR/tessdata/tessdata/eng.unicharset

can i get some help?

Comment by tom.gar...@gmail.com, Aug 30, 2007

i get the same error as above on Ubuntu 7.04:

/usr/local/bin/tesseract test.tif out.txt Unable to load unicharset file /usr/local/share/tessdata/eng.unicharset

Comment by peace1...@gmail.com, Sep 2, 2007

You need to download the language charset that you wish to use from the download button above. Extract it to the directory in which your tesseract executable resides and you shall have that error no longer.

Comment by sandro.k...@gmail.com, Sep 28, 2007

Help please, I'm currently trying to link tessdll to a c#-project. In C# I declared

[DllImport("c:\\tessdll.dll", EntryPoint = "#21", CallingConvention = CallingConvention.Cdecl, SetLastError = true)]
static public extern int TessDllBeginPageUprightBPP(UInt32 xsize, UInt32 ysize, ref byte b, string lang, uint bpp);

When I call tessdll my application simply closes. Do I need to initialize tessdll somehow?

Many Thanks!

Comment by marcelja...@gmail.com, Oct 30, 2007

To mcour...@mindspring.com and tom.garvin:

your command is wrong. it is not the number one (-1). the right thing to type is the letter L (-l).

Comment by 2005jimm...@gmail.com, Nov 8, 2007

On windows I found this really easy to use, here are the steps with the Nov 07 version:

1) download tesseract-2.01.exe.tar.gz and tesseract-2.00.eng.tar.gz
2) extract these files into the same folder (7-zip or whatever expanding software you prefer)
3) open a command window for this folder, where the tesseract.exe file is located.
4) prep a tiff image, in my case I took a digital picture of a book, tweaked it in photoshop and saved as a tiff with no compression. You could do the same with the Gimp.
5) now I put the tiff image into the same folder and then in the command window invoke the operation 'tesseract.exe MyImage.tif MyImageConverted -l eng'
6) the process runs in the background for a few seconds and then a new text-file appears with the name 'MyImageConverted.txt'.

Comment by barendge...@gmail.com, Dec 10, 2007

On windows, using VC++Express, when enabling libtiff, I had to take two additional steps:

1) add HAVE_CONFIG_H to the preprocessor definitions

2) create an empty config_auto.h file

this is because the HAVE_LIBTIFF check sits inside the HAVE_CONFIG_H block in file tesseractmain.cpp. Apart from that it works fine.

Comment by samlala...@gmail.com, Dec 11, 2007

I have some JPG and BMP files. What utilities can I use to convert these files to TIF files that TESSERACT will recognize? I used the Paint program provided with my Windows XP, but the TIF file it created was not recognized by TESSERACT.

Here is the log file:

Tesseract Open Source OCR Engine
read_tif_image:Error:Illegal image format:Compression
Tessedit:Error:Read of file failed:number.tif
Signal_exit 31 ABORT. LocCode: 3 AbortCode: 3

Comment by HuYich...@gmail.com, Dec 17, 2007

I extracted some English characters and numbers from a scanned document, but the recognition results were not very good. The accuracy was only about 50%. In fact, these characters were very easy for a human to recognize. So I think there must be some problem.

1. Should I normalize the characters to a specific size before recognition? What's the best width and height of a character for recognition?

2. Will the space between characters affect the recognition performance?

Comment by NAp...@gmail.com, Jan 10, 2008

To: samlal...@yahoo.com, MS Paint uses LZW compression. Try IrfanView (google it), you can save a file to TIFF and choose the compression. Choose no compression and it will work with Tess. You also may need to reduce the color depth, you can do that in IrfanView as well.

NA ABillionBillion.com Document Management for Everyone

Comment by urkoben, Jan 23, 2008

It's important to have a big image of the text (in my case, a character size of 20x20 pixels works well); otherwise the result will be blank. When an image was too small, i resized it with photoshop (to 300%), and the text was recognized without problems.

Comment by James.D....@gmail.com, Jan 26, 2008

The readme says tesseract scans only a single column, but is that limited to only a single page with one column? If not, it's not working. If so, then I may have something useful for others. I wrote a script to scan multipage TIFFs using tesseract. If anyone wants a copy, just e-mail me.

While on the topic, I have a script (intended to run as a cron job) for finding all .tif files and tesseractizing them to help in later content-based searching.

Regards,

Jim
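A batch job of the kind described above might look like the following sketch. It is not the script offered in this comment; the function name ocr_all_tifs is hypothetical, and it assumes tesseract is on PATH and that each result should be written next to its image.

```shell
# Run tesseract on every .tif file below a directory.
# Usage: ocr_all_tifs DIRECTORY [LANG]
ocr_all_tifs() {
    # The language is passed to the inner shell as its first argument,
    # followed by the file names that find collected.
    find "$1" -type f -name '*.tif' -exec sh -c '
        lang=$1
        shift
        for image do
            # page.tif is written to page.txt next to the image.
            tesseract "$image" "${image%.tif}" -l "$lang" || exit 1
        done
    ' sh "${2:-eng}" {} +
}
```

Run from cron, a line such as `ocr_all_tifs /path/to/scans eng` would then refresh the text files on each pass.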

Comment by russnel...@gmail.com, Jan 30, 2008

On Ubuntu, it won't find the data files unless you do this:

sudo ln -s /usr/share/tesseract-ocr/tessdata /usr/bin

That said, there are still problems, e.g. many variables in box.config are not found.

Comment by olliejo...@gmail.com, Feb 1, 2008

This is good stuff! Thanks, The Ray Smith!

A question: it seems like unrecognized characters get replaced by spaces in the output ascii. If this is true, is there a simple way to use some other character, like ~ ?

Comment by freddief...@gmail.com, Mar 13, 2008

what is the script for doing multipage tiffs?

Comment by gogg...@slopsbox.com, Apr 21, 2008

I've added my experiences of using Tesseract here:

http://www.scribd.com/doc/2589070/how-to-scan-books-to-text-files

It is from a very non-expert Windows perspective, so might be of use to some people... Please feel free to add any part of it to the documentation or wiki etc.

Comment by m4rti...@gmail.com, May 4, 2008

you must remove alpha channel from TIFF !!

Comment by jhe...@gmail.com, May 15, 2008

for all the people getting the error:

Unable to load unicharset file /usr/local/share/tessdata/eng.unicharset

or some variation thereof, you'll find that the eng.unicharset and most of the other files in the tessdata directory have size 0 kb. confusingly, they seem to have been put there as place holders. you need to download and install the various language packs separately.
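To see whether this applies to a given installation, the empty placeholder files can be listed with find. This is a small sketch; empty_lang_files is a hypothetical name and the directory is whatever tessdata path your install uses.

```shell
# List zero-byte files in a tessdata directory, one path per line.
# Usage: empty_lang_files /usr/local/share/tessdata
empty_lang_files() {
    # -size 0 matches only files that occupy no blocks, i.e. empty files.
    find "$1" -type f -size 0 | sort
}
```

If this prints eng.unicharset or similar names, the language pack still needs to be downloaded and unpacked.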

Comment by szhai...@gmail.com, May 19, 2008

How can i add libTIFF support under Linux environment?

Comment by JokuMu...@gmail.com, May 20, 2008

barendgehrels is right. For installing libtiff - follow the original instructions and then do the following:

1) add HAVE_CONFIG_H to the preprocessor definitions
2) create an empty config_auto.h file

Comment by bobg...@gmail.com, May 29, 2008

I have a bunch of documents, all the same size, with a field that is overwritten with a pattern - (perhaps to foil ocr..). How do I go about removing that pattern before attempting ocr? I can send a sample of the field.

Comment by beigua...@gmail.com, Jun 30, 2008

I can not load the executable libtiff from http://gnuwin32.sourceforge.net/packages/tiff.htm

Is there anyone who would share theirs with me?

thanks

Comment by jrouq...@gmail.com, Aug 20, 2008

Can you add the comments from jhearn and m4rtin.m to the main documentation ?

Comment by Ivan.Smi...@gmail.com, Sep 22, 2008

I see that eng.unicharset is not included in the latest zip file (again). I grabbed one from an older zip file, but it appears to not be compatible.

Comment by gamersed...@gmail.com, Nov 5, 2008

Okay, i didn't have the lang file in the directory before i did make install, what should i do?

Comment by neil...@gmail.com, Nov 23, 2008

got mine to work after i changed the extension from .tiff to .tif ! doh!

still the output was not good enough to be recognizable. tesseract got confused by lines that were not aligned (because they belonged to a different article on the same page). It read the lines on the left of the page but not the lines on the right-hand side of the page (because they belonged to a different article and therefore slightly offset).

Comment by schuttel...@gmail.com, Nov 29, 2008

I'm also having problems with the charset files:

Unable to load unicharset file /usr/local/share/tessdata/eng.unicharset

I downloaded them from an older release (they were not included in this one) but no luck :(

Comment by stuporglue, Nov 29, 2008

Wikipedia says "Please note that the website at www.libtiff.org is a hijacked domain and while it now points to the real site for current development at www.remotesensing.org, the libtiff.org site still shows the latest version as 3.6.1, which is not correct. It also has an incorrect address for the Libtiff mailing list."

If that's the truth, it might be better to link to remotesensing instead of libtiff.org.

Comment by ahme...@gmail.com, Dec 15, 2008

same problem here:

Unable to load unicharset file /usr/local/share/tessdata/eng.unicharset

I am trying to get tessnet2 working but in vain

Comment by alexande...@gmail.com, Dec 17, 2008

same Unable to load unicharset file /usr/local/share/tessdata/eng.unicharset problem

downloaded the eng pack but still no luck

Comment by tgerdes3...@yahoo.com, Jan 28, 2009

I was able to build a tesseract executable using Visual Studio 2005. I extracted a page from a PDF file using .NET and created a TIF file to contain the image. I have installed Libtiff support and built my tesseract executable.

Tesseract appears to run without error when I run it against my Tiff file. Although, the output file contains the following characters "S.¤,SQ,Vi(< G u¤,¤n.<<d 6". These are not the text found in the Tiff file I created.

Does anyone know what I am doing wrong?

Comment by oyolen...@gmail.com, Feb 1, 2009

how do i get this to run on a macbook?

Could somebody not work on a UI for this?

Comment by hiral.sh...@gmail.com, Feb 19, 2009

for a UI, you can go through this. it is working fine as a C#.NET wrapper on tesseract-ocr: http://groups.google.com/group/tesseract-ocr/browse_thread/thread/d80a3989c5c0931f#

I need help: how can I use it for a language other than english?

Comment by ckins...@gmail.com, Feb 20, 2009

Does anyone have a work around/fix for the Unable to load unicharset file /usr/local/share/tessdata/eng.unicharset issue that people have posted earlier?? I just installed it yesterday and am unable to move ahead at all....

Comment by michael....@wridgways.com.au, Feb 26, 2009

download each of the eng.* files from http://tesseract-ocr.googlecode.com/svn/trunk/tessdata/ and the problem disappeared. You will find if you ls -l /usr/local/share/tessdata/eng.* that all the files are probably zero bytes.. (same in /usr/src, so probably not part of the .tar.gz)

Comment by caiti...@gmail.com, Feb 28, 2009

Installation on a mac (ppc, 10.4.11) with english language ocr in mind:

a) download tesseract-2.03.tar.gz and tesseract-2.00.eng.tar.gz from the downloads page

b) open a terminal, cd to wherever you downloaded the above files, then do:

tar xvfz tesseract-2.03.tar.gz

cd tesseract-2.03

./configure

make

sudo make install

cd ..

tar xvfz tesseract-2.00.eng.tar.gz

sudo mv tessdata/ /usr/local/share/tessdata

sudo chown root:yourusername /usr/local/share/tessdata/

rm -rf tessdata

rm -rf tesseract-2.03

rm tesseract-2.03.tar.gz

rm tesseract-2.00.eng.tar.gz

Note in the above replace yourusername with your short username. eg for me it's sudo chown root:pete /usr/local/share/tessdata/

If you don't know what your short username is, type:

whoami

and the response is your short username.

c) you now have a working install of tesseract set up to do ocr on english language documents. To do other language documents, download the relevant language file, then repeat all steps from "tar xvfz tesseract-2.00.<language>.tar.gz" above.

d) NOTE: for tesseract to work, the tiff file you're running it on needs to be renamed to end in .tif (not .tiff) AND it needs to be an image without an alpha channel. If you've renamed the file and tesseract is still barfing, this is probably the problem. Use an image conversion utility with the ability to remove alpha channels to re-save your image. For bulk image conversion I recommend Imagemagick (it's gpl and runs well on the mac).

e) finally, just so everything is on one post, to ocr your tiff image, do:

tesseract inputimage.tif outputtext -l eng

and you should get a file called outputtext.txt.
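The renaming and alpha-channel advice in point d) above could be scripted roughly as follows. This is a sketch under assumptions: the function names are made up, and flatten_tiff relies on an ImageMagick convert command that supports the -alpha off and -compress none options.

```shell
# Print the .tif name for an image whose name ends in .tiff;
# any other name is printed unchanged.
tif_name() {
    case $1 in
        *.tiff) printf '%s\n' "${1%.tiff}.tif" ;;
        *)      printf '%s\n' "$1" ;;
    esac
}

# Re-save an image as an uncompressed .tif without an alpha channel.
# Assumption: ImageMagick's "convert" is installed and on PATH.
flatten_tiff() {
    convert "$1" -alpha off -compress none "$(tif_name "$1")"
}
```

For bulk conversion, a loop such as `for f in *.tiff; do flatten_tiff "$f"; done` would leave a .tif copy next to each original.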

Comment by jamesarn...@gmail.com, Mar 6, 2009

I can compile the project with Visual Studio 2005, but when I run the tesseract.exe app, I get "The ordinal 166 could not be located in the dynamic link library libtiff3.dll". Libtiff3.dll is from the latest version of LibTIFF, and it is definitely in my path. Anyone else come across this?

Comment by mattbe...@gmail.com, May 6, 2009

We are experiencing the same error with Visual Studio 2008. Is there another mailing list to submit this to?

Comment by modiashu...@gmail.com, May 12, 2009

I am getting the following error when I issue the command: tesseract.exe out.tif out

Tesseract Open Source OCR Engine
read_tif_image:Error:Illegal image format:Compression
Tessedit:Error:Read of file failed:out.tif
Signal_exit 31 ABORT. LocCode: 3 AbortCode: 3

Can someone please help me in this regard. I have already installed libtiff and set up the path etc as specified above but still I am getting the same error.

Comment by allanc...@gmail.com, May 19, 2009

I am able to get tesseract2.03 running on OSX 10.5 just fine. However, when I compile with libtiff, the output becomes garbage. I tried both libtiff manual build or from Darwinports... any idea how I can go about this?

Comment by omn...@gmail.com, May 28, 2009

Here too, 2.03 after running the command from above: tesseract inputimage.tif outputtext -l eng

I get: Tesseract Open Source OCR Engine Image has 24 bits per pixel and size (450,24) Resolution=72

and then the outputtext.txt is created, but it is empty. This is for vanilla text copied from the screen, not handwriting. Any experience with non-error causing blank output?

Comment by starhar...@gmail.com, Jun 8, 2009

i am unable to work with this software..anyone plz give ur email id..i will send the image..plz plz check and tell me the result whether it works..plz help frnds

Comment by gmu...@gmail.com, Jun 15, 2009

Using Instructions from Comment by caitifty, Feb 28, 2009 - Installed 2.03 on MAC 10.5.7 Intel and it compiled and tested with phototest.tif successfully.. starhari86 let me know if u still need some help testing your image.

Comment by thinkdun...@gmail.com, Jun 20, 2009

john-davids-macbook-pro:tesseract-2.03 thinkdunson$ make
-bash: make: command not found
john-davids-macbook-pro:tesseract-2.03 thinkdunson$ sudo make install
Password:
sudo: make: command not found

now what? i have no experience with terminal… anyone, please help.

Comment by sta191...@gmail.com, Jul 7, 2009

hey barendgehrels, i was just wondering in which directory should i create the empty config_auto.h file?

Comment by reyes...@gmail.com, Jul 11, 2009

Problem:

Unable to load unicharset file /usr/local/share/tessdata/spa.unicharset

Solution:

1. Download tesseract-2.00.eng.tar.gz from http://code.google.com/p/tesseract-ocr/downloads/list
2. Extract
3. Copy all files to /usr/local/share/tessdata/

;)

Comment by leonhard...@hotmail.com, Jul 15, 2009

i use tesseract2.04 in windows vista and when i use the program i get a document full of nonsense. i believe it's because it sets a wrong charset. the text document uses utf8.

Comment by project member jore...@gmail.com, Aug 6, 2009
There is an apt-get package (name unknown)

libleptonica-dev

Comment by tfmorris, Aug 14, 2009

The tesseract 3.0 build for Windows compiles out of the box, but the resulting executable doesn't run because leptonlib.dll is built against libpng12.dll, but libpng13.dll is what's included in SVN.

You can get the appropriate binary for the missing DLL from http://gnuwin32.sourceforge.net/packages/libpng.htm

Comment by n.huy...@gmail.com, Aug 16, 2009

Where can i download version 3.0?

Comment by ozqu...@gmail.com, Aug 22, 2009

i was able to get this working in both windows and cygwin. i found the recognition to be far superior in cygwin with the same language files.

after several unsuccessful attempts, i found rtfm to be the best approach.

Comment by in4tu...@gmail.com, Aug 27, 2009

I've posted my experience/first interactions with tesseract on Ubuntu linux at http://triviaatwork.blogspot.com/2009/08/first-interactions-with-tesseract-ocr.html

Comment by ray...@gmail.com, Oct 1, 2009

On the ReadMe page, under "Installation Notes - 3.00 Prerelease, General", it notes that Japanese language data files were available for version 2.04. Where can I get a hold of those? I have an older version of tesseract.

Comment by vincent....@gmail.com, Oct 2, 2009

Is there a way to get or to visualize the page layout, the coordinates of the blocks found by the process? Is ScrollView a solution and, if yes, how to do it?

Comment by xuxu1974...@yahoo.com, Oct 6, 2009

I am using tesseract 2.04. I know there is no layout analysis of tesseract, but is there any information other than the output text? Like line number or character (or word) number for a specific character/word, or anything else. It is OK if these information could be obtained in an intermediate step.

Thanks.

Comment by milto...@gmail.com, Nov 5, 2009

I have found a build error in tesseract-2.04 under Kubuntu 9.10 and g++ 4.4.1.

The file viewer/svutil.cpp does not compile because snprintf is not declared. Exact message:

g++ -DHAVE_CONFIG_H -I. -I.. -I/usr/local/include/liblept -g -O2 -MT svutil.o -MD -MP -MF .deps/svutil.Tpo -c -o svutil.o svutil.cpp
svutil.cpp: In constructor ‘SVNetwork::SVNetwork(const char*, int)’:
svutil.cpp:323: error: ‘snprintf’ was not declared in this scope

It can be easily fixed by including the cstdio header in the file.

Comment by green.e...@gmail.com, Nov 17, 2009

For latest version on SVN (3.0), under OSX Snow Leopard with all required libraries installed via MacPorts, I get this error after make:

make all-recursive
Making all in ccstruct
source='blobbox.cpp' object='blobbox.o' libtool=no \
DEPDIR=.deps depmode=none /bin/sh ../config/depcomp \
g++ -DHAVE_CONFIG_H -I. -I.. -I../ccutil -I../cutil -I../image -I../viewer -I/opt/local/include -I/usr/local/include/liblept -g -O2 -c -o blobbox.o blobbox.cpp
/bin/sh: ../config/depcomp: No such file or directory
make[3]: *** [blobbox.o] Error 127
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Comment by green.e...@gmail.com, Nov 18, 2009

Fixed the error by running this: ./runautoconf

Comment by swgemu....@gmail.com, Dec 3, 2009

possible for someone to upload tesseract-ocr 3.0 window bins?

Comment by richardg...@gmail.com, Jan 11, 2010

Unable to load unicharset file /usr/local/share/tessdata/eng.unicharset how come I only get this in the windows command prompt not the cygwin one. files are there. Anyone been able to solve this problem yet? I have it installed on more then one and only one box is giving me this error.

Comment by samba...@gmail.com, Jan 20, 2010

Hi Tesseract OCR Team and All Forum Members,

I am facing the following issue:

For one .tif file conversion to .txt the '0 (Zero)' gets converted in the text file to 'O (Vowel O)'; for another it works fine. Do I need to install any drivers to get this problem with numeric characters resolved? Please suggest ASAP. The requirement is really urgent.

Thanks

Comment by nishad...@gmail.com, Jan 31, 2010

Windows binaries and a GUI are posted at http://code.google.com/p/lime-ocr/

At present, there is only English language pack. You need to get optional language packs from http://tesseract-ocr.googlecode.com/svn/trunk/tessdata/

Comment by ffournel...@gmail.com, Mar 17, 2010

Hello men,

I'm very interested in this utility. I would like to be able to give it other input file formats like JPG or, maybe better for OCR, PNG. Uncompressed TIFF files are too heavy. Is there anywhere to download such a version? When will the next version be released?

sambanik > To recognize figure '0' instead of letter 'O' you can force this with the tool to recognize only figures and not letters (cf. Wiki FAQ)

Comment by harris.g...@gmail.com, Apr 8, 2010

Tesseract 2.03 installed on Ubuntu 9.10 through Synaptic package manager. 2.03 is the standard version for Ubuntu 9.10. I have the deu and eng languages installed too, no GUI front ends. I am trying to OCR a 131-page TIFF file, nearly 6mb in size. Tesseract churns through it until about p8 then throws a Segmentation Fault. I would really like to hear anyone's ideas how I go forward?

Comment by r...@interzet.ru, Apr 12, 2010

Is there any way I can make tesseract work with .tiff files and write its output to standard output?
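One possible workaround, sketched in Python: let tesseract write its usual <outputbase>.txt file into a temporary directory, then read that file back and print it. The function name ocr_to_string and the injectable run parameter are hypothetical, introduced only for this illustration; the sketch assumes the tesseract binary is on the PATH and that it appends ".txt" to the output base, as the 2.x and 3.0x command-line tools do.

```python
import os
import shutil
import subprocess
import tempfile


def ocr_to_string(image_path, lang="eng", run=subprocess.check_call):
    """Run tesseract on image_path and return the recognized text.

    `run` is any callable that executes an argument list; it defaults to
    subprocess.check_call and can be replaced for testing.
    """
    workdir = tempfile.mkdtemp()
    try:
        output_base = os.path.join(workdir, "page")
        # Assumed CLI form: tesseract <image> <outputbase> -l <lang>
        run(["tesseract", image_path, output_base, "-l", lang])
        # tesseract is assumed to have written <outputbase>.txt
        with open(output_base + ".txt", encoding="utf-8") as handle:
            return handle.read()
    finally:
        # Remove the temporary directory whether or not OCR succeeded.
        shutil.rmtree(workdir, ignore_errors=True)
```

Calling print(ocr_to_string("page.tiff")) then sends the text to standard output, so it can be piped into other tools.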

Comment by bbosw...@phcnw.com, Apr 22, 2010

I had to add the following to the configure file to make Tesseract 2.04 compile on Solaris 10 x86:

{ echo "$as_me:$LINENO: checking for Solaris 10 OS (if so, use -lrt -lsocket -lnsl)" >&5
echo $ECHO_N "checking for Solaris 10 OS (if so, use -lrt -lsocket -lnsl) $ECHO_C" >&6; }
if [ -n "`uname -a | grep SunOS | grep 5.10 `" ]
then
LIBS="-lrt -lsocket -lnsl $LIBS"
else
echo $ECHO_N "(cached) $ECHO_C" >&6
fi

Probably a hack, but it got it to compile without missing symbols ;)

Comment by jklon...@gmail.com, May 4, 2010

anything better? tried on windows ok-ish results but dealing with raw tiffs is too heavy

Comment by jeremyb...@gmail.com, May 5, 2010

The debian packages for leptonica are

>libleptonica and libleptonica-dev

Comment by pavan...@gmail.com, May 13, 2010

I am using tesseract. I have an image that contains the characters "space". I used the command below:

tesseract space.tif result

In result.txt only one character, 'a', is present instead of "space". Please help me solve this problem, and kindly share a script if you have one.

Comment by mahmod.f...@gmail.com, Jul 14, 2010

I tried all the steps to install tesseract on a Mac. Everything went well except:

sudo chown root:wael /usr/local/share/tessdata/

(wael is my machine's short name.) I got:

chown: wael: Invalid argument

When I skip this step and continue, I get this error when I try to run:

Unable to load unicharset file /usr/local/share/tessdata/eng.unicharset

Comment by kos...@gmail.com, Aug 31, 2010

I'm very impressed with Tesseract.

Is it possible to tell it to only read a portion of an image - using an x,y origin and a width,height somehow?

Or do I need to chop up my image? (since I know where the text parts will always be)

thanks!
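If chopping up the image turns out to be the route taken, the x,y origin plus width,height first has to be turned into a crop box. The sketch below is a hypothetical helper (crop_box is not part of tesseract) that converts such a region into the (left, top, right, bottom) form that imaging libraries such as Pillow's Image.crop expect, clipping it to the image bounds. The 3.x C++ API also appears to offer TessBaseAPI::SetRectangle(left, top, width, height) for the same purpose; check baseapi.h in your version before relying on it.

```python
def crop_box(x, y, width, height, image_width, image_height):
    """Convert an (x, y, width, height) region into a
    (left, top, right, bottom) box clipped to the image bounds."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    left = max(0, x)
    top = max(0, y)
    right = min(image_width, x + width)
    bottom = min(image_height, y + height)
    if left >= right or top >= bottom:
        raise ValueError("region lies outside the image")
    return (left, top, right, bottom)
```

The cropped image can then be saved as its own file and passed to tesseract like any other page.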

Comment by 344819...@qq.com, Sep 3, 2010

Hi guys. How can I make tesseract recognize only letters, not numbers? Thanks.
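One way to approach this, and the digits-only case raised earlier in the thread, is a config file that sets the tessedit_char_whitelist variable. The sketch below only generates the text of such a file; the helper name whitelist_config is hypothetical, and it assumes the usual "variable value" config line format. Where the file must be saved (commonly tessdata/configs) depends on the Tesseract version, so treat that as something to verify.

```python
import string


def whitelist_config(allowed_chars):
    """Return config-file text that restricts recognition to allowed_chars."""
    if not allowed_chars or any(ch.isspace() for ch in allowed_chars):
        raise ValueError("allowed_chars must be non-empty and contain no whitespace")
    return "tessedit_char_whitelist " + allowed_chars + "\n"


# Letters only (no digits), as asked in the comment above.
LETTERS_ONLY = whitelist_config(string.ascii_letters)

# Digits only, for the '0' versus 'O' problem.
DIGITS_ONLY = whitelist_config(string.digits)
```

Writing LETTERS_ONLY to a file named, say, letters and passing letters as the last command-line argument should apply the restriction, assuming tesseract can find the file.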

Comment by DnLbusin...@gmail.com, Sep 7, 2010

Everyone,

I have 3 job openings requiring programmers with Tesseract experience. This is in Roanoke VA and it is long term. If you are interested please email me at carl@aptonet.com

Comment by samith143@gmail.com, Sep 23, 2010

Hi guys,

I'm using Mac OS X 10.6.2. First I downloaded tesseract 2.04 and the language pack tesseract-2.00.eng and installed tesseract on the Mac. Then I used the script at http://robertcarlsen.net/2009/07/15/cross-compiling-for-iphone-dev-884 to cross-compile for iPhone, and that succeeded.

Then I checked out the tesseract 3.00 prerelease (checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr-read-only) and ran ./configure and make. I did not download the language files because they were already there. After installing it I ran the above script again.

(Note: I made the following change in the script: LIBFILE=ccmain/libtesseract_full to LIBFILE=/usr/local/lib/libtesseract_api.) When I ran the script I got the error below:

"usr/bin/lipo: specifed architecture type (arm) for file (lnsout/libtesseract_api.a.arm) does not match it's cputype (16777223) and cpusubtype (3) (should be cputype (12) and cpusubtype (0))"

Can anyone help me figure this out?

Comment by fuba...@gmail.com, Sep 29, 2010

The above mentioned build script for iOS has been updated to work with tesseract v3 pre-release: http://robertcarlsen.net/2010/09/24/compiling-tesseract-v3-for-iphone-1299

-r

Comment by d...@yabw.net, Oct 2, 2010

Hi all,

I am running Ubuntu 9.04 netbook on an Acer Aspire. I had managed to configure-make-make install the liblept and tesseract after I installed the other graphics -dev libraries. All seemed fine but when I try to run I get:

dalexy@mobile7:/$ tesseract
tesseract: error while loading shared libraries: libtesseract_api.so.3: cannot open shared object file: No such file or directory

The link and file of that name do exist. I have tried putting a tessdata link in /usr/local/bin, but the same message is delivered.

Any suggestions?

David Young

Comment by malky...@gmail.com, Oct 4, 2010

I've compiled tesseract but I don't know how to use the language files from here https://code.google.com/p/tesseract-ocr/downloads/list

I've unpacked the language files into /usr/local/share/tessdata/ but I get the error message "Error openning data file /usr/local/share/tessdata/english.traineddata" (or the same for any other language) if I use the -l option, even for English. I've tried different language files and the message was the same (with different names, of course). If I do not use the -l option it works (as English). So how can I choose the languages?
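The error message above suggests that the value given to -l is used directly as the file name: -l english looks for english.traineddata, while the file shipped for English is eng.traineddata, so -l eng is the likely fix. A small sketch of that mapping follows; both helper names are hypothetical and the directory listing is passed in as a plain list, so nothing here touches the real tessdata directory.

```python
def traineddata_path(tessdata_dir, lang):
    """Return the file tesseract is expected to open for `-l lang`."""
    return tessdata_dir.rstrip("/") + "/" + lang + ".traineddata"


def available_languages(filenames):
    """List the codes usable with -l, given the file names in tessdata."""
    suffix = ".traineddata"
    return sorted(
        name[: -len(suffix)] for name in filenames if name.endswith(suffix)
    )
```

For example, passing os.listdir("/usr/local/share/tessdata") to available_languages shows which codes can be given to -l.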

Comment by futur...@gmail.com, Oct 13, 2010

Hi, I get this error in 3.0, but it works in 2.04:


futureha@ubuntu:~$ TESSDATA_PREFIX=/home/futureha/tesseract-3.00/ mftraining combine.tr
Failed to load unicharset from file unicharset
Building unicharset for mftraining from scratch...
Reading combine.tr ...
combine has no defined properties.

Error: Unable to open combine.tr!

Fatal error: No error trap defined! Signal_termination_handler called with signal 3000


Does anyone know what is happening? Thanks.

Comment by starca...@gmail.com, Oct 18, 2010

Why does tesseract add itself to the Windows startup? The registry key is: HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run: Tesseract-OCR

I hate it when programs do this for no good reason as it adds to startup clutter and it looks suspicious; I disabled it in msconfig. By the way, I used the Windows installer for tesseract 3.0: tesseract-ocr-setup-3.00.exe.

Comment by sourasis...@gmail.com, Oct 26, 2010

For fedora 10 the library names are libpng-devel , libjpeg-devel , libtiff-devel.

Comment by jame...@gmail.com, Nov 29, 2010

zlibg is not an available library when trying to install this via apt-get in ubuntu (at least with the default sources). You may want to update the name if it's been changed or add in info as to where one can get it.

Comment by igor.v.f...@gmail.com, Dec 4, 2010

What are the changes in the API? The only note I seem to be able to find is "important change is that you should really be using TessBaseAPI if you are linking with another program" which tells me nothing. I'm getting an error 'TessBaseAPI' has not been declared for a simple test program (which used to work in version 2):

#include <stddef.h>
#include <tesseract/baseapi.h>
int main () {
  TessBaseAPI::InitWithLanguage("", "", "", "", false, 0, NULL);
  TessBaseAPI::TesseractRect((const unsigned char*) NULL, 1, 0, 0, 0, 0, 0);
  TessBaseAPI::End();
  return 0;
}

Comment by zdra...@gmail.com, Dec 9, 2010

Since there is no dll in the latest version, how should I link in order to be able to use tesseract API?

Comment by peter.vo...@gmail.com, Jan 12, 2011

When calling make on tesseract 2.04 I get an error message. Leptonica 1.67 is installed. Any ideas?

g++ -DHAVE_CONFIG_H -I. -I.. -I../ccutil -I../ccstruct -I../image -I../textord -I../viewer -I../ccmain -I/usr/local/include/leptonica -g -O2 -MT leptonica_pageseg.o -MD -MP -MF .deps/leptonica_pageseg.Tpo -c -o leptonica_pageseg.o leptonica_pageseg.cpp
leptonica_pageseg.cpp: In static member function ‘static bool LeptonicaPageSeg::GetHalftoneMask(Pix, Pix, Boxa, Pixa, bool)’:
leptonica_pageseg.cpp:69:3: error: ‘int32’ was not declared in this scope
leptonica_pageseg.cpp:69:9: error: expected ‘;’ before ‘debug’
leptonica_pageseg.cpp:73:25: error: ‘debug’ was not declared in this scope
leptonica_pageseg.cpp: In static member function ‘static bool LeptonicaPageSeg::GetTextlineMask(Pix, Pix, Pix, Boxa, Pixa, bool)’:
leptonica_pageseg.cpp:139:3: error: ‘int32’ was not declared in this scope
leptonica_pageseg.cpp:139:9: error: expected ‘;’ before ‘debug’
leptonica_pageseg.cpp:143:25: error: ‘debug’ was not declared in this scope
leptonica_pageseg.cpp: In static member function ‘static bool LeptonicaPageSeg::GetTextblockMask(Pix, Pix, Boxa, Pixa, bool)’:
leptonica_pageseg.cpp:211:3: error: ‘int32’ was not declared in this scope
leptonica_pageseg.cpp:211:9: error: expected ‘;’ before ‘debug’
leptonica_pageseg.cpp:220:53: error: ‘debug’ was not declared in this scope
leptonica_pageseg.cpp: In static member function ‘static bool LeptonicaPageSeg::GetAllRegions(Pix, Pix, Pix, Pix, bool)’:
leptonica_pageseg.cpp:273:3: error: ‘int32’ was not declared in this scope
leptonica_pageseg.cpp:273:9: error: expected ‘;’ before ‘w’
leptonica_pageseg.cpp:274:27: error: ‘w’ was not declared in this scope
leptonica_pageseg.cpp:274:31: error: ‘h’ was not declared in this scope
leptonica_pageseg.cpp:275:9: error: expected ‘;’ before ‘debug’
leptonica_pageseg.cpp:288:7: error: ‘debug’ was not declared in this scope
leptonica_pageseg.cpp:293:7: error: ‘debug’ was not declared in this scope
leptonica_pageseg.cpp:298:7: error: ‘debug’ was not declared in this scope
leptonica_pageseg.cpp:302:7: error: ‘debug’ was not declared in this scope
leptonica_pageseg.cpp:311:7: error: ‘debug’ was not declared in this scope
leptonica_pageseg.cpp:320:7: error: ‘debug’ was not declared in this scope
leptonica_pageseg.cpp:322:58: error: too few arguments to function ‘PIX pixRenderRandomCmapPtaa(PIX, PTAA, l_int32, l_int32, l_int32)’
/usr/local/include/leptonica/leptprotos.h:634:23: note: declared here
leptonica_pageseg.cpp:332:7: error: ‘debug’ was not declared in this scope
make[3]: [leptonica_pageseg.o] Error 1
make[3]: Leaving directory `/root/tesseract-2.04/pageseg'
make[2]: [all-recursive] Error 1
make[2]: Leaving directory `/root/tesseract-2.04/pageseg'
make[1]: [all-recursive] Error 1
make[1]: Leaving directory `/root/tesseract-2.04'
make: [all] Error 2

Comment by gaurav.p...@inmantec-cms.org, Feb 1, 2011

g++ -DHAVE_CONFIG_H -I. -I.. -g -O2 -MT svutil.o -MD -MP -MF .deps/svutil.Tpo -c -o svutil.o svutil.cpp
svutil.cpp: In constructor ‘SVNetwork::SVNetwork(const char, int)’:
svutil.cpp:323: error: ‘snprintf’ was not declared in this scope

After getting this error I searched the comments. There I found a suggestion to include cstdio.h, but I could not find cstdio.h on my system. Please help.

Comment by ml...@nds.com, Feb 8, 2011

Adding tesseract as a static library.

If anyone had the problem when upgrading to revision 552 that suddenly no DLL support was available anymore for their Visual C++ projects, then you could include tesseract as a static library, if that is an option. I will just outline the basics.

First copy the tesseract project file and rename it to tesslib. Add the tesslib project to the tesseract.sln solution and remove the tesseract.cpp and .h files from it. Then in the project properties select lib instead of application. You can also go to the Librarian options and select Link Library Dependencies; this will save you some time when including the lib. Build the project.

Now open a new project or your existing one. You will have to add this to the linker command line:

tesseract\tesslib.lib /ignore:4099 tesseract\leptonlib-static-mtdll.lib tesseract\libjpeg-static-mtdll.lib tesseract\libpng-static-mtdll.lib tesseract\libtiff-static-mtdll.lib

(Make sure the files exist.) Also, in Linker/Input Additional Dependencies add WSock32.Lib, which is needed for viewer.lib.

Now add these header files: apitypes.h baseapi.h publictypes.h thresholder.h unichar.h

Now your project should build as before when you include baseapi.h.

Comment by jim...@gmail.com, Feb 15, 2011

I am trying to learn the 3.0 API by comparing it to the 2.04 API.

The phrase "// Now run the main recognition" appears 3 times in each case. So I am trying to find the 3.0 equivalents of the 2.04 functions. (Why? Because there are more examples of how to do it in 2.04.)

Question (1)

In 3.0 API method:

int TessBaseAPI::RecognizeText(ETEXT_DESC monitor)

What are the steps needed prior to calling this method? (E.g. is calling SetImage and setting the language sufficient, and then passing in the initialized ETEXT_DESC monitor?)

Question2

In the same method page_res = new PAGE_RES(block_list, &tesseract->prev_word_best_choice);

what is "&tesseract->prev_word_best_choice"

Where can I find out more

FYI:

The equivalent in api 2.04 is:

// Low-level function to recognize the current global image to a string.
char* TessBaseAPI::RecognizeToString() {
  BLOCK_LIST block_list;

  FindLines(&block_list);
  // Now run the main recognition.
  PAGE_RES* page_res = Recognize(&block_list, NULL);
  return TesseractToText(page_res);
}

Comparing with the 2.04 API helps me move one step further in getting familiar with the 3.0 API.

E.g. now I can understand the purpose of the "RecognizeText" method, which is "// Low-level function to recognize the current global image to a string." The next question is where I go to get the text string out, e.g. what is the equivalent of "TesseractToText" in API 3.0?

Thanks for the continued support. 3.0 truly has lots of improvements. I believe that with enough feedback there will soon be enough blog posts from others explaining the details to help new users like myself use the API.

Comment by project member zde...@gmail.com, Feb 15, 2011

@jim: I suggest you read this (ReadMe) page once again and pay attention to this: If you need support please try to search and use tesseract user forum or tesseract developer forum.

Comment by hoogli@gmail.com, Mar 6, 2011

Anyone with the error message "tesseract: error while loading shared libraries: libtesseract_api.so.3: cannot open shared object file: No such file or directory ", be sure to follow the instructions in this readme, specifically,

sudo ldconfig

Comment by senthilr...@gmail.com, Mar 8, 2011

actual_tessdata_num_entries <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 55

Comment by chrisgo...@gmail.com, Mar 18, 2011

It's not zlibg -- the correct apt-get name is zlib1g-dev

Comment by Steven.R...@gmail.com, Mar 26, 2011

One more requirement needs to be added to the instructions for building under Linux (Ubuntu 10.10): sudo apt-get install libtool

Comment by andy.deg...@gmail.com, Apr 30, 2011

The apt-get package name is: libleptonica-dev

Comment by enchi...@gmail.com, Jun 6, 2011

So far no one has posted a solution to the error "Unable to load unicharset file" Could someone please give a detailed list of steps to solving this?

Comment by jmr...@gmail.com, Jun 18, 2011

So many work for nothing... Why not use the KIS method? Are we in DOS era? I can't understand... I expected much better...

Comment by saurabhg...@gmail.com, Jul 4, 2011

Has anyone tried version 3.0 with Android?

Comment by OsorioJa...@gmail.com, Jul 14, 2011

Very cool software! Thank you guys, but I didn't find anything about batch conversion. Is there any documentation about batch, nobatch.chop and this kind of config?

cheers
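For converting many files, one common approach is simply to call tesseract once per image from a script. The sketch below only builds the command lines for every TIFF in a list of file names, so it can be inspected before anything is run. The helper name batch_commands is hypothetical, and the command form tesseract <image> <outputbase> -l <lang> is assumed; this is separate from the batch and nobatch config names mentioned above, which this sketch does not use.

```python
import os


def batch_commands(image_names, lang="eng"):
    """Build one tesseract command per TIFF file name, in sorted order."""
    commands = []
    for name in sorted(image_names):
        base, ext = os.path.splitext(name)
        if ext.lower() in (".tif", ".tiff"):
            # The output base reuses the image name, so a.tif yields a.txt.
            commands.append(["tesseract", name, base, "-l", lang])
    return commands
```

Each returned list can be passed to subprocess.check_call, for example over batch_commands(os.listdir(".")).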

Comment by li.m...@mywayjx.com, Jul 19, 2011

There is an Android app, 'OCR Test' by Robert Theis. Has anyone tried it? Has the source code been modified to suit Android's small footprint?

Comment by scot.ale...@gmail.com, Jul 21, 2011

I was unable to get a working build using the Tesseract 3.0 source under AIX 5.1/gcc 3.3.3. I moved back to Tesseract 2.04, and it built and ran with only a single change. In config/config.h.in, the line "#undef LARGE_FILES" needs to be commented out, so that the 64 bit file I/O operations will link correctly.

Also, the OCR results on marginal text (rubber stamps with very small text, in images scanned from old microfilm) were noticeably better with 2.04 on AIX than with the pre-built Windows version 3.00.

Comment by dosh...@gmail.com, Jul 26, 2011

Hello,

I'm using the options "-l eng" and a config file with "tessedit_create_hocr 1". Can anyone explain why I get different text layout interpretations when building from svn on linux (debian) vs. using the 3.0 installer on Windows? I replaced the tessdata on linux with the Windows one (because it interpreted columns properly) and it still works differently.

If it is due to a revision that has happened in svn but not in the installer, how can I find out what revision to revert to in order to make them the same?

Thanks in advance, Dev

Comment by frode.m...@gmail.com, Aug 5, 2011

Installation process worked like a charm on OpenSuSE 11.4 - x86_64. Only negative I have to say is that I had to copy the man pages manually to the man-directory. Great software by the way.

Comment by hardikka...@gmail.com, Sep 2, 2011

Hi friends,

I want to run the tesseract-3.00 code with the help of Cygwin and the NDK. I have installed everything related to these, but I don't know how to proceed with tesseract-3.00 to run it. I am using Windows XP. Please help me. I have tried a lot, but there is no output. Thanks.

Comment by will.he...@gmail.com, Sep 7, 2011

"leptonica library missing"

For those on Ubuntu Natty 11.04, I had to add these to the top of my configure file:

CPPFLAGS="-I/usr/local/include" LDFLAGS="-L/usr/local/lib"

Comment by berr...@gmail.com, Sep 8, 2011

Actually I haven't found #define HAVE_ZLIB 1 statement in config_auto.h after executing ./configure as suggested in Linux Installation section of this Readme. All needed libraries are installed in my Ubuntu Lucid. I've found #define HAVE_LIBZ 1 instead. Maybe this is correct option I was looking for? When I look through tesseract-3.00/vs2008/include/leptonica/environ.h I see the declaration of variables for the image I/O libraries, plus zlib, and #define HAVE_LIBZ 1 is mentioned there. Maybe there is some mistake in Readme?

Comment by barnabas...@gmail.com, Oct 3, 2011

I am new in OCR, and now I got a problem.

I have a program-generated TIFF image (white background without noise, and a not-so-regular font). The program generates the characters ö and ü, but much of the time tesseract recognizes them as o and u. And often the characters o and u are recognized as ö and ü.

Maybe tesseract thinks that the dots on top of the character are noise, but they aren't. There is no noise in the picture at all!

Can anybody help me?

Comment by 173836...@qq.com, Oct 11, 2011

Hello. I am running a C# version. On the first day there was no problem, but on the second day it started failing: as soon as the program reaches the Init() call it exits on its own, with no error message.

Bitmap image = new Bitmap("eurotext.tif");
tessnet2.Tesseract ocr = new tessnet2.Tesseract();
ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digit only
ocr.Init(Application.StartupPath + @"\tessdata", "eng", false); // To use correct tessdata
List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
foreach (tessnet2.Word word in result)
    Console.WriteLine("{0} : {1}", word.Confidence, word.Text);

This is very frustrating. The same program runs without problems on other people's computers, which is puzzling. I hope someone can help.

Comment by whitson....@gmail.com, Oct 12, 2011

@barnabas You might want to take a look at http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 to learn to train Tesseract for additional fonts.

Comment by shoeb.bo...@gmail.com, Oct 14, 2011

is there any way to avoid those coordinate numbers in brackets that appear after every word??

Comment by tinsukes...@gmail.com, Nov 1, 2011

I've prepared a bash script to build Tesseract 3.01 for iOS SDK 5 using Clang, producing fat files for universal binaries: http://tinsuke.wordpress.com/2011/11/01/how-to-compile-and-use-tesseract-3-01-on-ios-sdk-5/

Comment by zyx...@gmail.com, Nov 10, 2011

The download section now contains files (for example)

tesseract-ocr-3.01.eng.tar.gz English language data for Tesseract 3.01

Should it be used for tesseract 3.01 instead of

eng.traineddata.gz English language data for Tesseract (3.00 and up)?

The archive tesseract-ocr-3.01.eng.tar.gz contains an eng.traineddata file that is much bigger than the corresponding file in the eng.traineddata.gz archive, plus a lot of other files.

Comment by nkalnber...@yahoo.com, Nov 11, 2011

@hoogli - thanks for the advice, except "sudo ldconfig" is mentioned in this README only for leptonica, not for the tesseract installation itself. But yes, it works better after :-)

Comment by arthurc...@gmail.com, Nov 12, 2011

I got a failure when trying to build tesseract for iPhone with the script build_fat.sh from this well-known link: http://robertcarlsen.net/2010/09/24/compiling-tesseract-v3-for-iphone-1299.

The script fails during configuration at line 30. It complains about leptonica: configure: error: leptonica library missing.
But there isn't any problem building tesseract on my Mac!

I used tesseract revision 640. My Mac OS X version is 10.6.7 and the iPhone SDK is 4.3 (I fixed the SDK version in the script).

Maybe I'm making a simple mistake, but please help.

Comment by glah...@gmail.com, Jan 16, 2012

I got following error when I try to run a sample...

tesseract c6305d6e-1e9e-4f33-8ae1-046430f9ffae.jpg test.txt

Tesseract Open Source OCR Engine v3.01 with Leptonica
Error in pixReadStreamJpeg: function not present
Error in pixReadStream: jpeg: no pix returned
Error in pixRead: pix not read
Unsupported image type.

I have installed leptonica-1.67 and leptonica-1.68 and tried with both, and both failed. I also tried on Mac and Fedora; neither worked.

Comment by sirianna...@gmail.com, Feb 4, 2012

I have the same problem as glah...@gmail.com

$ tesseract pictures/ppkrock1_278445a.jpg out.txt -l swe
Tesseract Open Source OCR Engine v3.01 with Leptonica
Error in pixReadStreamJpeg: function not present
Error in pixReadStream: jpeg: no pix returned
Error in pixRead: pix not read
Unsupported image type.
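The "function not present" line in the output above names the Leptonica reader that is unavailable (here pixReadStreamJpeg, i.e. JPEG). A small sketch that pulls the affected format names out of captured error output follows; the helper name missing_image_formats is hypothetical, and the pattern is based only on the message wording shown in this thread.

```python
import re

# Matches e.g. "Error in pixReadStreamJpeg: function not present"
_MISSING_READER = re.compile(r"Error in pixReadStream(\w+): function not present")


def missing_image_formats(error_text):
    """Return the image formats whose Leptonica reader is reported missing."""
    return sorted({name.lower() for name in _MISSING_READER.findall(error_text)})
```

An empty result means the output contains no such message; a result like ["jpeg"] points at which image library Leptonica was built without.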

Comment by markree...@gmail.com, Feb 9 (2 days ago)

"Error in pixReadStreamJpeg: function not present" - simply means you didn't have libjpeg support in leptonica so jpeg files can't be read.

