Issue 21: Characters touching table lines are not recognized
Reported by lohith.cs, Apr 27, 2007
What steps will reproduce the problem?
1. Run ./ocropus layout prefix test.png

It should generate sub-images for all the text in the image.
Instead, it generates sub-images only for the text that does not touch the table lines.

I am using the code at revision 78 from SVN,
running on Linux (Fedora Core).


Please find attached the test image file (test.png) I used.
test.png
Comment 1 by fil...@repairfaq.org, May 26, 2007
Do not expect a resolution of this issue. I was running into the same problem with
"streaky" faxes (dirt on the CCD glass gives vertical streaks for the length of the page),
and *you* will need to deal with the lines (for the next 5-10 years, anyway :).
While my hacks did not result in complete satisfaction, they did prevent tess from
burning 100% CPU for 10 minutes on each such image before giving up. However, in the
end, you are *still* guessing what to put in when you take the lines out.
Disclosure: I know very little about OCR/image processing theory.
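
For readers who want to try the "take the lines out" approach mentioned in this comment, here is a minimal sketch. It is not the commenter's hack and not anything OCRopus does: it assumes a binarized page held in a NumPy boolean array (True = ink), and the names remove_long_runs, remove_table_lines and the min_len threshold are hypothetical. It clears any horizontal or vertical run of ink at least min_len pixels long.

```python
import numpy as np


def remove_long_runs(binary, min_len, axis):
    """Clear foreground runs of at least min_len pixels along the given axis.

    axis=1 removes long horizontal runs, axis=0 removes long vertical runs.
    Hypothetical helper for illustration only.
    """
    img = np.asarray(binary, dtype=bool)
    # Always scan row-wise; transpose first when looking for vertical runs.
    work = img.T if axis == 0 else img
    out = work.copy()
    for r, row in enumerate(work):
        # Pad with background so every run has a detectable start and end.
        padded = np.concatenate(([False], row, [False])).astype(np.int8)
        diff = np.diff(padded)
        starts = np.flatnonzero(diff == 1)   # index of first pixel in a run
        ends = np.flatnonzero(diff == -1)    # index one past the last pixel
        for s, e in zip(starts, ends):
            if e - s >= min_len:
                out[r, s:e] = False
    return out.T if axis == 0 else out


def remove_table_lines(binary, min_len):
    """Drop pixels that belong to a long horizontal or a long vertical run."""
    img = np.asarray(binary, dtype=bool)
    return remove_long_runs(img, min_len, axis=1) & remove_long_runs(img, min_len, axis=0)


if __name__ == "__main__":
    page = np.zeros((7, 9), dtype=bool)
    page[3, :] = True      # a ruling line across the full width
    page[2:5, 4] = True    # a short character stroke crossing the line
    cleaned = remove_table_lines(page, min_len=6)
    print(cleaned.astype(int))
```

In the small demo the ruling line is removed, but so is the pixel where the stroke crossed it, leaving the stroke broken in two. That gap is the "guessing what to put in" problem described above: a sketch like this does not attempt to repair it.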
Comment 2 by tmb...@gmail.com, Aug 07, 2007
Well, this problem is fixable in principle, but it will require a significant amount
of work.  I hope we'll be able to address it around 1.0 as part of improved line
recognizers.
Labels: Milestone-Release1.0
Comment 3 by tmb...@gmail.com, Jan 12, 2009
(No comment was entered for this change.)
Status: Accepted
Labels: -Type-Defect Type-Enhancement
Comment 4 by tmb...@gmail.com, Jan 12, 2009
(No comment was entered for this change.)
Owner: faisalshafait
Comment 5 by tmb...@gmail.com, Jun 14, 2009
(No comment was entered for this change.)
Labels: SampleImage
Comment 6 by robnokes, Aug 29, 2009
This is the best program that I have found; however, its batch processing is poor. I
have 20,000 pages to process, and it would take a lifetime to clean up the files manually.

ClearImage Image Processing Engines
http://www.inliteresearch.com/homepage/products/ci_image_processing.html


I have archived a Hollywood Sound Effects library catalog and would like to post the
archives on the web for the sound community; however, the vertical streaks make the OCR
somewhat useless. Hopefully OCRopus can incorporate some ideas from ClearImage.

Rob Nokes
www.Sounddogs.com
Comment 7 by tmb...@gmail.com, Aug 30, 2009
OCRopus already has many of the cleanup mechanisms in ClearImage.  However, they are
not automatically applied yet.  The beta release will have more information about how
to use these.