What steps will reproduce the problem?
I'm trying to set a character whitelist for Tesseract with code like the following:
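import tesseract  # the python-tesseract wrapper module, not pytesseract
ocr = tesseract.TessBaseAPI()
ocr.SetVariable("tessedit_char_whitelist", "0123456789;")
ocr.SetPageSegMode(tesseract.PSM_AUTO)
ocr.Init("C:\\Program Files (x86)\\Tesseract-OCR\\", "eng", tesseract.OEM_DEFAULT)  # backslashes escaped so the final one does not swallow the closing quote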
What is the expected output? What do you see instead?
The intended result is that image_to_string() recognizes only the characters "0123456789;". With code like the above, image_to_string() ignores the whitelist and returns whatever characters it finds.
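A quick way to tell whether a whitelist is being honored is to check the returned text against the allowed set. The sketch below uses a hypothetical helper, `violations` (not part of pytesseract), that lists every character outside the whitelist. Whitespace is skipped, on the assumption that the spaces and line breaks in Tesseract's output should not count as violations.

```python
# Characters the whitelist is supposed to allow
WHITELIST = set("0123456789;")

def violations(text, allowed=WHITELIST):
    """Return, sorted, the non-whitespace characters in `text` that are not in `allowed`."""
    return sorted({ch for ch in text if not ch.isspace() and ch not in allowed})

print(violations("12;34\n56"))  # []
print(violations("12;3A4 B"))   # ['A', 'B']
```

An empty list means the OCR output stayed inside the whitelist.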
What version of the product are you using? On what operating system?
pytesseract-0.1, Python 2.7, Windows 8.1
Please provide any additional information below.
I've tried the approaches people use with Tesseract-OCR directly, but none of them work with pytesseract. I haven't found any way to apply a whitelist through the image_to_string() function, which would be a big help in improving its accuracy.
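pytesseract's image_to_string() works by running the tesseract command-line program on the image, so a TessBaseAPI object set up through a separate wrapper is never consulted. A minimal sketch follows. It assumes two things: that the installed pytesseract's image_to_string() accepts a `config` string, and that the Tesseract build understands `-c VAR=VALUE` on its command line. Under those assumptions, the whitelist can be passed as a config variable. `whitelist_config` and `ocr_with_whitelist` are hypothetical helper names.

```python
def whitelist_config(chars):
    """Build a Tesseract command-line option restricting recognition to `chars`.

    Assumes the installed Tesseract accepts `-c VAR=VALUE`.
    """
    return "-c tessedit_char_whitelist=" + chars

def ocr_with_whitelist(image, chars="0123456789;"):
    """Hypothetical helper: OCR a PIL image with pytesseract, passing the whitelist via `config`."""
    import pytesseract  # imported lazily so whitelist_config() can be used without it
    return pytesseract.image_to_string(image, config=whitelist_config(chars))

print(whitelist_config("0123456789;"))  # -c tessedit_char_whitelist=0123456789;
```

Usage would look something like `ocr_with_whitelist(Image.open("digits.png"))` with Pillow's `Image`, where the file name is only a placeholder. If a given pytesseract or Tesseract version does not support this, the output can still be checked after the fact, but that only filters characters and does not improve recognition.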
Thanks in advance for any help on the matter.
Status: New
Labels:
Type-Defect
Priority-Medium