People details
Project owners:
  tmb...@gmail.com, mezhirov, theraysmith, faisalshafait
Project committers:
cmahnke

About

hOCR is a format for representing OCR output, including layout information, character confidences, bounding boxes, and style information. It embeds this information invisibly in standard HTML. By building on standard HTML, it automatically inherits well-defined support for most scripts, languages, and common layout options. Furthermore, unlike previous OCR formats, the recognized text and the OCR-related information coexist in the same file and survive editing and manipulation. hOCR markup is independent of the presentation.
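
As a rough illustration of what "embedded in standard HTML" means in practice, the sketch below reads word text and bounding boxes back out of a small hOCR fragment using only the Python standard library. It is not one of this project's programs. The class names (ocr_page, ocr_line, ocrx_word) and the "bbox" and "x_wconf" title properties follow the public hOCR specification as we understand it; the sample text, the coordinates, and the helper names parse_title, WordExtractor, and extract_words are invented for this example.

```python
from html.parser import HTMLParser

# Hypothetical sample: the markup conventions follow the hOCR specification,
# but the text, coordinates, and confidences are made up for illustration.
SAMPLE = """<html><body>
<div class="ocr_page" title="bbox 0 0 800 600">
  <span class="ocr_line" title="bbox 10 20 400 50">
    <span class="ocrx_word" title="bbox 10 20 90 50; x_wconf 93">Hello</span>
    <span class="ocrx_word" title="bbox 100 20 200 50; x_wconf 88">world</span>
  </span>
</div>
</body></html>"""


def parse_title(title):
    """Split a title attribute such as 'bbox 1 2 3 4; x_wconf 90'
    into a dict mapping each property name to its list of values."""
    props = {}
    for part in title.split(";"):
        fields = part.split()
        if fields:
            props[fields[0]] = fields[1:]
    return props


class WordExtractor(HTMLParser):
    """Collect (text, bbox) pairs for every element of class ocrx_word.

    Simplification: assumes words contain no void elements such as <br>.
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._props = None   # title properties of the word being read, if any
        self._text = []      # text fragments seen inside the current word
        self._depth = 0      # nesting depth inside the current word

    def handle_starttag(self, tag, attrs):
        if self._props is not None:
            # Inline markup nested inside a word (for example <strong>).
            self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            self._props = parse_title(attrs.get("title") or "")
            self._text = []
            self._depth = 1

    def handle_data(self, data):
        if self._props is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._props is None:
            return
        self._depth -= 1
        if self._depth == 0:
            bbox = tuple(int(v) for v in self._props.get("bbox", []))
            self.words.append(("".join(self._text).strip(), bbox))
            self._props = None


def extract_words(hocr):
    """Return a list of (text, (x0, y0, x1, y1)) tuples found in hocr."""
    parser = WordExtractor()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE):
        print(text, bbox)
```

Because the OCR data lives in ordinary class and title attributes, a browser renders the fragment as plain text while a tool like this one can still recover the geometry.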

There is a Public Specification for the hOCR Format.

Available Programs

Included command line programs:

See the CommandLine Wiki page for more information.

Planned Programs

Possible Programs

Planned Converters

Please let us know if you want to help with these.

Hosted by Google Code