Project Information
Links
About
OCRopus(tm) is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multilingual capabilities. The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-1990s and deployed by the US Census Bureau, and novel high-performance layout analysis methods.
News
2011-10: Refactoring is nearly complete, with OCRopus now divided into a number of well-defined native code modules (ocrorast, ocrolseg, ocrofst) and high-level Python code (ocropy). The top-level repository (ocropus) can now be checked out on its own and should contain everything needed for building OCRopus.
2011-05: There has been significant refactoring and cleanup over the last year.
Plans
What remains to be done before the next official release:
Next steps:
Resources
Related Projects
Documentation
The following is the most important documentation:
If you want to contribute to the primary documentation, please clone the wiki repository (hg clone https://wiki.ocropus.googlecode.com/hg) and submit patches against the documentation.
Additional links you may find useful are here:
Bugs / Issues / Enhancements
Please use the "Issues" tab above to submit bugs, feature requests, etc. When submitting bug reports, please keep the following in mind:
Until the beta release (version 0.5), we mainly care about "big stuff" bug reports and failing documents; minor compile issues or cross-platform issues don't matter that much right now. For the time being, please also report only recognition failures on fairly clean scanned documents.
Contributing
If you want to contribute code to OCRopus, or if you have a patched version or variant, please use Google's Server Side Clone Support for Mercurial. You can maintain your own variant, add experimental features, etc., and share your patches and changes easily with others even if we haven't incorporated them into the main branch yet.
Acknowledgements
The system combines the work of many contributors and previous projects. The core developers work at the IUPR research group at the DFKI and gratefully acknowledge funding by Google and the BMBF TextGrid project.