
Since Sep 8, 2008 / Last update: Feb. 12, 2010

Introduction

NHocr is a command-line OCR (Optical Character Recognition) program, mainly for the Japanese language. It is designed to recognize machine-printed Japanese characters and some ASCII characters and symbols in an image. NHocr is probably the first open source OCR software for Japanese (offline, machine-printed text), apart from some experimental, partial code made available to academic communities.

You can also use NHocr through WeOCR service at:

The program is highly experimental, and its character recognition performance is limited. (If you need high-performance OCR, a commercial product will likely serve you better.)

The character features used in NHocr are based on the Peripheral Local Moment (P-LM) proposed by Hori et al. in the late 1990s.

NHocr started as a product of the author's weekend programming, so development may be rather slow.

Limitations of the current version

  • The current NHocr can handle only text block images, since it has no page layout analysis engine.
  • Recognition accuracy may deteriorate when wide and narrow characters are mixed or when proportional fonts are used.
  • Character segmentation performance is limited, since the current version uses a very simple segmentation algorithm.
  • Recognition accuracy for ASCII characters may be poor. For European languages, another OCR program such as tesseract is recommended.
  • No language processing (post-processing) is included yet.
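
To illustrate why a very simple segmentation step struggles with mixed wide and narrow characters, here is a minimal sketch of one common simple technique, vertical projection-profile segmentation. This is an assumption for illustration only: the page does not describe NHocr's actual algorithm, and the function name `segment_columns` and the toy image are hypothetical.

```python
def segment_columns(image):
    """Split a binary text-line image into character candidates.

    image: list of equal-length rows, 1 = ink, 0 = background.
    Returns a list of (start, end) column ranges, end exclusive.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection profile: amount of ink in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    segments = []
    start = None
    for x, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = x                    # entering an inked run
        elif ink == 0 and start is not None:
            segments.append((start, x))  # leaving an inked run
            start = None
    if start is not None:
        segments.append((start, width))  # run reaches the right edge
    return segments


# Toy 3-row line (hypothetical data): one 2-column glyph, a gap, then
# a glyph whose left and right strokes are separated by an empty column.
line = [
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0, 1],
]
print(segment_columns(line))  # [(0, 2), (3, 4), (5, 6)]
```

In this sketch the second glyph is cut into two narrow pieces, because any empty column is treated as a character boundary. Many Japanese characters have separated left and right parts, so a method of this kind cannot tell such a character from two narrow ones without extra width or language information.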

Supported platforms and requirements

Solaris (SPARC/x86) and Linux are officially supported. NHocr should also work on other UNIX(-like) platforms and MS-Windows.

NHocr depends on the O2-tools package, available at:

NHocr uses FreeType 2, available at:

Supported languages

The current version of NHocr supports Japanese only.

The author is interested in supporting other Asian languages such as Chinese. Each additional language requires a character code table (cctable-xxx). Contributions are welcome.

License

Apache License 2.0 applies to newer versions.

A derivative of MIT-X applies to version 1.5e-32 and older.
