Featured downloads: nhocr-0.17.tar.gz
Project owners:
  hgot07

Since Sep 8, 2008 / Last update: July 3, 2009

Introduction

NHocr is a command-line OCR (Optical Character Recognition) program for the Japanese language. It is designed to recognize machine-printed Japanese characters and some ASCII characters and symbols in an image. NHocr is probably the first open-source Japanese OCR software (offline, machine-printed), apart from some experimental, partial code released to academic communities.

You can also use NHocr through the WeOCR service at:

The program is highly experimental, and its character recognition performance is limited. (If you need high-performance OCR, you will be happier with a commercial product.)

The character feature used in NHocr is based on the Peripheral Local Moment (P-LM) proposed by Hori et al. in the late 1990s.

NHocr began as a product of the author's weekend programming, so development may be rather slow.

Limitations of the current version

Supported platforms and requirements

Solaris (SPARC/x86) and Linux are officially supported. NHocr should also work on other UNIX(-like) platforms and MS-Windows.

NHocr depends on the O2-tools package, available at:

Supported languages

The current version of NHocr supports Japanese only.

The author is interested in supporting other oriental languages such as Chinese. A character code table, cctable.utf-8, is required for this. Contributions are welcome.

Code availability

The source code distribution was scheduled for the second quarter of 2009.

The first source code package, version 0.16, has been available since May 2009.

License

The Apache License 2.0 applies to newer versions.

A derivative of the MIT/X license applies to older versions.

Hosted by Google Code