My favorites | Sign in
Project Home Downloads Wiki Issues Source
Project Information
Members
Featured
Downloads
Links

What is it?

uchardet is a C language binding of the original C++ implementation of the universal charset detection library by Mozilla.

uchardet is an encoding detector library, which takes a sequence of bytes in an unknown character encoding without any additional information, and attempts to determine the encoding of the text.

The original code of universalchardet is available at http://lxr.mozilla.org/seamonkey/source/extensions/universalchardet/

Techniques used by universalchardet are described at http://www.mozilla.org/projects/intl/UniversalCharsetDetection.html

Supported Encodings

  • Unicode
    • UTF-8
    • UTF-16BE / UTF-16LE
    • UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
  • Chinese
    • ISO-2022-CN
    • BIG5
    • EUC-TW
    • GB18030
    • HZ-GB-23121
  • Japanese
    • ISO-2022-JP
    • SHIFT_JIS
    • EUC-JP
  • Korean
    • ISO-2022-KR
    • EUC-KR
  • Cyrillic
    • ISO-8859-5
    • KOI8-R
    • WINDOWS-1251
    • MACCYRILLIC
    • IBM866
    • IBM855
  • Greek
    • ISO-8859-7
    • WINDOWS-1253
  • Hebrew
    • ISO-8859-8
    • WINDOWS-1255
  • Others
    • WINDOWS-1252

Related Projects

Powered by Google Project Hosting