| Projects on Google Code | Results 1 - 10 of 18 |
Cjklib provides language routines related to Han characters (characters based on Chinese characters, named Hanzi, Kanji, Hanja and chu Han respectively) used in writing Chinese, Japanese, infrequently Korean and formerly Vietnamese. Functionality is included for charact...
Eclectus is a small Han character dictionary especially designed for
learners of Chinese character based languages like Mandarin Chinese or
Japanese.
Read [About about] Eclectus' features and dependencies and [Install install] it!
<wiki:video url="http://www.youtube.com/watch?v=pwDeUSkQugU"/...
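===Paoding Analysis Summary===
*Paoding's Knives* Chinese word segmentation is extremely _efficient_ and _highly extensible_. It introduces a metaphor, uses a fully object-oriented design, and is advanced in concept.
High efficiency: on a PIII personal machine with 1 GB of memory, it can accurately segment *1 million* Chinese characters in *1 second*.
It segments text effectively using an _unlimited number_ of dictionary files, which makes it possible to define vocabulary by category.
It can reasonably parse unknown words.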
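As a rough illustration of dictionary-based segmentation with several categorized word lists, here is a minimal Python sketch of forward maximum matching. It is not Paoding's actual algorithm or API: the function `segment`, the `max_len` parameter, and the two sample word lists are all hypothetical and chosen only for this example.

```python
def segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching against a set of known words.

    Characters that start no lexicon word are emitted as single-character
    tokens, so unknown text is never dropped.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until the lexicon matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical categorized word lists, merged into one lexicon.
general_words = {"中文", "分词"}
search_words = {"搜索", "引擎", "搜索引擎"}
lexicon = general_words | search_words

print(segment("中文分词搜索引擎", lexicon))
```

Because the lexicon is a plain set union, any number of word lists can be merged before segmenting.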
===Welcome===
If you have any suggestions for this project, you are welcome to raise issues at http://code.google.com/p/paoding/issues/list .
Thoughtful contributions, extremely...
= DESCRIPTION =
This module is a word tokenizer for CJK texts. It supports n-gram tokenization. It is handy for users building inverted indexes with Xapian or any other search engine tool. The module was originally written to be used with Xapian. Please also read this [http://lists.ta...
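To make the idea of n-gram tokenization concrete, the following Python sketch splits CJK text into overlapping character n-grams (bigrams by default). It illustrates the general technique only, not this module's interface: the function name `cjk_ngrams` and its parameters are assumptions made for the example.

```python
def cjk_ngrams(text, n=2):
    """Return the overlapping character n-grams of `text`, ignoring whitespace."""
    chars = [ch for ch in text if not ch.isspace()]
    if len(chars) < n:
        # Too short for a full n-gram: emit the remainder as one token, if any.
        return ["".join(chars)] if chars else []
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


tokens = cjk_ngrams("中文分词系统")
print(tokens)
```

Each token can then be indexed as a term, so a query run through the same function matches on shared n-grams without needing a dictionary.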
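=Note: the latest word segmentation system, HTTPCWS, has been released to replace PHPCWS.=
=Click the following URL to download HTTPCWS:=
=http://code.google.com/p/httpcws=
=The original PHPCWS is no longer updated.=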
----
==Introduction in English==
PHPCWS is an open-source PHP extension for Chinese word segmentation, using ICTCLAS Chinese word segmentation algorithms and Reverse maxi...
php,
expansion,
chinese,
word,
segmentation,
phpcws,
ICTCLAS,
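中文分词 (Chinese word segmentation),
分词 (word segmentation),
PHP扩展 (PHP extension),
汉语分词 (Chinese-language word segmentation),
搜索引擎 (search engine),
全文索引 (full-text index),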
china,
CJK
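The input method lexicon contains a variety of specialized vocabulary, which can speed up Chinese input. It currently focuses mainly on medical terminology.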
Since Sep 8, 2008 / Last update: Dec. 16, 2009
= Introduction =
NHocr is a command-line OCR (Optical Character Recognition) program for the Japanese language, etc. It is designed to recognize machine-printed Japanese characters and some ASCII characters/symbols in an image.
NHocr is probably ...
zhspacing fine-tunes several details of typesetting Chinese with XeTeX and XeLaTeX, such as automatic font switching between Chinese and Western characters, skip adjustment for full-width punctuation, punctuation prohibitions, automatic skip insertion between Chinese and Western characters or math form...
==Introduction in English==
HTTPCWS is an open-source Chinese word segmentation system based on the HTTP protocol, using ICTCLAS Chinese word segmentation algorithms.
ICTCLAS is a Chinese lexical analysis system that can perform Chinese word segmentation, part-of-speech tagging, word ...
php,
expansion,
chinese,
word,
segmentation,
phpcws,
httpcws,
ICTCLAS,
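中文分词 (Chinese word segmentation),
分词 (word segmentation),
汉语分词 (Chinese-language word segmentation),
搜索引擎 (search engine),
全文索引 (full-text index),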
china,
CJK
=CJK Decomposition File=
The CJK Decomposition File is a graphical analysis of the most common 20,934 Chinese/Japanese characters in Unicode (the 20,922 characters in the Unicode CJK common ideograph block, plus the 12 unique characters from the CJK compatibility block).
For each character, I'...