DESCRIPTION
This module is a word tokenizer for CJK text. It supports n-gram tokenization, which is handy when building inverted indexes with Xapian or any other search engine tool. The module was originally written for use with Xapian. Please also read this post on the xapian-discuss mailing list. If you are a Perl user, you can also use the Perl binding. There is currently no documentation; please check out the repository and hack on it.
FEATURES
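As a rough illustration of the n-gram tokenization mentioned in the description, here is a minimal Python sketch. It is not this module's code or API: the function name `cjk_ngrams`, the `n` parameter, and the set of Unicode ranges treated as CJK are all assumptions made for the example.

```python
import re

# Assumption: "CJK" here means the basic CJK Unified Ideographs block plus
# Hiragana, Katakana and Hangul syllables. The real module may cover more.
_CJK_RUN = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7a3]+")


def cjk_ngrams(text, n=2):
    """Return overlapping n-grams for every run of CJK characters in text.

    Hypothetical helper for illustration only. Non-CJK text is skipped,
    and a run shorter than n is emitted whole.
    """
    tokens = []
    for run in _CJK_RUN.findall(text):
        if len(run) < n:
            tokens.append(run)
        else:
            # Slide a window of width n across the run, one character at a time.
            tokens.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return tokens
```

With the default `n=2`, `cjk_ngrams("我爱北京")` returns `['我爱', '爱北', '北京']`. Overlapping n-grams like these can be indexed as terms without a dictionary-based word segmenter, which is presumably why the approach suits inverted indexes.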
USERS
http://code.google.com/p/cjk-tokenizer/wiki/Users
TODO