Projects on Google Code Results 1 - 10 of 11
=Summary= The *tokstream* library allows you to read text files and split them up into individual tokens. It is, in a sense, a glorified version of strtok with file reading and a few tricks to make the process as efficient as possible. ==Features== * clean and minimal interface * simpl...
jTokeniser is a set of classes that provide a variety of tokenisers for your Java projects. Simple tokenisers such as WhiteSpaceTokeniser or StringTokeniser provide basic token extraction whereas RegexTokeniser and BreakIteratorTokeniser give more advanced possibilities for more thorough tokenisers...
WDependency is a PHP tool that analyzes the contents of a directory to find dependencies between files and classes and generates dependency schemas in various export formats (dot, png, graphml, json, php...)
A Ruby/Python HTML parser/tokenizer based on the [http://www.whatwg.org/specs/web-apps/current-work/ WHATWG HTML5 specification] for maximum compatibility with major desktop web browsers. =Notes= * The 0.11 release is rather old. * Users of the sanitizer *must* ensure that they ser...
=XMLCC= XMLCC is a C++ library for handling XML using Design Patterns, especially the Composite Pattern. ---- ==About== It allows you to generate XML structures using a hierarchical object-oriented model that can be written to an XML file easily. Parsing is available by several parsers; a DOM like...
= ParseKit = == About == ParseKit is a Mac OS X Framework written by Todd Ditchendorf in Objective-C 2.0 and released under the MIT Open Source License. ParseKit is suitable for use on Mac OS X Leopard or iPhone OS. The framework is an Objective-C implementation of the tools described in "Buildi...
An XPath implementation for ActionScript 3.0. The project is administered by Peter Hall, of [http://www.memorphic.com Memorphic], but other contributions are welcome. Please post to the discussion group if you would like to be added to the project. ==Platforms== XPath-AS3 can be used in any A...
An assignment in Natural Language Processing (NLP) at Reykjavik University. "In this project, you will develop a tokeniser and a sentence segmentiser for a language of your choice. Actually, it would be best if your program could handle a family of languages (instead of a single language), for e...
Splender is a JavaScript-based, token-driven syntax highlighting engine with theme support. It allows for very efficient syntax highlighting of plain text content embedded in HTML documents. By utilizing a proper lexer/tokenizer, Splender offers optimal performance. Other similar solutions use cr...
!PyGrams converts text to n-grams. Conversion is a three step process. 1) Extract all possible n-grams. Run "form_candidates.py" to create a file containing all possible n-grams. 2) Filter possible n-grams. Run "filter_candidates.py" to find just the n-grams which appear sufficiently frequent...
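The first two steps described for PyGrams (form candidates, then filter by frequency) can be illustrated with a minimal sketch. This is not the project's code: the function names `extract_ngrams` and `filter_ngrams`, the whitespace tokenisation, and the `min_count` threshold are all assumptions made for illustration, and the third step is omitted because the listing truncates it.

```python
from collections import Counter


def extract_ngrams(tokens, max_n):
    """Return every contiguous n-gram (as a tuple) for n = 1..max_n.

    Hypothetical helper, standing in for the "form candidates" step.
    """
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def filter_ngrams(ngrams, min_count):
    """Keep only n-grams seen at least min_count times.

    Hypothetical helper, standing in for the "filter candidates" step;
    the real project may use a different frequency criterion.
    """
    counts = Counter(ngrams)
    return {gram: count for gram, count in counts.items() if count >= min_count}


# Assumed tokenisation: a plain whitespace split.
tokens = "the cat sat on the mat the cat ran".split()
candidates = extract_ngrams(tokens, max_n=2)
frequent = filter_ngrams(candidates, min_count=2)
```

Here `candidates` holds all unigrams and bigrams of the sample sentence, and `frequent` keeps only those that occur at least twice.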