
Kaprao

New Ubuntu/Debian packages

New .deb packages have been posted in the downloads area. TECkit is being distributed as a separate package from pyTecKit since it is a library in and of itself, and therefore pyTecKit should rely on it, not contain it.

Home of pyTecKit

Currently this is the host project of the Python wrapper for TECkit, a converter for text between custom encodings and Unicode. The original TECkit library was made by Jonathan Kew and is available at SIL.org.

I've released an initial version of the wrapper. It is not complete, but should have enough functionality to satisfy most users. Exception handling still needs some work, but it shouldn't crash with a segmentation fault anymore.

For guidance on how to use this library, look at the unit tests. These are in teckitTests.py, which the binary packages place in /usr/lib/python2.5/site-packages or C:\Python25\Lib\site-packages; alternatively, you can just download the source package.

General usage looks like:

import teckit

# instantiate a compiler and compile a file
compiler = teckit.Compiler()
compiler.compile('test.map', 'test.tec')

# open a compiled mapping file
engine = teckit.Engine()
engine.openMapping('test.tec')

# read a file into a string
data = open('testdata.data', 'rb').read()

# convert data and capture the result
result = engine.convert(data)

# string slicing
print result[:20]

# prepare to do regular expression operations
import re
# look for an SFM verse tag at the start of the converted text
match = re.match(r'\\v (\d{1,3}) ([^\\]*)', unicode(result))
if match:
    # if matched, print the verse number and the verse text
    print 'verse %s: "%s"' % (match.group(1), match.group(2))
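
The example above uses re.match, which only inspects the start of the string. Below is a minimal sketch of collecting every verse with re.finditer instead. It uses only the standard library; the iter_verses helper and the sample text are illustrative assumptions, not part of pyTecKit, and the sample string stands in for the converted result.

```python
import re

# Same pattern as the usage example: "\v", a verse number,
# then the verse text up to the next backslash marker.
VERSE_RE = re.compile(r'\\v (\d{1,3}) ([^\\]*)')

def iter_verses(text):
    """Yield (verse_number, verse_text) for every \\v marker in text."""
    for m in VERSE_RE.finditer(text):
        # strip the trailing newline left before the next marker
        yield int(m.group(1)), m.group(2).strip()

# Made-up SFM sample standing in for converted output (hypothetical data).
sample = u'\\c 1\n\\v 1 In the beginning\n\\v 2 And the earth\n\\p\n'
verses = list(iter_verses(sample))
```

Because the pattern stops at the next backslash, each verse ends where the following SFM marker begins, so markers such as \c and \p are skipped rather than captured.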

News

28 December, 2008 For users of Ubuntu (and other Debian-based distros), I have put together .deb packages for both TECkit and pyTecKit. These are available in the project's Featured Downloads.

18 June, 2008 After a long break I have returned to developing this project in my spare time. I took a closer look at the Python C/C++ interface and am now tuning the code to Python by hand rather than letting SWIG do its magic.

The major changes are to the functions that return TECkit-encoded strings, so that I have better control over how a Python Unicode string object is created. I'm not sure how deep I need to go with this; I may need to write my own Unicode codec that could be loaded on import. I am also adding exception handling using Python's exception constructs.

I am not sure when all this will be completed, since I am only working on it in my spare time, but I hope that when it is finished it will be truly finished and no further revisions will be required.
