google-highly-open-participation-psf - issue #323
Add support for the 're' module using PCRE
Download and install Shed Skin, and read the included README for usage instructions:
http://shedskin.sourceforge.net
Especially read the part about how to implement libraries. Have a look at the lib/ directory for examples of several standard library module implementations.
Download and install PCRE, the Perl-Compatible Regular Expression library:
Using PCRE, add support for basic 're' functionality (i.e., add lib/re.py and lib/re.?pp), and illustrate the compatibility between 're' and PCRE by compiling a test program that uses most basic 're' features.
You'll need to have experience with C to pull this off!
Completion:
Submit a patch as an attachment to this ticket.
Task duration: please complete this task within 5 days (120 hours) of claiming it.
Comment #1
Posted on Jan 9, 2008 by Happy Camel
Comment deleted
Comment #2
Posted on Jan 9, 2008 by Grumpy Rhino
I claim this task.
Comment #3
Posted on Jan 9, 2008 by Helpful Giraffe
(No comment was entered for this change.)
Comment #4
Posted on Jan 9, 2008 by Happy Camel
good luck! ;)
Comment #5
Posted on Jan 9, 2008 by Grumpy Rhino
I've got a working version down (with windows, too [!]) which supports the following methods:
match_object::expand, match_object::group, match_object::start, match_object::end, re_object::match, re_object::search, compile, match, search
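For reference, a small Python sketch that exercises the calls listed above against CPython's own re module. The pattern and sample string are made up for illustration and are not taken from the attached test script.

```python
import re

# Hypothetical sample data, chosen only to exercise the listed calls.
text = "user=bob id=42"
pattern = re.compile(r"(\w+)=(\w+)")

m = pattern.match(text)             # re_object::match, anchored at the start
first_key = m.group(1)              # match_object::group
first_span = (m.start(), m.end())   # match_object::start / match_object::end
swapped = m.expand(r"\2=\1")        # match_object::expand

s = pattern.search(text, 5)         # re_object::search, starting at offset 5
second_pair = s.group(0)

# Module-level shortcuts: match only succeeds at position 0, search scans ahead.
no_match = re.match(r"id", text)
found = re.search(r"id", text)
```

Running the same script under CPython and as a compiled binary, then diffing the output, is one way to check compatibility between 're' and PCRE.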
Comment #6
Posted on Jan 9, 2008 by Grumpy Rhino
(so essentially I've added support for expand, start and end, and got it working with windows finally :-P)
Comment #7
Posted on Jan 9, 2008 by Happy Camel
good work!
did you implement expand yourself, or did you convert a python version somehow? I don't know how difficult it is to build the other methods on pcre, but perhaps it's useful to use this approach in one or more cases..?
have you seen anything that might be particularly hard to add, or any incompatibilities between re and pcre?
Comment #8
Posted on Jan 9, 2008 by Grumpy Rhino
so far it appears to me that pcre supports everything python does
I made the expand function myself; it was pretty simple, and even simpler with C++ strings. friday I'll implement at least split and findall via the pcre callout functionality (can't guarantee anything tomorrow b/c we've got exams)
Comment #9
Posted on Jan 11, 2008 by Grumpy Rhino
I've added split, but I have a question about findall. the documentation says that
in the case of multiple captured subpatterns it returns a list of tuples. I
checked and didn't see anything, so I was wondering if it's possible with your
library to create + populate a tuple2 object when the size isn't known before
compilation (so it'd be constant, just unknown). maybe like a setitem function or
something?
Comment #10
Posted on Jan 11, 2008 by Happy Camel
sure, you can just create a 'new tuple2()', and then add things to the underlying STL vector (if x is a pointer to the object, use x->units.push_back(..))
but the problem here seems to be that the return type of findall is dynamic! sometimes it's a list of strings, sometimes a list of tuples of strings :P so it cannot be supported at all..
maybe it would be best to throw an exception here when you detect there are multiple groups? it's always possible to use finditer and then match.group(x), so users can easily code around it..
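To make the dynamic return type concrete, here is a short Python sketch of how CPython's findall changes shape with the number of groups, and how finditer avoids that. The sample text and patterns are made up for illustration.

```python
import re

text = "a=1 b=2"

# One capturing group (or none): findall returns a list of strings.
single = re.findall(r"(\w)=\d", text)

# Two capturing groups: the same call returns a list of tuples of strings.
multi = re.findall(r"(\w)=(\d)", text)

# finditer always yields match objects, so the element type is fixed
# and the caller picks out the groups explicitly.
pairs = [(m.group(1), m.group(2)) for m in re.finditer(r"(\w)=(\d)", text)]
```

Since the finditer version never changes its element type, it is the form that a static type inferencer can handle.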
Comment #11
Posted on Jan 11, 2008 by Happy Camel
tuple2<str *, str *> *t = new tuple2<str *, str *>();
t->units.push_back(new str("blah"));
t->units.push_back(new str("bleh"));
Comment #12
Posted on Jan 11, 2008 by Happy Camel
actually, with just an exception there will still be possible type inference problems, if the user expects tuples. maybe it's better if you skip findall for now?
Comment #13
Posted on Jan 11, 2008 by Grumpy Rhino
yeah, heh. somehow I had figured with c++'s function overloading there was a way around that. guess things like that are what happens when I'm really tired... :-P
so I've done everything in re_object except findall, and there're a few things in match_object which I haven't done yet. the relevant files are attached.
- re.cpp 10.74KB
- re.hpp 2.87KB
- re.py 894
Comment #14
Posted on Jan 12, 2008 by Happy Camel
great, good work! one thing I was worrying about was that re.cpp would become rather big, but clearly that hasn't happened.. :)
if arguments to a type model are dynamic, but not the return type, their types don't interfere with the result and we can often use a manual C++ template function. (I just did this for getopt.gnu_getopt, and there are a few other examples).
what would you say if we assume findall returns a list of strings, and have the compiler always give a warning when it is used ("warning: assuming re.findall returns list of strings (use re.finditer for multiple groups)"), and we also throw an exception when there are multiple capturing patterns? so it will work fine in most cases, and just generate a warning.
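The runtime half of this proposal can be sketched in Python as below. The helper name findall_strings and the choice of ValueError are assumptions made for illustration, not what the patch does, and the compile-time warning has no equivalent here.

```python
import re

def findall_strings(pattern, text):
    """Hypothetical helper: findall restricted to a list-of-strings result."""
    compiled = re.compile(pattern)
    # compiled.groups is the number of capturing groups in the pattern.
    if compiled.groups > 1:
        raise ValueError("findall with multiple groups is unsupported; use finditer")
    return compiled.findall(text)
```

With zero or one capturing group the helper behaves like re.findall; with more it fails loudly rather than returning a result of a different type.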
Comment #15
Posted on Jan 13, 2008 by Grumpy Rhino
I think this is everything...
I added the flag -lpcre.dll in LFLAGS field of FLAGS, but I'm not sure if that's different for linux systems...
- re.cpp 13.12KB
- re.hpp 3.51KB
- re.py 1.44KB
Comment #16
Posted on Jan 13, 2008 by Grumpy Rhino
oh yeah, I noticed that your string concatenation function __add_strs assumed all its parameters to be non-null, which led to a crash if one of them does happen to be a 'None' type. you probably want to throw an exception (like python does) if it encounters a null-type or something.
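For comparison, this is the CPython behaviour being referred to: concatenating a string with None raises TypeError instead of crashing. A minimal sketch:

```python
error_name = None
try:
    # str + None is rejected by CPython at run time.
    result = "abc" + None
except TypeError as exc:
    result = None
    error_name = type(exc).__name__
```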
Comment #17
Posted on Jan 13, 2008 by Happy Camel
I'm impressed, and marking your task as completed. I'll test your patch later this week and commit it to SVN. btw, do you have a test program as well? I'd also like to add that to unit.py..
yes, there are probably other places where checks should be added.. it's good practice to test code with CPython first, and only compile it as a last step, but if checks are easy and don't take much time, they should be added.
of course now I have to ask - would you be interested in doing another SS task, or in staying with the project after the GHOP? I could surely use your help!
Comment #18
Posted on Jan 13, 2008 by Happy Camel
oh, could you please document the steps it took to get pcre working under windows?
Comment #19
Posted on Jan 13, 2008 by Happy Camel
I had a quick scan of re.py. shouldn't re_object.groupindex have a type (e.g., dict of string to int?)
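As an illustration of the string-to-int mapping, this is what groupindex holds in CPython for a pattern with named groups. The pattern and address are invented for the example.

```python
import re

pattern = re.compile(r"(?P<user>\w+)@(?P<host>[\w.]+)")

# groupindex maps each group name (str) to its group number (int).
index = dict(pattern.groupindex)

m = pattern.match("bob@gmail.com")
by_name = m.group("user")
by_number = m.group(index["user"])   # same group, looked up by number
```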
Comment #20
Posted on Jan 13, 2008 by Grumpy Rhino
er, yeah, guess so :-P
I managed to already find a bug and something I forgot to implement (groupdict and groups), so I'll reupload in a tiny bit
Comment #21
Posted on Jan 13, 2008 by Grumpy Rhino
think everything's fixed now...
I've also attached a test script which I whipped up, but you're welcome to change it if you want
for compilation on windows I put libpcre.dll.a in the /lib directory (the top level one), libpcre-0.dll in a place accessible by the compiled application (e.g. in the same directory as it), and added -lpcre.dll to LFLAGS in FLAGS
I might be up to one more task, but preferably one that's not so big. I have some other things I want to work on, exams the rest of this week then a short ski trip during the weekend, so my schedule is sort of full :-P I could help out some with development, but as a warning I don't really like commitments, and a lot of the time I have a project of mine that needs work as well.
- re.cpp 14.05KB
- re.hpp 3.61KB
- re.py 1.56KB
- test.py 1.21KB
Comment #22
Posted on Jan 13, 2008 by Happy Camel
I tried it here under Ubuntu, using -lpcre, and it seems to work fine. but when I diff the python and compiled versions, I get a few minor differences:
< ['BoB@gmaiL.com', 'sally123_43.d@hOtmail.co.uk']
> [('BoB', 'gmaiL.com'), ('sally123_43.d', 'hOtmail.co.uk')]
10d9
< user: bob
13a13
> user: bob
17,19d16
< pass: hoho
< user: haha
< path: /files
21a19,21
> user: haha
> pass: hoho
> path: /files
Comment #23
Posted on Jan 13, 2008 by Happy Camel
btw, where did you get these dlls, or how did you make them?
Comment #24
Posted on Jan 13, 2008 by Happy Camel
hmm. how about adding support for the time module, or part of it? if I'm correct, this is mostly a wrapper around standard C calls.
Comment #25
Posted on Jan 13, 2008 by Grumpy Rhino
lol, that first difference is b/c of our findall solution :-P it's actually supposed to return an array of tuples (as you see in the second line), but since dynamic return types aren't really feasible it only returns an array of strings (the first line). the other differences are simply because the dict iterator traversed the dictionary in a different order from python, which is an utter non-issue in my book b/c dictionaries are associative not ordered anyhow.
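If a test script should produce identical output from both builds regardless of dictionary traversal order, one option is to sort the items before printing. A minimal Python sketch, with a made-up pattern and input rather than the ones from the attached test.py:

```python
import re

m = re.match(r"(?P<user>\w+):(?P<password>\w+)", "haha:hoho")
groups = m.groupdict()

# Sorting the items fixes the output order regardless of how the
# dictionary happens to be traversed.
lines = ["%s: %s" % (key, value) for key, value in sorted(groups.items())]
```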
and I compiled the libraries (it's actually a static library, libpcre.dll.a, and a dynamic one, libpcre-0.dll) with mingw. I originally tried to link it statically (with libpcre.a), but I wasn't able to get that to work so I fell back to dynamic linking...
Comment #26
Posted on Jan 13, 2008 by Grumpy Rhino
the time module might work, but I wouldn't be able to guarantee a starting time (heh)
Comment #27
Posted on Jan 13, 2008 by Happy Camel
okay, good! thanks.
I am planning on requesting a few more tasks, including one for the time module. I'll let you know if/when they come online.
Status: Completed
Labels:
C
thirdparty
coding
shedskin
ClaimedBy-SirNot
Due-20080114.1000