|
Project Information
|
The sregex module implements Structural Regular Expressions. Structural Regular Expressions were created by Rob Pike and covered in this paper: http://doc.cat-v.org/bell_labs/structural_regexps/ Structural regular expressions work by describing the shape of the whole string, not just the piece you want to match. Each pattern is a list of operators to perform on a string, each time constraining the range of text that matches the pattern. Examples will make this much clearer. The first operator to consider is the x// operator, which means e(x)tract. When applied to a string, all the substrings that match the regular expression between // are passed on to the next operator in the pattern. Given the source string "Atom-Powered Robots Run Amok" and the pattern "x/A.../" the result would be ['Atom', 'Amok']. The sregex module does that using the 'sres' function: >>> list(sres("Atom-Powered Robots Run Amok", "x/A.../"))
['Atom', 'Amok']A pattern can contain mulitple operators, separated by whitespace, which are applied in order, each to the result of the previous match. >>> list(sres("Atom-Powered Robots Run Amok", "x/A.../ x/.*m$/"))
['Atom']There are four operators in total: x/regex/ - Matches all the text that matches the regular expression y/regex/ - Matches all the text that does not match the regular expression g/regex/ - If the regex finds a match in the string then the whole string is passed along. v/regex/ - If the regex does not find a match in the string then the whole string is passed along. >>> list(sres("Atom-Powered Robots Run Amok", "y/ /"))
['Atom-Powered', 'Robots', 'Run', 'Amok']
>>> list(sres("Atom-Powered Robots Run Amok", "y/( |-)/"))
['Atom', 'Powered', 'Robots', 'Run', 'Amok']
>>> list(sres("Atom-Powered Robots Run Amok", "y/ / x/R.*/"))
['Robots', 'Run']
>>> list(sres("Atom-Powered Robots Run Amok", "y/ / x/R./"))
['Ro', 'Ru']
>>> list(sres("Atom-Powered Robots Run Amok", "y/( |-)/ v/^R/"))
['Atom', 'Powered', 'Amok']
>>> list(sres("Atom-Powered Robots Run Amok", "y/( |-)/ v/^R/ g/om/"))
['Atom']The module provides two other functions: sre(source, pattern)Returns an interator for the index ranges that match the pattern. >>> list(sre("Atom-Powered Robots Run Amok", "y/ / v/^R/ g/om/"))
[(0,4)]sub(source, pattern, repl)Returns source with all the matches to pattern replaced with repl. >>> sub("Atom-Powered Robots Run Amok", "y/( |-)/ v/^R/ g/om/", "Coal")
"Coal-Powered Robots Run Amok"The repl argument can also be a callable, in which case it is passed the matching substring and is expected to return the substitute string. >>> sub("Atom-Powered Robots Run Amok", "x/A.../", lambda x: x.upper())
"ATOM-Powered Robots Run AMOK"Installation$ hg clone https://sregex.googlecode.com/hg/ sregex $ cd sregex $ python setup.py install |