Project owners:
  joe.gregorio

The sregex module implements structural regular expressions, which were created by Rob Pike and are described in this paper:

http://doc.cat-v.org/bell_labs/structural_regexps/

Structural regular expressions work by describing the shape of the whole string, not just the piece you want to match. A pattern is a sequence of operators applied to a string in turn, each one narrowing the range of text that matches. A few examples make this clearer.

The first operator to consider is the x// operator, which means e(x)tract. When applied to a string, all the substrings that match the regular expression between // are passed on to the next operator in the pattern.

Given the source string "Atom-Powered Robots Run Amok" and the pattern "x/A.../" the result would be ['Atom', 'Amok']. The sregex module does that using the 'sres' function:

  >>> list(sres("Atom-Powered Robots Run Amok", "x/A.../"))
  ['Atom', 'Amok']
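To make the behaviour of a single x// step concrete, here is a minimal sketch built on the standard `re` module. The `extract` helper is a hypothetical name used only for illustration; it is not part of sregex and may differ from how the module is actually implemented.

```python
import re

def extract(text, regex):
    """Hypothetical helper: return every non-overlapping substring of
    text that matches regex, in left-to-right order."""
    return [m.group(0) for m in re.finditer(regex, text)]

print(extract("Atom-Powered Robots Run Amok", "A..."))
# ['Atom', 'Amok']
```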

A pattern can contain multiple operators, separated by whitespace, which are applied in order, each to the result of the previous match.

  >>> list(sres("Atom-Powered Robots Run Amok", "x/A.../ x/.*m$/"))
  ['Atom']

There are four operators in total:

x/regex/ - Extracts every substring that matches the regular expression.
y/regex/ - Extracts the text between matches of the regular expression, that is, everything that does not match.
g/regex/ - Passes the whole string along if the regex finds a match in it.
v/regex/ - Passes the whole string along if the regex does not find a match in it.

  >>> list(sres("Atom-Powered Robots Run Amok", "y/ /"))
  ['Atom-Powered', 'Robots', 'Run', 'Amok']

  >>> list(sres("Atom-Powered Robots Run Amok", "y/( |-)/"))
  ['Atom', 'Powered', 'Robots', 'Run', 'Amok']

  >>> list(sres("Atom-Powered Robots Run Amok", "y/ / x/R.*/"))
  ['Robots', 'Run']

  >>> list(sres("Atom-Powered Robots Run Amok", "y/ / x/R./"))
  ['Ro', 'Ru']

  >>> list(sres("Atom-Powered Robots Run Amok", "y/( |-)/ v/^R/"))
  ['Atom', 'Powered', 'Amok']

  >>> list(sres("Atom-Powered Robots Run Amok", "y/( |-)/ v/^R/ g/om/"))
  ['Atom']
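The four operators and their chaining can be sketched with the standard `re` module as follows. This is an illustrative reimplementation under stated assumptions, not the sregex source: the names `parse`, `apply_operator` and `sres_sketch` are hypothetical, the tokenizer assumes each operator is written as a letter followed by a /regex/ body, and empty pieces between adjacent y// matches are assumed to be dropped.

```python
import re

# Hypothetical tokenizer: an operator letter followed by a /regex/ body.
# A backslash-escaped character (including \/) is allowed inside the body.
_OPERATOR = re.compile(r"([xygv])/((?:[^/\\]|\\.)*)/")

def parse(pattern):
    """Split a pattern into (operator, compiled regex) pairs."""
    return [(m.group(1), re.compile(m.group(2)))
            for m in _OPERATOR.finditer(pattern)]

def apply_operator(op, regex, piece):
    """Yield the substrings of piece that survive one operator."""
    if op == "x":
        # Extract every match.
        for m in regex.finditer(piece):
            yield m.group(0)
    elif op == "y":
        # Extract the text between matches (assumption: skip empty gaps).
        last = 0
        for m in regex.finditer(piece):
            if m.start() > last:
                yield piece[last:m.start()]
            last = m.end()
        if last < len(piece):
            yield piece[last:]
    elif op == "g":
        # Keep the whole piece only if the regex matches somewhere in it.
        if regex.search(piece):
            yield piece
    elif op == "v":
        # Keep the whole piece only if the regex matches nowhere in it.
        if not regex.search(piece):
            yield piece

def sres_sketch(source, pattern):
    """Apply each operator in order to the output of the previous one."""
    pieces = [source]
    for op, regex in parse(pattern):
        pieces = [out for piece in pieces
                  for out in apply_operator(op, regex, piece)]
    return pieces

print(sres_sketch("Atom-Powered Robots Run Amok", "y/( |-)/ v/^R/ g/om/"))
# ['Atom']
```

Each operator sees only the piece handed to it, which is why `^` in v/^R/ anchors to the start of each word rather than to the start of the source string.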

The module provides two other functions:

sre(source, pattern)

Returns an iterator for the index ranges that match the pattern.

    >>> list(sre("Atom-Powered Robots Run Amok", "y/( |-)/ v/^R/ g/om/"))
    [(0, 4)]
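The index ranges relate to the substrings returned by sres through ordinary slicing. The short sketch below assumes each range is a half-open (start, end) pair, which is consistent with (0, 4) covering 'Atom'; the range list is hard-coded here rather than produced by sregex.

```python
source = "Atom-Powered Robots Run Amok"

# Assumption: ranges are half-open (start, end) pairs, as Python slices are.
ranges = [(0, 4)]

substrings = [source[start:end] for start, end in ranges]
print(substrings)
# ['Atom']
```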

sub(source, pattern, repl)

Returns source with all the matches to pattern replaced with repl.

    >>> sub("Atom-Powered Robots Run Amok", "y/( |-)/ v/^R/ g/om/", "Coal")
    'Coal-Powered Robots Run Amok'

The repl argument can also be a callable, in which case it is passed the matching substring and is expected to return the substitute string.

    >>> sub("Atom-Powered Robots Run Amok", "x/A.../", lambda x: x.upper())
    'ATOM-Powered Robots Run AMOK'
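One way to think about sub is as a replacement over the matching index ranges. The sketch below shows that idea under stated assumptions: `replace_ranges` is a hypothetical helper, not the module's implementation, and it expects sorted, non-overlapping, half-open (start, end) ranges, which are hard-coded here.

```python
def replace_ranges(source, ranges, repl):
    """Hypothetical helper: rebuild source with each range replaced.

    ranges: sorted, non-overlapping, half-open (start, end) pairs.
    repl:   a string, or a callable that receives the matched substring
            and returns its replacement.
    """
    out = []
    last = 0
    for start, end in ranges:
        out.append(source[last:start])   # untouched text before the match
        matched = source[start:end]
        out.append(repl(matched) if callable(repl) else repl)
        last = end
    out.append(source[last:])            # untouched text after the last match
    return "".join(out)

source = "Atom-Powered Robots Run Amok"
print(replace_ranges(source, [(0, 4)], "Coal"))
# Coal-Powered Robots Run Amok
print(replace_ranges(source, [(0, 4), (24, 28)], lambda x: x.upper()))
# ATOM-Powered Robots Run AMOK
```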

Installation

 $ hg clone https://sregex.googlecode.com/hg/ sregex  
 $ cd sregex
 $ python setup.py install







