My favorites | Sign in
Project Home Downloads Wiki Issues Source
Project Information
Members

The sregex module implements Structural Regular Expressions. Structural Regular Expressions were created by Rob Pike and covered in this paper:

http://doc.cat-v.org/bell_labs/structural_regexps/

Structural regular expressions work by describing the shape of the whole string, not just the piece you want to match. Each pattern is a list of operators to perform on a string, each time constraining the range of text that matches the pattern. Examples will make this much clearer.

The first operator to consider is the x// operator, which means e(x)tract. When applied to a string, all the substrings that match the regular expression between // are passed on to the next operator in the pattern.

Given the source string "Atom-Powered Robots Run Amok" and the pattern "x/A.../" the result would be ['Atom', 'Amok']. The sregex module does that using the 'sres' function:

  >>> list(sres("Atom-Powered Robots Run Amok", "x/A.../"))
  ['Atom', 'Amok']

A pattern can contain mulitple operators, separated by whitespace, which are applied in order, each to the result of the previous match.

  >>> list(sres("Atom-Powered Robots Run Amok", "x/A.../ x/.*m$/"))
  ['Atom']

There are four operators in total:

x/regex/ - Matches all the text that matches the regular expression
y/regex/ - Matches all the text that does not match the regular expression
g/regex/ - If the regex finds a match in the string then the whole string is passed along.
v/regex/ - If the regex does not find a match in the string then the whole string is passed along.
  >>> list(sres("Atom-Powered Robots Run Amok", "y/ /"))
  ['Atom-Powered', 'Robots', 'Run', 'Amok']

  >>> list(sres("Atom-Powered Robots Run Amok", "y/( |-)/"))
  ['Atom', 'Powered', 'Robots', 'Run', 'Amok']

  >>> list(sres("Atom-Powered Robots Run Amok", "y/ / x/R.*/"))
  ['Robots', 'Run']

  >>> list(sres("Atom-Powered Robots Run Amok", "y/ / x/R./"))
  ['Ro', 'Ru']

  >>> list(sres("Atom-Powered Robots Run Amok", "y/( |-)/ v/^R/"))
  ['Atom', 'Powered', 'Amok']

  >>> list(sres("Atom-Powered Robots Run Amok", "y/( |-)/ v/^R/ g/om/"))
  ['Atom']

The module provides two other functions:

sre(source, pattern)

Returns an interator for the index ranges that match the pattern.

    >>> list(sre("Atom-Powered Robots Run Amok", "y/ / v/^R/ g/om/"))
    [(0,4)]

sub(source, pattern, repl)

Returns source with all the matches to pattern replaced with repl.

    >>> sub("Atom-Powered Robots Run Amok", "y/( |-)/ v/^R/ g/om/", "Coal")
    "Coal-Powered Robots Run Amok"

The repl argument can also be a callable, in which case it is passed the matching substring and is expected to return the substitute string.

    >>> sub("Atom-Powered Robots Run Amok", "x/A.../", lambda x: x.upper())
    "ATOM-Powered Robots Run AMOK"

Installation

 $ hg clone https://sregex.googlecode.com/hg/ sregex  
 $ cd sregex
 $ python setup.py install
Powered by Google Project Hosting