yaree


Yet Another Regular Expression Engine

Introduction

YAREE stands for "Yet Another Regular Expression Engine", and it's basically a RE matcher library. The user inputs a regular expression using only simple constructs (choice, kleene star, concatenation and grouping, for now) and some queries for it's engine, and the sample program outputs the success or failure of the matches. The library API is very clean and concise, allowing the programmer to tweak with the parse tree and the finite state automata behind the scenes.

The main purpose of this toy-project is to study the implementation of:

  • A simple LL(1) recursive descent parser for the RE grammar;

  • A Non-Deterministic Finite State Automata, capable of checking the acceptance of arbitrary strings;

  • A conversor that given a RE parse tree (actually an abstract syntax tree) outputs an equivalent NDFSA;

The language of choice is Python. Since performance isn't an issue here, one can only decide for the coolest PL in this planet :D

Examples

  • Generating a parse tree from a regular expression: ``` from re_parser import RegExpParser from re_graphviz import render_parse_tree

if name == 'main': # The regular expression itself re = '(b.c + #)* + d.e*'

# Parses the RE, returning the tree tree = RegExpParser(re).parse()

print 'Tree = ', tree

# Makes an internal dot representation of the graph and renders it to a PNG render_parse_tree(tree, 'parse_tree.png') ```

http://img6.imageshack.us/img6/1208/parsetree.png

  • Playing with the FSA implementation: ``` from re_fsa import FSA from re_graphviz import render_fsa

if name == 'main': # Creates a FSA given its alphabet fsa = FSA('abc')

# Adds 3 new states
fsa += [0, 1, 2]

# Adds five new transition edges (note that it's in fact non-deterministic)
fsa[0,'#'] = 1
fsa[0,'a'] = 0 # Transition from 0, consuming 'a', to 0 
fsa[1,'b'] = 1
fsa[1,'#'] = 2 # Transition from 1 to 2 consuming nothing (lambda transition)
fsa[2,'c'] = 2

# Sets the initial state
fsa.first(0)

# Sets a final state (can have more than one too)
fsa.final(2)

# Runs the acceptance engine of the FSA, returning True or False
print fsa.accepts('aaaabbbbbccccc')

# Renders the automata to a PNG
render_fsa(fsa, 'fsa.png')

```

http://img7.imageshack.us/img7/5043/fsa.png

  • Now really using the RE engine: ``` from re_matcher import RegularExpression

if name == 'main': re = RegularExpression.compile('(c.a + b* + d*.c) + a.a*')

for s in ('aaa', 'bbbbbab', 'ddddc', 'aabda', 'bbbbbbcaaaaa'):
    print 'S =', s, '- Yes!' if re.match(s) else '- No.'

re.render_automata('fsa.png')

```

http://img10.imageshack.us/img10/5043/fsa.png

Project Information

Labels:
python regularexpressions finitestateautomata