Updated Dec 17, 2007 by srusek
Labels: Featured, Phase-Deploy, Phase-Implementation
QuickLexer  
How to use the Atlepage lexer.

Introduction

Atlepage makes writing a lexer super easy. You need to create three types: a token class, a token type enum, and the lexer itself.

Details

Here we create a simple RPN (reverse Polish notation) calculator.

Your TokenType Enum

The token type enum is just the list of possible tokens.

enum TokenKind
{
  NUMBER,
  OP,
}

Your Token Class

For the vast majority of projects, your token class will simply inherit from the Atlepage.Token class.

class Token : Atlepage.Token<Token>
{
}

Atlepage.Token is a generic class that takes the inheriting class as its first type parameter.
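
To illustrate why a base class would take its own subclass as a type parameter, here is a minimal sketch of the pattern in plain C#. TokenBase, MyToken, Text, and Create are hypothetical names made up for this example; they are not Atlepage's actual members.

```csharp
using System;

// Hypothetical stand-in for a self-referential generic base class.
// This is NOT Atlepage's source; every member here is an assumption.
abstract class TokenBase<TSelf> where TSelf : TokenBase<TSelf>, new()
{
    public string Text { get; set; }

    // Because the base class knows the derived type, it can build and
    // return instances of that type without a cast at the call site.
    public static TSelf Create(string text)
    {
        return new TSelf { Text = text };
    }
}

// The derived class passes itself as the type argument.
class MyToken : TokenBase<MyToken>
{
}
```

With these definitions, MyToken.Create("42") is typed as MyToken rather than as the base class, which is the usual reason for this kind of declaration.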

Your Tokenizer

The lexer defines a series of rules. There are two types of token rules: class tokens and method tokens.

using Atlepage.Lexical;
[Token(@"( |\t|\n)+")]
[Token(@"\d+", Name = "NUMBER")]
[Token(@"\+|-|\*|/", Name = "OP")]
class LexerHandler {}

or

using Atlepage.Lexical;
class LexerHandler
{
  [Token(@"( |\t|\n)+")]
  public Token t_ignored(System.Text.RegularExpressions.Group g, Token t)
  {
    return null;
  }
  [Token(@"\d+")]
  public Token t_NUMBER(System.Text.RegularExpressions.Group g, Token t)
  {
    return t;
  }
  [Token(@"\+|-|\*|/")]
  public Token t_OP(System.Text.RegularExpressions.Group g, Token t)
  {
    return t;
  }
}

Token type names are matched by stripping the t_ prefix from the method name, or by the optional Name parameter of the Token attribute. If a name does not match a token type, the lexer factory raises an exception. If a class token has no name, or a method returns null, the token is ignored.
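
The naming rule can be sketched with reflection. The code below is an illustration only: RuleAttribute, RuleNames, SampleKind, and SampleHandler are hypothetical names invented for this sketch, and it does not model how Atlepage treats rules that are meant to be ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute that mimics the shape of [Token] for this sketch.
[AttributeUsage(AttributeTargets.Method, AllowMultiple = true)]
class RuleAttribute : Attribute
{
    public string Pattern { get; private set; }
    public string Name { get; set; }
    public RuleAttribute(string pattern) { Pattern = pattern; }
}

enum SampleKind { NUMBER, OP }

class SampleHandler
{
    [Rule(@"\d+")]
    public string t_NUMBER(string text) { return text; }

    [Rule(@"\+|-|\*|/", Name = "OP")]
    public string Operator(string text) { return text; }
}

static class RuleNames
{
    // An explicit Name wins; otherwise the "t_" prefix is stripped.
    public static string Resolve(MethodInfo method, RuleAttribute rule)
    {
        if (!string.IsNullOrEmpty(rule.Name)) return rule.Name;
        return method.Name.StartsWith("t_", StringComparison.Ordinal)
            ? method.Name.Substring(2)
            : method.Name;
    }

    // Map each resolved name to its pattern, rejecting names that are
    // not members of the token enum.
    public static Dictionary<string, string> Collect(Type handler, Type tokenEnum)
    {
        var rules = new Dictionary<string, string>();
        foreach (MethodInfo m in handler.GetMethods(BindingFlags.Public | BindingFlags.Instance))
        {
            foreach (RuleAttribute r in m.GetCustomAttributes(typeof(RuleAttribute), false))
            {
                string name = Resolve(m, r);
                if (!Enum.IsDefined(tokenEnum, name))
                    throw new ArgumentException("No token type named " + name);
                rules[name] = r.Pattern;
            }
        }
        return rules;
    }
}
```

Calling RuleNames.Collect(typeof(SampleHandler), typeof(SampleKind)) yields two entries, keyed NUMBER and OP. Renaming t_NUMBER to t_DIGITS would make the same call throw, because DIGITS is not a member of SampleKind.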

Using your Lexer

All this is combined into an Atlepage.Lexical.Lexer, which the LexerFactory class creates for you.

LexerHandler handler = new LexerHandler();
Lexer<Token> lexer = LexerFactory<Token>.CreateLexer(
                              typeof(LexerHandler),
                              new GenericEnum<TokenKind>(),
                              handler);
lexer.Begin(" 5 5 + 6 * ");
Token t = lexer.Next();
while (t != null)
{
  Console.WriteLine(t);
  t = lexer.Next();
}

The type passed as the first argument is used to generate the regular expressions and the lexer. The GenericEnum class takes your token type enum as its type parameter and is used to map the enum to integers. If all of your token rules are class tokens, you can pass null for the handler; otherwise it must be an instance of the type you pass as the first argument.
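
As a rough picture of what such a factory might do, the sketch below maps an enum to integers and joins the three rules into a single regular expression with named groups. It uses only System.Text.RegularExpressions. Kind, MiniLexer, MapEnum, and Scan are hypothetical names, and nothing here is taken from Atlepage's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

enum Kind { NUMBER, OP }

static class MiniLexer
{
    // Map each enum member name to its integer value, e.g. NUMBER -> 0.
    public static Dictionary<string, int> MapEnum(Type enumType)
    {
        var map = new Dictionary<string, int>();
        foreach (string name in Enum.GetNames(enumType))
            map[name] = Convert.ToInt32(Enum.Parse(enumType, name));
        return map;
    }

    // Return (kind id, text) pairs. The rules are joined into one
    // alternation of named groups; matches of the "skip" group are dropped.
    public static List<KeyValuePair<int, string>> Scan(string input)
    {
        Dictionary<string, int> ids = MapEnum(typeof(Kind));
        // \G anchors each match at the position passed to Match.
        var master = new Regex(
            @"\G(?:(?<skip>(?: |\t|\n)+)|(?<NUMBER>\d+)|(?<OP>\+|-|\*|/))");
        var tokens = new List<KeyValuePair<int, string>>();
        int pos = 0;
        while (pos < input.Length)
        {
            Match m = master.Match(input, pos);
            if (!m.Success)
                throw new FormatException("Unexpected character at " + pos);
            foreach (string name in ids.Keys)
            {
                Group g = m.Groups[name];
                if (g.Success)
                    tokens.Add(new KeyValuePair<int, string>(ids[name], g.Value));
            }
            pos += m.Length; // every alternative consumes at least one character
        }
        return tokens;
    }
}
```

For the input " 5 5 + 6 * ", MiniLexer.Scan returns five pairs: three with key 0 (NUMBER) and two with key 1 (OP), with the whitespace runs skipped.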

