Updated Aug 10, 2008 by subwiz
Labels: Featured
Customizing  

JSyntaxPane uses "Lexers" to distinguish the different tokens of each supported language. Each lexer must implement the Lexer interface. The lexers provided were all generated with the great JFlex. JFlex takes a lexer definition file and generates a single-class lexer. The lex files for the currently supported languages are in the JFlex folder of the zip distribution file.
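
To make the idea concrete, here is a minimal hand-written sketch of what a lexer produces: a sequence of tokens, each carrying a type, a start offset and a length. The names SketchType, SketchToken and SketchLexer are hypothetical stand-ins invented for this illustration. They are not the JSyntaxPane Lexer interface, and the real lexers are generated by JFlex rather than written by hand.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical token categories; not the JSyntaxPane TokenType enum. */
enum SketchType { KEYWORD, IDENTIFIER, NUMBER, OPERATOR }

/** Hypothetical token: a type plus its position in the text. */
final class SketchToken {
    final SketchType type;
    final int start;
    final int length;

    SketchToken(SketchType type, int start, int length) {
        this.type = type;
        this.start = start;
        this.length = length;
    }
}

final class SketchLexer {
    /** Splits the text into tokens; whitespace produces no token. */
    static List<SketchToken> tokenize(String text) {
        List<SketchToken> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                while (i < text.length() && Character.isJavaIdentifierPart(text.charAt(i))) {
                    i++;
                }
                String word = text.substring(start, i);
                // Only two keywords, to keep the sketch short.
                SketchType type = word.equals("if") || word.equals("while")
                        ? SketchType.KEYWORD : SketchType.IDENTIFIER;
                tokens.add(new SketchToken(type, start, i - start));
            } else if (Character.isDigit(c)) {
                while (i < text.length() && Character.isDigit(text.charAt(i))) {
                    i++;
                }
                tokens.add(new SketchToken(SketchType.NUMBER, start, i - start));
            } else {
                // Every other character is treated as a one-character operator.
                i++;
                tokens.add(new SketchToken(SketchType.OPERATOR, start, 1));
            }
        }
        return tokens;
    }
}
```

A JFlex-generated lexer does the same job from regular-expression rules, which is why editing a lex file is usually easier than writing code like this.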

I will not go into much detail about how to write lex files; the JFlex site has a great manual. In a nutshell, you can copy a provided lex file, modify it to your needs, then run it through JFlex. Put the generated Java file in the jsyntaxpane package, then build. Make sure you modify the %class line and use the same name for the constructor in the %{ ... %} block below it.

The TokenType class is an enum of all supported token types. You can add more types if you need them. If you do, also modify the SyntaxStyle and SyntaxStyles classes to use those types.
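
As a rough sketch of why both places need to change, the example below pairs an enum of token types with a table of styles. All names here (DemoTokenType, DemoStyle, DemoStyles, and the ANNOTATION constant) are hypothetical and only mirror the idea; the real SyntaxStyle and SyntaxStyles classes may be organised differently.

```java
import java.awt.Color;
import java.awt.Font;
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical token types; ANNOTATION stands for a newly added one. */
enum DemoTokenType { KEYWORD, COMMENT, IDENTIFIER, ANNOTATION }

/** Hypothetical style: a colour plus a java.awt.Font style flag. */
final class DemoStyle {
    final Color color;
    final int fontStyle;

    DemoStyle(Color color, int fontStyle) {
        this.color = color;
        this.fontStyle = fontStyle;
    }
}

/** Hypothetical style table keyed by token type. */
final class DemoStyles {
    private static final DemoStyle DEFAULT = new DemoStyle(Color.BLACK, Font.PLAIN);
    private final Map<DemoTokenType, DemoStyle> styles = new EnumMap<>(DemoTokenType.class);

    DemoStyles() {
        styles.put(DemoTokenType.KEYWORD, new DemoStyle(Color.BLUE, Font.BOLD));
        styles.put(DemoTokenType.COMMENT, new DemoStyle(Color.GRAY, Font.ITALIC));
        // The new type needs its own entry; without it the default style is used.
        styles.put(DemoTokenType.ANNOTATION, new DemoStyle(new Color(0x808000), Font.PLAIN));
    }

    /** Returns the style for a type, falling back to the default style. */
    DemoStyle styleFor(DemoTokenType type) {
        return styles.getOrDefault(type, DEFAULT);
    }
}
```

In this sketch a type without an entry, such as IDENTIFIER, simply renders in the default style, so forgetting the style entry fails quietly rather than loudly.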

You also need to modify the createDefaultDocument method of the SyntaxKit class to use your lexer for the given language. You may also want to modify the install method of SyntaxKit to add other default actions for the component. I'm considering a way to automate all of this without changing the SyntaxKit class.
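
One possible shape for such a mapping from language to lexer is a small registry of factories, sketched below. This is not how SyntaxKit works today; DemoLexer and LexerRegistry are hypothetical names, and the sketch only illustrates choosing a lexer by language name without editing a central class.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for a lexer; only exposes a name for the demo. */
interface DemoLexer {
    String name();
}

/** Hypothetical registry mapping a language name to a lexer factory. */
final class LexerRegistry {
    private final Map<String, Supplier<DemoLexer>> factories = new HashMap<>();

    /** Registers a factory under a case-insensitive language name. */
    void register(String language, Supplier<DemoLexer> factory) {
        factories.put(language.toLowerCase(Locale.ROOT), factory);
    }

    /** Creates a fresh lexer for the language, or fails if none is registered. */
    DemoLexer createFor(String language) {
        Supplier<DemoLexer> factory = factories.get(language.toLowerCase(Locale.ROOT));
        if (factory == null) {
            throw new IllegalArgumentException("No lexer registered for " + language);
        }
        return factory.get();
    }
}
```

With a registry like this, adding a language would be a single register call such as registry.register("minijava", ...), and a method like createDefaultDocument could ask the registry instead of hard-coding each lexer.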

Once you have built your lexer, you can change the SyntaxTester to test it. Modify the Tester class to use your lexer instead of the built-in ones. Whenever the caret moves, the token under the caret is displayed on the line below. That makes testing very easy.
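
The lookup behind "the token under the caret" can be sketched as a binary search over tokens sorted by start offset. CaretToken and CaretLookup are hypothetical names for this illustration, and the tester may well find the token in a different way.

```java
import java.util.List;

/** Hypothetical token: a type name plus its position in the text. */
final class CaretToken {
    final String type;
    final int start;
    final int length;

    CaretToken(String type, int start, int length) {
        this.type = type;
        this.start = start;
        this.length = length;
    }
}

final class CaretLookup {
    /**
     * Returns the token covering the caret offset, or null if the caret is
     * between tokens (for example on whitespace). The list must be sorted by
     * start offset and the tokens must not overlap.
     */
    static CaretToken tokenAt(List<CaretToken> tokens, int caret) {
        int lo = 0;
        int hi = tokens.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            CaretToken t = tokens.get(mid);
            if (caret < t.start) {
                hi = mid - 1;
            } else if (caret >= t.start + t.length) {
                lo = mid + 1;
            } else {
                return t;
            }
        }
        return null;
    }
}
```

If your lexer returns wrong offsets or lengths, a lookup like this shows the wrong token (or none) as you move the caret, which is what makes the tester a quick check.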

If you create Lexers, please consider contributing them here.


Comment by scottwells, Feb 02, 2009

Have you guys considered adding a simple programmatic way to extend the set of tokens recognized for the various supported languages? I'm evaluating JSyntaxPane and several other syntax colorizing editor components right now to be used in an environment where users can enter simple Rhino JavaScript to inject functionality into a system, but the system loads the JavaScript evaluation context with a set of variables that I'd like colorized as if they were constants/literals to distinguish their correct usage. Furthermore, those pre-loaded variables may change between various usages of the editor depending on the context in which the script is executing.

I have this working with all three components I'm evaluating, and it's the most difficult with JSyntaxPane because of the static way that content types are registered with the syntax kit. With the other two components I can set token recognizers on a per-instance basis quite easily.

Maybe this is even possible today and I'm just getting lost a bit trying to make it happen! Any guidance?

Comment by cyberpython, Mar 24, 2009

Here is a lexer for minijava (I'm not quite sure I should be posting it here):

/*
 * Copyright 2008 Georgios "cyberpython" Migdos cyberpython@gmail.com
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License
 *       at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package jsyntaxpane.lexers;

import jsyntaxpane.DefaultLexer;
import jsyntaxpane.Token;
import jsyntaxpane.TokenType;

%%

%public
%class MiniJavaLexer
%extends DefaultLexer
%final
%unicode
%char
%type Token

%{
    /**
     * Create an empty lexer, yyreset will be called later to reset and assign
     * the reader
     */
    public MiniJavaLexer() {
        super();
    }

    private Token token(TokenType type) {
        return new Token(type, yychar, yylength());
    }

    private Token token(TokenType type, int pairValue) {
        return new Token(type, yychar, yylength(), (byte)pairValue);
    }

    private static final byte PARAN     = 1;
    private static final byte BRACKET   = 2;
    private static final byte CURLY     = 3;

%}

/* main character classes */
LineTerminator = \r|\n|\r\n
InputCharacter = [^\r\n]

WhiteSpace = {LineTerminator} | [ \t\f]+

/* comments */
Comment = {TraditionalComment} | {EndOfLineComment}

TraditionalComment = "/*" [^*] ~"*/" | "/*" "*"+ "/"
EndOfLineComment = "//" {InputCharacter}* {LineTerminator}?

/* identifiers */
Identifier = [:jletter:][:jletterdigit:]*

/* integer literals */
IntegerLiteral = [0-9]+

%%

<YYINITIAL> {

  /* keywords */
  "class"                        |
  "public"                       |
  "static"                       |
  "void"                         |
  "main"                         |
  "String"                       |
  "extends"                      |
  "return"                       |
  "if"                           |
  "else"                         |
  "while"                        |
  "for"                          |
  "System.out.println"           |
  "this"                         |
  "new"                          |
  "true"                         |
  "false"                        { return token(TokenType.KEYWORD); }

  /* Java Built in types and wrappers */
  "boolean"                      |
  "int"                          { return token(TokenType.TYPE); }

  /* operators */

  "("                            { return token(TokenType.OPERATOR,  PARAN); }
  ")"                            { return token(TokenType.OPERATOR, -PARAN); }
  "{"                            { return token(TokenType.OPERATOR,  CURLY); }
  "}"                            { return token(TokenType.OPERATOR, -CURLY); }
  "["                            { return token(TokenType.OPERATOR,  BRACKET); }
  "]"                            { return token(TokenType.OPERATOR, -BRACKET); }
  ";"                            |
  ","                            |
  "."                            |
  "="                            |
  "<"                            |
  "!"                            |
  "&&"                           |
  "+"                            |
  "-"                            |
  "*"                            { return token(TokenType.OPERATOR); }

  /* numeric literals */

  {IntegerLiteral}               { return token(TokenType.NUMBER); }

  /* comments */
  {Comment}                      { return token(TokenType.COMMENT); }

  /* whitespace */
  {WhiteSpace}                   { }

  /* identifiers */
  {Identifier}                   { return token(TokenType.IDENTIFIER); }
}

/* error fallback */
.|\n                             {  }
<<EOF>>                          { return null; }

Comment by Stuwbooth, Mar 27, 2009

Why does the source code not include the lexers package, especially as it is used? I can't recompile without it (well, except by including the jar file that has the compiled classes).

Comment by rkuestermann, Apr 25, 2009

Hi,

Did you guys consider adding auto-completion support?

Best regards, Roland

Comment by luis.bor...@escolarea.com, Sep 06, 2009

Hello!

I was developing a compiler for the Ada 95 programming language, and while building the GUI for editing source files, I found your most useful project! I wrote a lexer for that language and would like to contribute it here (this lexer is a modification of the lexer I use to interact with a CUP-generated parser in my compiler implementation).

/* Lexical elements of the Ada 95 programming language.
   Author: Luis Felipe Borjas Reyes @ 05 September 2009 */

package jsyntaxpane.lexers;

import jsyntaxpane.DefaultLexer;
import jsyntaxpane.Token;
import jsyntaxpane.TokenType;

%%

%public
%class Ada95SyntaxLexer
%extends DefaultLexer
%final
%unicode
%char
%type Token
%ignorecase

%{
    public Ada95SyntaxLexer() {
        super();
    }

    StringBuffer string = new StringBuffer();

    /**
     * Helper method to create and return a new Token of the given TokenType
     */
    private Token token(TokenType type) {
        return new Token(type, yychar, yylength());
    }

    private Token token(TokenType type, int pairValue) {
        return new Token(type, yychar, yylength(), (byte)pairValue);
    }

%}

/* The lexical elements of Ada */
identifier_letter=[a-zA-Z]
digit=[0-9]
space_character=" "
underline="_"
num_sign="#"
point="."
plus="+"
minus="-"
double_quote=\"{2}
comment_start="--"
/* character class as posted; a negated class may have been intended */
special_character=[0-9\r\n\t\v\f]
format_effector=[\t\v\f]
line_terminator=\r|\n|\r\n
graphic_character={special_character}|{identifier_letter}|{digit}|{space_character}
separator={space_character}|{format_effector}|{line_terminator}
whitespace={separator}+
comment={comment_start}({graphic_character}|{format_effector})*{line_terminator}
/* Used ? because '' could occur as a character literal */
character_literal='{graphic_character}?'
/* What is needed for decimal numbers: */
numeral={digit}({underline}?{digit})*
exponent=[Ee]{plus}?{numeral}|[Ee]{minus}{numeral}
/* Unused macro, kept because it follows the RM */
decimal_literal={numeral}({point}{numeral})?{exponent}?
/* What is needed for based numbers */
extended_digit={digit}|[a-fA-F]
based_numeral={extended_digit}({underline}?{extended_digit})*
base={numeral}
based_literal={base}{num_sign}{based_numeral}({point}{based_numeral})?{num_sign}{exponent}?
/* Numeric literals: unused macro */
numeric_literal={decimal_literal}|{based_literal}
/* Rules for embedded numeric literals: */
number={digit}{digit}
floating_point_literal={numeral}{point}{numeral}
integer_literal={numeral}
/* Numbers with an exponent */
power_literal={numeral}({point}{numeral})?{exponent}
/* Identifiers and operators */
identifier={identifier_letter}({underline}?({identifier_letter}|{digit}))*

// Boolean literals:
boolean_literal="true"|"false"

%state STRING

%%

/* The lexical rules */

/* Handling everything else in terms of YYINITIAL */

<YYINITIAL>{
  {comment}            {return token(TokenType.COMMENT);}
  /* The Ada RM says whitespace is a separator required between some
     lexical elements; check that in the grammar */
  /* The Ada RM says there must be separators between some things:
     should I handle that here? */
  {whitespace}         {}

  /* The primitive types: */
  "boolean"            {return token(TokenType.TYPE);}
  "integer"            {return token(TokenType.TYPE);}
  "float"              {return token(TokenType.TYPE);}

  /* The reserved words: declare them as terminals in the .cup file */
  "abort"              {return token(TokenType.KEYWORD);}
  "abs"                {return token(TokenType.KEYWORD);}
  "abstract"           {return token(TokenType.KEYWORD);}
  "accept"             {return token(TokenType.KEYWORD);}
  "access"             {return token(TokenType.KEYWORD);}
  "aliased"            {return token(TokenType.KEYWORD);}
  "all"                {return token(TokenType.KEYWORD);}
  "and"                {return token(TokenType.KEYWORD);}
  "array"              {return token(TokenType.KEYWORD);}
  "at"                 {return token(TokenType.KEYWORD);}

  "begin"              {return token(TokenType.KEYWORD);}
  "body"               {return token(TokenType.KEYWORD);}

  "case"               {return token(TokenType.KEYWORD);}
  "constant"           {return token(TokenType.KEYWORD);}

  "declare"            {return token(TokenType.KEYWORD);}
  "delay"              {return token(TokenType.KEYWORD);}
  "delta"              {return token(TokenType.KEYWORD);}
  "digits"             {return token(TokenType.KEYWORD);}
  "do"                 {return token(TokenType.KEYWORD);}

  "else"               {return token(TokenType.KEYWORD);}
  "elsif"              {return token(TokenType.KEYWORD);}
  "end"                {return token(TokenType.KEYWORD);}
  "entry"              {return token(TokenType.KEYWORD);}
  "exception"          {return token(TokenType.KEYWORD);}
  "exit"               {return token(TokenType.KEYWORD);}

  "for"                {return token(TokenType.KEYWORD);}
  "function"           {return token(TokenType.KEYWORD);}

  "generic"            {return token(TokenType.KEYWORD);}
  "goto"               {return token(TokenType.KEYWORD);}

  "if"                 {return token(TokenType.KEYWORD);}
  "in"                 {return token(TokenType.KEYWORD);}
  "is"                 {return token(TokenType.KEYWORD);}

  "limited"            {return token(TokenType.KEYWORD);}
  "loop"               {return token(TokenType.KEYWORD);}

  "mod"                {return token(TokenType.KEYWORD);}

  "new"                {return token(TokenType.KEYWORD);}
  "not"                {return token(TokenType.KEYWORD);}
  "null"               {return token(TokenType.KEYWORD);}

  "of"                 {return token(TokenType.KEYWORD);}
  "or"                 {return token(TokenType.KEYWORD);}
  "others"             {return token(TokenType.KEYWORD);}
  "out"                {return token(TokenType.KEYWORD);}

  "package"            {return token(TokenType.KEYWORD);}
  "pragma"             {return token(TokenType.KEYWORD);}
  "private"            {return token(TokenType.KEYWORD);}
  "procedure"          {return token(TokenType.KEYWORD);}
  "protected"          {return token(TokenType.KEYWORD);}

  "raise"              {return token(TokenType.KEYWORD);}
  "range"              {return token(TokenType.KEYWORD);}
  "record"             {return token(TokenType.KEYWORD);}
  "rem"                {return token(TokenType.KEYWORD);}
  "renames"            {return token(TokenType.KEYWORD);}
  "requeue"            {return token(TokenType.KEYWORD);}
  "return"             {return token(TokenType.KEYWORD);}
  "reverse"            {return token(TokenType.KEYWORD);}

  "select"             {return token(TokenType.KEYWORD);}
  "separate"           {return token(TokenType.KEYWORD);}
  "subtype"            {return token(TokenType.KEYWORD);}

  "tagged"             {return token(TokenType.KEYWORD);}
  "task"               {return token(TokenType.KEYWORD);}
  "terminate"          {return token(TokenType.KEYWORD);}
  "then"               {return token(TokenType.KEYWORD);}
  "type"               {return token(TokenType.KEYWORD);}

  "until"              {return token(TokenType.KEYWORD);}
  "use"                {return token(TokenType.KEYWORD);}

  "when"               {return token(TokenType.KEYWORD);}
  "while"              {return token(TokenType.KEYWORD);}
  "with"               {return token(TokenType.KEYWORD);}

  "xor"                {return token(TokenType.KEYWORD);}

  /* Now, everything else: */
  {boolean_literal}    {return token(TokenType.KEYWORD2);}
  {identifier}         {return token(TokenType.IDENTIFIER);}

  {numeric_literal}    {return token(TokenType.NUMBER);}
  {character_literal}  {return token(TokenType.STRING);}
  /* Handling strings: */
  \"                   {yybegin(STRING); tokenStart = yychar; tokenLength = 1; }

  /* Delimiters as YYINITIAL actions */
  "&"                  {return token(TokenType.KEYWORD);}
  "'"                  {return token(TokenType.KEYWORD);}
  "("                  {return token(TokenType.OPERATOR);}
  ")"                  {return token(TokenType.OPERATOR);}
  "*"                  {return token(TokenType.OPERATOR);}
  "+"                  {return token(TokenType.OPERATOR);}
  ","                  {return token(TokenType.OPERATOR);}
  "-"                  {return token(TokenType.OPERATOR);}
  "."                  {return token(TokenType.OPERATOR);}
  "/"                  {return token(TokenType.OPERATOR);}
  ":"                  {return token(TokenType.OPERATOR);}
  ";"                  {return token(TokenType.OPERATOR);}
  "<"                  {return token(TokenType.OPERATOR);}
  "="                  {return token(TokenType.OPERATOR);}
  ">"                  {return token(TokenType.OPERATOR);}
  "|"                  {return token(TokenType.OPERATOR);}
  "=>"                 {return token(TokenType.OPERATOR);}
  ".."                 {return token(TokenType.OPERATOR);}
  "**"                 {return token(TokenType.OPERATOR);}
  ":="                 {return token(TokenType.OPERATOR);}
  "<<"                 {return token(TokenType.OPERATOR);}
  ">>"                 {return token(TokenType.OPERATOR);}
  "<>"                 {return token(TokenType.OPERATOR);}
  "/="                 {return token(TokenType.OPERATOR);}
  ">="                 {return token(TokenType.OPERATOR);}
  "<="                 {return token(TokenType.OPERATOR);}

  "["                  {return token(TokenType.ERROR);}
  "]"                  {return token(TokenType.ERROR);}
  "{"                  {return token(TokenType.ERROR);}
  "}"                  {return token(TokenType.ERROR);}

  /* The RM does not define underline as a separator */
  {underline}          {return token(TokenType.ERROR);}
}

<STRING>{
  {double_quote}       {tokenLength+=yylength();}
  \"                   {yybegin(YYINITIAL);
                        return new Token(TokenType.STRING, tokenStart, tokenLength + 1);}
  /* Anything that is not a line break or a closing quote is allowed */
  [^\"\n\r]+           {tokenLength+=yylength();}
  /* A line break returns to the initial state: */
  {line_terminator}    {yybegin(YYINITIAL);}
}

Comment by ayman.alsairafi, Sep 08, 2009

To all commenters: I did not notice any of your comments until now. I'll have a look at them one by one in time. Please open issues for suggestions, bug reports, and other requests.

Comment by jaimea.gaviria, Oct 13, 2009

Hello, I need to develop an editor for PHP.

Does anyone have a JFlex file for PHP?

Comment by ayman.alsairafi, Oct 13, 2009

Please open an issue for PHP. We can track it better there.

