Customizing  

Updated Feb 4, 2010 by subwiz

JSyntaxPane uses "Lexers" to distinguish the tokens of each supported language. Each Lexer must implement the Lexer interface. The provided Lexers were all generated with the great JFlex, which takes a lexer definition file and produces a single-class lexer. The lex files for the currently supported languages are in the JFlex folder of the zip distribution file.
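
To make the idea of a lexer concrete, here is a minimal hand-written sketch of what a lexer produces: a list of tokens, each carrying a type, a start offset, and a length. The names `SketchTokenType`, `SketchToken`, and `SketchLexer` are hypothetical stand-ins invented for this illustration; they are not JSyntaxPane classes, and the real lexers are generated by JFlex rather than written by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; not JSyntaxPane's own classes.
enum SketchTokenType { KEYWORD, IDENTIFIER, NUMBER, OPERATOR }

final class SketchToken {
    final SketchTokenType type;
    final int start;   // offset of the token's first character
    final int length;  // number of characters the token covers

    SketchToken(SketchTokenType type, int start, int length) {
        this.type = type;
        this.start = start;
        this.length = length;
    }
}

final class SketchLexer {
    private static final List<String> KEYWORDS =
            Arrays.asList("if", "else", "while", "return");

    /** Splits the text into tokens; whitespace produces no token. */
    static List<SketchToken> tokenize(String text) {
        List<SketchToken> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int start = i;
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            SketchTokenType type;
            if (Character.isJavaIdentifierStart(c)) {
                // Consume the whole word, then decide keyword vs. identifier.
                while (i < text.length() && Character.isJavaIdentifierPart(text.charAt(i))) {
                    i++;
                }
                type = KEYWORDS.contains(text.substring(start, i))
                        ? SketchTokenType.KEYWORD
                        : SketchTokenType.IDENTIFIER;
            } else if (Character.isDigit(c)) {
                while (i < text.length() && Character.isDigit(text.charAt(i))) {
                    i++;
                }
                type = SketchTokenType.NUMBER;
            } else {
                // Any other single character is treated as an operator here.
                i++;
                type = SketchTokenType.OPERATOR;
            }
            tokens.add(new SketchToken(type, start, i - start));
        }
        return tokens;
    }
}
```

A JFlex-generated lexer does the same job from declarative rules, which is why copying and adapting an existing lex file is usually easier than writing the scanning loop yourself.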

I will not go into much detail about how to write lex files; the JFlex site has a great manual. In a nutshell: copy a provided lex file, modify it to your needs, then run it through JFlex. Put the generated Java file in the jsyntaxpane package, then build. Make sure you modify the %class line and use the same name for the constructor in the block below it.

The TokenTypes class is an enum of all supported TokenTypes. You can add more types if you need them. If you do, also modify the SyntaxStyle and SyntaxStyles classes to use those types.
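
The relationship between a token-type enum and its styles can be sketched as follows. This is only an illustration under assumptions: `DemoTokenType`, `DemoStyle`, and `DemoStyles` are hypothetical names, and the real SyntaxStyle and SyntaxStyles classes may be organized differently. The point it shows is that a newly added enum constant needs a matching style entry, otherwise it falls back to a default.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical enum for illustration; ANNOTATION plays the role of a newly added type.
enum DemoTokenType { KEYWORD, COMMENT, NUMBER, ANNOTATION }

// Hypothetical style record: a hex colour string and a bold flag.
final class DemoStyle {
    final String colorHex;
    final boolean bold;

    DemoStyle(String colorHex, boolean bold) {
        this.colorHex = colorHex;
        this.bold = bold;
    }
}

final class DemoStyles {
    private static final DemoStyle DEFAULT = new DemoStyle("#000000", false);
    private final Map<DemoTokenType, DemoStyle> styles = new EnumMap<>(DemoTokenType.class);

    DemoStyles() {
        // Styles for the types that existed before the new one was added.
        styles.put(DemoTokenType.KEYWORD, new DemoStyle("#3333EE", true));
        styles.put(DemoTokenType.COMMENT, new DemoStyle("#339933", false));
        styles.put(DemoTokenType.NUMBER, new DemoStyle("#999933", false));
    }

    /** Registers or replaces the style used for a token type. */
    void put(DemoTokenType type, DemoStyle style) {
        styles.put(type, style);
    }

    /** Returns the style for a type, or the default if none was registered. */
    DemoStyle styleFor(DemoTokenType type) {
        return styles.getOrDefault(type, DEFAULT);
    }
}
```

Until `put(DemoTokenType.ANNOTATION, ...)` is called, tokens of the new type would be drawn with the default style, which is the kind of omission the paragraph above warns about.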

You also need to modify the createDefaultDocument method of the SyntaxKit class to use your lexer for the given language. You may also want to modify the install method of SyntaxKit to add other default actions for the component. I'm considering a way to automate all of this without changing the SyntaxKit class.
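
One possible shape for such automation is a registry that maps a language name to a lexer factory, so that adding a language means registering an entry rather than editing a method. The sketch below is an assumption about how this could look, not how SyntaxKit works; `DemoLexer` and `LexerRegistry` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical minimal lexer interface, used only for this sketch.
interface DemoLexer {
    String languageName();
}

final class LexerRegistry {
    private final Map<String, Supplier<DemoLexer>> factories = new HashMap<>();

    /** Registers a factory that creates a fresh lexer for the given language. */
    void register(String language, Supplier<DemoLexer> factory) {
        factories.put(language.toLowerCase(Locale.ROOT), factory);
    }

    /** Creates a new lexer for the language, or fails if none is registered. */
    DemoLexer create(String language) {
        Supplier<DemoLexer> factory = factories.get(language.toLowerCase(Locale.ROOT));
        if (factory == null) {
            throw new IllegalArgumentException("No lexer registered for: " + language);
        }
        return factory.get();
    }
}
```

With something like this, code that currently selects a lexer inside createDefaultDocument could instead ask the registry, for example `registry.create("minijava")`, and each document would get its own lexer instance.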

Once you have built your lexer, you can change the SyntaxTester to test it. Modify the Tester class to use your lexer instead of the built-in ones. Whenever the caret moves, the Token under the caret is displayed on the line below. That makes testing very easy.
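
Finding the token under the caret amounts to looking up which token's range contains a given offset. A minimal sketch of that lookup, assuming the tokens are sorted by start offset and do not overlap, is shown below. `CaretToken` and `TokenLocator` are hypothetical names; the tester's actual implementation may differ.

```java
import java.util.List;

// Hypothetical token holder for this sketch: a type label plus a character range.
final class CaretToken {
    final String type;
    final int start;
    final int length;

    CaretToken(String type, int start, int length) {
        this.type = type;
        this.start = start;
        this.length = length;
    }
}

final class TokenLocator {
    /**
     * Binary search for the token whose range [start, start + length) contains
     * the offset. Returns null when the offset falls in a gap such as whitespace.
     * Assumes the list is sorted by start and the tokens do not overlap.
     */
    static CaretToken tokenAt(List<CaretToken> tokens, int offset) {
        int lo = 0;
        int hi = tokens.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            CaretToken t = tokens.get(mid);
            if (offset < t.start) {
                hi = mid - 1;
            } else if (offset >= t.start + t.length) {
                lo = mid + 1;
            } else {
                return t;
            }
        }
        return null;
    }
}
```

If the token reported for a caret position is not the one you expect, the rule order or the macros in your lex file are the first place to look.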

If you create Lexers, please consider contributing them here.

Comment by scottwe...@gmail.com, Feb 2, 2009

Have you guys considered adding a simple programmatic way to extend the set of tokens recognized for the various supported languages? I'm evaluating JSyntaxPane and several other syntax colorizing editor components right now to be used in an environment where users can enter simple Rhino JavaScript to inject functionality into a system, but the system loads the JavaScript evaluation context with a set of variables that I'd like colorized as if they were constants/literals to distinguish their correct usage. Furthermore, those pre-loaded variables may change between various usages of the editor depending on the context in which the script is executing.

I have this working with all three components I'm evaluating, and it's the most difficult with JSyntaxPane because of the static way that content types are registered with the syntax kit. With the other two components I can set token recognizers on a per-instance basis quite easily.

Maybe this is even possible today and I'm just getting lost a bit trying to make it happen! Any guidance?

Comment by cyberpyt...@gmail.com, Mar 24, 2009

Here is a lexer for minijava (I'm not quite sure I should be posting it here):

/*
 * Copyright 2008 Georgios "cyebrpython" Migdos cyberpython@gmail.com
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License
 *       at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package jsyntaxpane.lexers;

import jsyntaxpane.DefaultLexer;
import jsyntaxpane.Token;
import jsyntaxpane.TokenType;

%%

%public
%class MiniJavaLexer
%extends DefaultLexer
%final
%unicode
%char
%type Token

%{
    /**
     * Create an empty lexer, yyreset will be called later to reset and assign
     * the reader
     */
    public MiniJavaLexer() {
        super();
    }

    private Token token(TokenType type) {
        return new Token(type, yychar, yylength());
    }

    private Token token(TokenType type, int pairValue) {
        return new Token(type, yychar, yylength(), (byte)pairValue);
    }

    private static final byte PARAN     = 1;
    private static final byte BRACKET   = 2;
    private static final byte CURLY     = 3;

%}

/* main character classes */
LineTerminator = \r|\n|\r\n
InputCharacter = [^\r\n]

WhiteSpace = {LineTerminator} | [ \t\f]+

/* comments */
Comment = {TraditionalComment} | {EndOfLineComment}

TraditionalComment = "/*" [^*] ~"*/" | "/*" "*"+ "/"
EndOfLineComment = "//" {InputCharacter}* {LineTerminator}?

/* identifiers */
Identifier = [:jletter:][:jletterdigit:]*

/* integer literals */
IntegerLiteral = [0-9]+

%%

<YYINITIAL> {

  /* keywords */
  "class"                        |
  "public"                       |
  "static"                       |
  "void"                         |
  "main"                         |
  "String"                       |
  "extends"                      |
  "return"                       |
  "if"                           |
  "else"                         |
  "while"                        |
  "for"                          |
  "System.out.println"           |
  "this"                         |
  "new"                          |
  "true"                         |
  "false"                        { return token(TokenType.KEYWORD); }

  /* Java built-in types and wrappers */
  "boolean"                      |
  "int"                          { return token(TokenType.TYPE); }

  /* operators */

  "("                            { return token(TokenType.OPERATOR,  PARAN); }
  ")"                            { return token(TokenType.OPERATOR, -PARAN); }
  "{"                            { return token(TokenType.OPERATOR,  CURLY); }
  "}"                            { return token(TokenType.OPERATOR, -CURLY); }
  "["                            { return token(TokenType.OPERATOR,  BRACKET); }
  "]"                            { return token(TokenType.OPERATOR, -BRACKET); }
  ";"                            |
  ","                            |
  "."                            |
  "="                            |
  "<"                            |
  "!"                            |
  "&&"                           |
  "+"                            |
  "-"                            |
  "*"                            { return token(TokenType.OPERATOR); }

  /* numeric literals */

  {IntegerLiteral}               { return token(TokenType.NUMBER); }

  /* comments */
  {Comment}                      { return token(TokenType.COMMENT); }

  /* whitespace */
  {WhiteSpace}                   { }

  /* identifiers */
  {Identifier}                   { return token(TokenType.IDENTIFIER); }
}

/* error fallback */
.|\n                             {  }
<<EOF>>                          { return null; }

Comment by Stuwbo...@gmail.com, Mar 27, 2009

Why does the source code not include the lexers package, especially as it is used? I can't recompile without it (well, except by including the jar file that has the compiled classes).

Comment by rkuester...@gmail.com, Apr 25, 2009

Hi,

Did you guys consider adding auto-completion support?

best regards Roland

Comment by luis.bor...@gtempaccount.com, Sep 6, 2009

Hello!

I was developing a compiler for the Ada 95 programming language, and while making the GUI for editing source files, I found your most useful project! I wrote a lexer for that language and would like to contribute it here (this lexer is a modification of the lexer I use to interact with a CUP-generated parser in my compiler implementation).

/* Lexical elements of the Ada 95 programming language.
   Author: Luis Felipe Borjas Reyes @ 05 September 2009 */

package jsyntaxpane.lexers;

import jsyntaxpane.DefaultLexer;
import jsyntaxpane.Token;
import jsyntaxpane.TokenType;

%%

%public
%class Ada95SyntaxLexer
%extends DefaultLexer
%final
%unicode
%char
%type Token
%ignorecase

%{
    public Ada95SyntaxLexer() {
        super();
    }

    StringBuffer string = new StringBuffer();

    /**
     * Helper method to create and return a new Token of the given TokenType
     */
    private Token token(TokenType type) {
        return new Token(type, yychar, yylength());
    }

    private Token token(TokenType type, int pairValue) {
        return new Token(type, yychar, yylength(), (byte)pairValue);
    }
%}

/* The lexical elements of Ada */
identifier_letter=[a-zA-Z]
digit=[0-9]
space_character=" "
underline="_"
num_sign="#"
point="."
plus="+"
minus="-"
double_quote=\"{2}
comment_start="--"
special_character=[^a-zA-Z 0-9\r\n\t\v\f]
format_effector=[\t\v\f]
line_terminator=\r|\n|\r\n
graphic_character={special_character}|{identifier_letter}|{digit}|{space_character}
separator={space_character}|{format_effector}|{line_terminator}
whitespace={separator}+
comment={comment_start}({graphic_character}|{format_effector})*{line_terminator}
/* I used ? because '' could occur as a character literal */
character_literal='{graphic_character}?'
/* What decimal numbers need: */
numeral={digit}({underline}?{digit})*
exponent=[Ee]{plus}?{numeral}|[Ee]{minus}{numeral}
/* Macro not used, but kept because it follows the RM */
decimal_literal={numeral}({point}{numeral})?{exponent}?
/* What based numbers need */
extended_digit={digit}|[a-fA-F]
based_numeral={extended_digit}({underline}?{extended_digit})*
base={numeral}
based_literal={base}{num_sign}{based_numeral}({point}{based_numeral})?{num_sign}{exponent}?
/* Numeric literals: macro not used */
numeric_literal={decimal_literal}|{based_literal}
/* rules for embedded numeric literals: */
number={digit}{digit}*
floating_point_literal={numeral}{point}{numeral}
integer_literal={numeral}
/* Numbers with an exponent */
power_literal={numeral}({point}{numeral})?{exponent}
/* Identifiers and operators */
identifier={identifier_letter}({underline}?({identifier_letter}|{digit}))*

// boolean literals:
boolean_literal="true"|"false"

%state STRING

%%

/* The lexical rules */

/* Handling everything else under YYINITIAL */

<YYINITIAL> {
    {comment}       {return token(TokenType.COMMENT);}
    /* The Ada RM says that whitespace is a separator required between
       some lexical elements; check that in the grammar */
    /* The Ada RM says there must be separators between some things;
       do I handle that here? */
    {whitespace}    {}

    /* Primitive types: */
    "boolean"       {return token(TokenType.TYPE);}
    "integer"       {return token(TokenType.TYPE);}
    "float"         {return token(TokenType.TYPE);}

    /* Reserved words: declare them as terminals in the .cup file */
    "abort"         {return token(TokenType.KEYWORD);}
    "abs"           {return token(TokenType.KEYWORD);}
    "abstract"      {return token(TokenType.KEYWORD);}
    "accept"        {return token(TokenType.KEYWORD);}
    "access"        {return token(TokenType.KEYWORD);}
    "aliased"       {return token(TokenType.KEYWORD);}
    "all"           {return token(TokenType.KEYWORD);}
    "and"           {return token(TokenType.KEYWORD);}
    "array"         {return token(TokenType.KEYWORD);}
    "at"            {return token(TokenType.KEYWORD);}

    "begin"         {return token(TokenType.KEYWORD);}
    "body"          {return token(TokenType.KEYWORD);}

    "case"          {return token(TokenType.KEYWORD);}
    "constant"      {return token(TokenType.KEYWORD);}

    "declare"       {return token(TokenType.KEYWORD);}
    "delay"         {return token(TokenType.KEYWORD);}
    "delta"         {return token(TokenType.KEYWORD);}
    "digits"        {return token(TokenType.KEYWORD);}
    "do"            {return token(TokenType.KEYWORD);}

    "else"          {return token(TokenType.KEYWORD);}
    "elsif"         {return token(TokenType.KEYWORD);}
    "end"           {return token(TokenType.KEYWORD);}
    "entry"         {return token(TokenType.KEYWORD);}
    "exception"     {return token(TokenType.KEYWORD);}
    "exit"          {return token(TokenType.KEYWORD);}

    "for"           {return token(TokenType.KEYWORD);}
    "function"      {return token(TokenType.KEYWORD);}

    "generic"       {return token(TokenType.KEYWORD);}
    "goto"          {return token(TokenType.KEYWORD);}

    "if"            {return token(TokenType.KEYWORD);}
    "in"            {return token(TokenType.KEYWORD);}
    "is"            {return token(TokenType.KEYWORD);}

    "limited"       {return token(TokenType.KEYWORD);}
    "loop"          {return token(TokenType.KEYWORD);}

    "mod"           {return token(TokenType.KEYWORD);}

    "new"           {return token(TokenType.KEYWORD);}
    "not"           {return token(TokenType.KEYWORD);}
    "null"          {return token(TokenType.KEYWORD);}

    "of"            {return token(TokenType.KEYWORD);}
    "or"            {return token(TokenType.KEYWORD);}
    "others"        {return token(TokenType.KEYWORD);}
    "out"           {return token(TokenType.KEYWORD);}

    "package"       {return token(TokenType.KEYWORD);}
    "pragma"        {return token(TokenType.KEYWORD);}
    "private"       {return token(TokenType.KEYWORD);}
    "procedure"     {return token(TokenType.KEYWORD);}
    "protected"     {return token(TokenType.KEYWORD);}

    "raise"         {return token(TokenType.KEYWORD);}
    "range"         {return token(TokenType.KEYWORD);}
    "record"        {return token(TokenType.KEYWORD);}
    "rem"           {return token(TokenType.KEYWORD);}
    "renames"       {return token(TokenType.KEYWORD);}
    "requeue"       {return token(TokenType.KEYWORD);}
    "return"        {return token(TokenType.KEYWORD);}
    "reverse"       {return token(TokenType.KEYWORD);}

    "select"        {return token(TokenType.KEYWORD);}
    "separate"      {return token(TokenType.KEYWORD);}
    "subtype"       {return token(TokenType.KEYWORD);}

    "tagged"        {return token(TokenType.KEYWORD);}
    "task"          {return token(TokenType.KEYWORD);}
    "terminate"     {return token(TokenType.KEYWORD);}
    "then"          {return token(TokenType.KEYWORD);}
    "type"          {return token(TokenType.KEYWORD);}

    "until"         {return token(TokenType.KEYWORD);}
    "use"           {return token(TokenType.KEYWORD);}

    "when"          {return token(TokenType.KEYWORD);}
    "while"         {return token(TokenType.KEYWORD);}
    "with"          {return token(TokenType.KEYWORD);}

    "xor"           {return token(TokenType.KEYWORD);}

    /* Now, everything else: */
    {boolean_literal}   {return token(TokenType.KEYWORD2);}
    {identifier}        {return token(TokenType.IDENTIFIER);}

    {numeric_literal}   {return token(TokenType.NUMBER);}
    {character_literal} {return token(TokenType.STRING);}
    /* Handle strings: */
    \"                  {yybegin(STRING); tokenStart = yychar; tokenLength = 1;}

    /* Delimiters as YYINITIAL actions */
    "&"             {return token(TokenType.KEYWORD);}
    "'"             {return token(TokenType.KEYWORD);}
    "("             {return token(TokenType.OPERATOR);}
    ")"             {return token(TokenType.OPERATOR);}
    "*"             {return token(TokenType.OPERATOR);}
    "+"             {return token(TokenType.OPERATOR);}
    ","             {return token(TokenType.OPERATOR);}
    "-"             {return token(TokenType.OPERATOR);}
    "."             {return token(TokenType.OPERATOR);}
    "/"             {return token(TokenType.OPERATOR);}
    ":"             {return token(TokenType.OPERATOR);}
    ";"             {return token(TokenType.OPERATOR);}
    "<"             {return token(TokenType.OPERATOR);}
    "="             {return token(TokenType.OPERATOR);}
    ">"             {return token(TokenType.OPERATOR);}
    "|"             {return token(TokenType.OPERATOR);}
    "=>"            {return token(TokenType.OPERATOR);}
    ".."            {return token(TokenType.OPERATOR);}
    "**"            {return token(TokenType.OPERATOR);}
    ":="            {return token(TokenType.OPERATOR);}
    "<<"            {return token(TokenType.OPERATOR);}
    ">>"            {return token(TokenType.OPERATOR);}
    "<>"            {return token(TokenType.OPERATOR);}
    "/="            {return token(TokenType.OPERATOR);}
    ">="            {return token(TokenType.OPERATOR);}
    "<="            {return token(TokenType.OPERATOR);}

    "["             {return token(TokenType.ERROR);}
    "]"             {return token(TokenType.ERROR);}
    "{"             {return token(TokenType.ERROR);}
    "}"             {return token(TokenType.ERROR);}

    /* The RM does not define the underline as a separator */
    {underline}     {return token(TokenType.ERROR);}
}

<STRING> {
    {double_quote}      {tokenLength += yylength();}
    \"                  {yybegin(YYINITIAL);
                         return new Token(TokenType.STRING, tokenStart, tokenLength + 1);}
    /* Anything that is not a line break or a closing quote is allowed */
    [^\"\n\r]+          {tokenLength += yylength();}
    /* a line break returns to the initial state: */
    {line_terminator}   {yybegin(YYINITIAL);}
}

Comment by project member ayman.al...@gmail.com, Sep 8, 2009

To all commenters: I did not notice any of your replies until now. I'll have a look at them one by one in time. Please open issues for suggestions, bug reports, and the like.

Comment by jaimea.g...@gmail.com, Oct 13, 2009

Hello, I need to develop an editor for PHP.

Does anyone have a JFlex file for PHP?

Comment by project member ayman.al...@gmail.com, Oct 13, 2009

Please open an issue for PHP. We can track it better there.

Comment by mario.g...@gmail.com, Apr 8, 2010

Awesome!!! I have also checked the autocompletion in Java. You've put a smile on my face, guys. Keep on working!!!

Comment by davidgip...@gmail.com, Apr 22, 2011

I've made a lexer for the gabc file format (for typesetting medieval church music with Gregorio) and hope it is useful to attach it here:

/*
 * Copyright 2011 David Gippner davidgippner@me.com
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License
 *       at http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package jsyntaxpane.lexers;


import jsyntaxpane.Token;
import jsyntaxpane.TokenType;

%%

%public 
%class GabcLexer
%extends DefaultJFlexLexer
%final
%unicode
%char
%type Token

%{
    /**
     * Create an empty lexer, yyreset will be called later to reset and assign
     * the reader
     */
    public GabcLexer() {
        super();
    }

    @Override
    public int yychar() {
        return yychar;
    }

    private static final byte PARAN     = 1;
    private static final byte BRACKET   = 2;
    private static final byte CURLY     = 3;
    private static final byte WORD      = 4;
%}

/* main character classes */
LineTerminator = \r|\n|\r\n
InputCharacter = [^\r\n]

/* comments */
Comment = {TraditionalComment} | {EndOfLineComment} 

TraditionalComment = "/*" [^*] ~"*/" | "/*" "*"+ "/"
EndOfLineComment = "//" {InputCharacter}* {LineTerminator}?

%%
<YYINITIAL> {

    /* header keywords */
    "number-of-voices:"          |
    "name:"                      |
    "score-copyright:"           |
    "gabc-copyright:"            |
    "office-part:"               |
    "occasion:"                  |
    "meter:"                     |
    "commentary:"                |
    "arranger:"                  |
    "gabc-version:"              |
    "initial-style:"             |
    "mode:"                      |
    "annotation:"                |
    "author:"                    |
    "date:"                      |
    "manuscript:"                |
    "manuscript-reference:"      |
    "manuscript-storage-place:"  |
    "book:"                      |
    "transcriber:"               |
    "generated-by:"              |
    "centering-scheme:"          |
    "transcription-date:"        |
    "style:"                     |
    "virgula-position:"          |
    "lilypond-preamble:"         |
    "opustex-preamble:"          |
    "musixtex-preamble:"         |
    "gregoriotex-font:"          |
    "user-notes:"                { return token(TokenType.KEYWORD); }

    /* clefs */
    "(c" [1-4] ")"               |
    "(cb" [1-4] ")"              |
    "(f" [1-4] ")"               |
    "(fb" [1-4] ")"              { return token(TokenType.KEYWORD2); }
    
    /* formatting tags */
    "b"            |
    "i"            |
    "sp"           |
    "v"            { return token(TokenType.TYPE); }

    /* Header end */
    "%%"           { return token(TokenType.KEYWORD2); }
    
    /* separators */
    "("                            { return token(TokenType.OPERATOR,  PARAN); }
    ")"                            { return token(TokenType.OPERATOR, -PARAN); }
    "{"                            { return token(TokenType.OPERATOR,  CURLY); }
    "}"                            { return token(TokenType.OPERATOR, -CURLY); }
    "["                            { return token(TokenType.OPERATOR,  BRACKET); }
    "]"                            { return token(TokenType.OPERATOR, -BRACKET); }
    "<"                            { return token(TokenType.OPERATOR,  WORD); }
    ">"                            { return token(TokenType.OPERATOR, -WORD); }

    /* breathing signs */
    "`"  |
    ","  |
    ";"  |
    ":"  |
    "::" { return token(TokenType.OPERATOR); }
    
    /* special note signs */
    "!"  |
    "~"  |
    "-"  |
    "+"  |
    "o"  |
    "w"  |
    "s"  { return token(TokenType.IDENTIFIER); }

    /* comments */
    {Comment}   	           { return token(TokenType.COMMENT); }
}

/* error fallback */
.|\n                             {  }
<<EOF>>                          { return null; }

There's one additional thing that has come to my mind: is it possible that JSyntaxPane has to be given an extra argument for Unicode content? My gabc files are not shown with line numbers if I open them in the SyntaxPane (but this may also be a flaw in my own code).

Thanks for this great work! David

