Updated Dec 21, 2008 by stefan.haustein
Labels: Featured, Phase-Design
BinaryFormatSpecification  
Definition of the binary file format used by MiniJoe

Introduction

This wiki page defines the binary file format used by MiniJoe.

Details

A MiniJoe binary file consists of a stream of bytes. 16-bit, 32-bit and 64-bit values are stored as the required number of bytes in big-endian order, which is the same format supported by the Java java.io.DataInputStream and java.io.DataOutputStream classes.
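Java's `byte` and `short` types are signed, so a Java reader for this format would typically read the uint8 and uint16 fields with `readUnsignedByte()` and `readUnsignedShort()`. The following minimal sketch (the class `EndianDemo` is hypothetical, not part of MiniJoe) shows the big-endian byte order produced by `DataOutputStream` and the difference between a signed and an unsigned 16-bit read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class EndianDemo {
    // Encodes a uint16 the way DataOutputStream does: high byte first (big-endian).
    static byte[] encodeUint16(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeShort(value); // only the low 16 bits are written
        out.flush();
        return bytes.toByteArray();
    }

    // Decodes a uint16 as an unsigned value in the range 0..65535.
    static int decodeUint16(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        return in.readUnsignedShort();
    }

    // For comparison: readShort() sign-extends, so values above 32767 come back negative.
    static int decodeSigned16(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        return in.readShort();
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = encodeUint16(0xFF10);
        System.out.printf("bytes: %02X %02X%n", encoded[0], encoded[1]); // bytes: FF 10
        System.out.println(decodeUint16(encoded));   // 65296
        System.out.println(decodeSigned16(encoded)); // -240
    }
}
```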

A MiniJoe binary file consists of a magic number followed by one or more blocks. A block starts with an identifying type byte followed by the block data. Most blocks store a uint16 length immediately after the type byte, but the interpretation of that length depends on the block type and is described below. Some blocks may contain other blocks, which in turn may contain further blocks.

MiniJoeFile  :=  Magic  Block ...
Block        :=  Type  Data ...
Type         :=  uint8
Data         :=  uint8 ...

The following blocks are currently defined:

Type   Description
0x00   Comment
0x10   String Table
0x20   Double Literals
0x30   String Literals
0x40   Regex Literals
0x50   Function Literals
0x60   Variable Names
0x80   Byte code
0xE0   Line number data
0xF0   Debug data
0xFF   End Marker

Magic

Magic   := "M" "i" "n" "i" "J" "o" "e" Version
Version := uint8
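A minimal sketch of writing and checking the magic, assuming the seven characters are stored as single ASCII bytes. The class `MagicCheck` and its methods are hypothetical, and the version value used in `main` is an arbitrary example, since this page does not list version numbers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class MagicCheck {
    // The seven ASCII characters that precede the version byte.
    static final byte[] MAGIC_PREFIX = {'M', 'i', 'n', 'i', 'J', 'o', 'e'};

    // Writes "MiniJoe" followed by the uint8 version.
    static void writeMagic(DataOutputStream out, int version) throws IOException {
        out.write(MAGIC_PREFIX);
        out.writeByte(version);
    }

    // Checks the seven magic characters and returns the version as 0..255.
    static int readMagic(DataInputStream in) throws IOException {
        byte[] prefix = new byte[MAGIC_PREFIX.length];
        in.readFully(prefix); // throws EOFException on a truncated file
        if (!Arrays.equals(prefix, MAGIC_PREFIX)) {
            throw new IOException("not a MiniJoe binary");
        }
        return in.readUnsignedByte();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeMagic(new DataOutputStream(bytes), 1); // 1 is an arbitrary example version
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println("version " + readMagic(in)); // version 1
    }
}
```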

Comment Block

CommentBlock := 0x00  Length  Data
Length       := uint16
Data         := uint8[Length]

This block holds free-form comments about the MiniJoe binary; its contents are ignored (and possibly discarded) by the client-side runtime. The Length field gives the number of data bytes.

(The intention is that Length and Data are read by a single call to java.io.DataInputStream.readUTF().)
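Because `readUTF()` itself reads a uint16 byte count followed by that many bytes, a comment block can be written and read with one call each. A sketch with hypothetical names (`CommentBlockIO`, `writeComment`, `readComment`); note that `writeUTF` uses Java's modified UTF-8 and throws `UTFDataFormatException` for strings whose encoding exceeds 65535 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class CommentBlockIO {
    static final int COMMENT = 0x00;

    // Writes the type byte, then writeUTF emits the uint16 byte count and the data.
    static void writeComment(DataOutputStream out, String text) throws IOException {
        out.writeByte(COMMENT);
        out.writeUTF(text); // UTFDataFormatException if the encoding exceeds 65535 bytes
    }

    // Reads the type byte, then Length and Data with a single readUTF() call.
    static String readComment(DataInputStream in) throws IOException {
        int type = in.readUnsignedByte();
        if (type != COMMENT) {
            throw new IOException("expected comment block, found type " + type);
        }
        return in.readUTF();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeComment(new DataOutputStream(bytes), "hi");
        byte[] data = bytes.toByteArray(); // 00 00 02 68 69
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(readComment(in)); // hi
    }
}
```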

Global String Table Block

StringTableBlock  :=  0x10  Count  String[Count]
Count             :=  uint16
String            :=  Length  uint8[Length]
Length            :=  uint16

The global string table contains the UTF-8 data for all strings within this program and is referenced by the String Literal block, the Regex Literal block, the Variable Name block and possibly the Debug Data block. The Count field gives the number of string entries; each entry is a uint16 length followed by the string data, as if written by java.io.DataOutputStream.writeUTF. The strings in this table are implicitly numbered according to their position within this block.
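A sketch of writing and reading the string table with `writeUTF`/`readUTF`, where the array index of each string is its implicit number. `StringTableIO` and its methods are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class StringTableIO {
    static final int STRING_TABLE = 0x10;

    static void writeStringTable(DataOutputStream out, List<String> strings) throws IOException {
        if (strings.size() > 0xFFFF) {
            throw new IllegalArgumentException("Count does not fit in a uint16");
        }
        out.writeByte(STRING_TABLE);
        out.writeShort(strings.size()); // Count
        for (String s : strings) {
            out.writeUTF(s);            // uint16 Length, then the encoded bytes
        }
    }

    // Returns the strings in file order; array index i is the implicit string number i.
    static String[] readStringTable(DataInputStream in) throws IOException {
        int type = in.readUnsignedByte();
        if (type != STRING_TABLE) {
            throw new IOException("expected string table, found type " + type);
        }
        String[] table = new String[in.readUnsignedShort()];
        for (int i = 0; i < table.length; i++) {
            table[i] = in.readUTF();
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeStringTable(new DataOutputStream(bytes), Arrays.asList("a", "bc"));
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(Arrays.toString(readStringTable(in))); // [a, bc]
    }
}
```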

Double Literal Block

DoubleLiteralBlock  :=  0x20  Count  Double[Count]
Count               :=  uint16
Double              :=  uint64
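The grammar gives each double only as a uint64. The sketch below assumes those 64 bits are the IEEE 754 bit pattern that `DataOutputStream.writeDouble` writes (via `Double.doubleToLongBits`), which fits the page's reference to the `java.io` stream classes but is not stated explicitly. `DoubleLiteralIO` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class DoubleLiteralIO {
    static final int DOUBLE_LITERALS = 0x20;

    // Assumption: each uint64 holds the IEEE 754 bits written by writeDouble.
    static void writeDoubles(DataOutputStream out, double[] values) throws IOException {
        if (values.length > 0xFFFF) {
            throw new IllegalArgumentException("Count does not fit in a uint16");
        }
        out.writeByte(DOUBLE_LITERALS);
        out.writeShort(values.length); // Count
        for (double v : values) {
            out.writeDouble(v); // same as writeLong(Double.doubleToLongBits(v))
        }
    }

    static double[] readDoubles(DataInputStream in) throws IOException {
        int type = in.readUnsignedByte();
        if (type != DOUBLE_LITERALS) {
            throw new IOException("expected double literal block, found type " + type);
        }
        double[] values = new double[in.readUnsignedShort()];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.readDouble(); // same as Double.longBitsToDouble(readLong())
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeDoubles(new DataOutputStream(bytes), new double[] {1.5, -0.25});
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(Arrays.toString(readDoubles(in))); // [1.5, -0.25]
    }
}
```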

String Literal Block

StringLiteralBlock  :=  0x30  Count  Index[Count]
Count               :=  uint16
Index               :=  uint16

The String Literal block contains indexes into the global string table.

Regex Literal Block

RegexLiteralBlock  :=  0x40  Count  Index[Count]
Count              :=  uint16
Index              :=  uint16

The Regex Literal block contains indexes into the global string table.
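String literals and regex literals share one layout, so a single reader can handle both when given the expected type byte, and the indexes can then be resolved against the global string table. A sketch with hypothetical names; the bounds check is an assumption about how a loader might treat a bad index, which this page does not specify.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Arrays;

public class LiteralIndexIO {
    static final int STRING_LITERALS = 0x30;
    static final int REGEX_LITERALS = 0x40;

    // Checks the type byte, then reads Count followed by Count uint16 indexes.
    static int[] readIndexBlock(DataInputStream in, int expectedType) throws IOException {
        int type = in.readUnsignedByte();
        if (type != expectedType) {
            throw new IOException("expected block type " + expectedType + ", found " + type);
        }
        int[] indexes = new int[in.readUnsignedShort()];
        for (int i = 0; i < indexes.length; i++) {
            indexes[i] = in.readUnsignedShort();
        }
        return indexes;
    }

    // Maps each index to its entry in the global string table.
    static String[] resolve(int[] indexes, String[] stringTable) throws IOException {
        String[] result = new String[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            if (indexes[i] >= stringTable.length) {
                throw new IOException("string index " + indexes[i] + " out of range");
            }
            result[i] = stringTable[indexes[i]];
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String[] table = {"foo", "a+b*", "bar"};
        byte[] data = {0x30, 0x00, 0x02, 0x00, 0x02, 0x00, 0x00}; // two string literals: #2, #0
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(Arrays.toString(resolve(readIndexBlock(in, STRING_LITERALS), table))); // [bar, foo]
    }
}
```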

Function Literal Block

FunctionLiteralBlock  :=  0x50  Count  FunctionLiteral[Count]
Count                 :=  uint16

A Function Literal block contains one or more Function Literals; each Function Literal contains one or more other blocks and is terminated by an End Marker (0xFF).

FunctionLiteral  :=  Block  ...  0xFF
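A sketch of emitting a Function Literal block from function literals whose inner blocks have already been encoded, with each literal followed by its own 0xFF End Marker. `FunctionLiteralWriter` and the pre-encoded `byte[]` representation are hypothetical choices for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class FunctionLiteralWriter {
    static final int FUNCTION_LITERALS = 0x50;
    static final int END_MARKER = 0xFF;

    // Each element holds one function literal's inner blocks, already encoded.
    static void writeFunctionLiterals(DataOutputStream out, List<byte[]> literals) throws IOException {
        out.writeByte(FUNCTION_LITERALS);
        out.writeShort(literals.size()); // Count
        for (byte[] innerBlocks : literals) {
            out.write(innerBlocks);
            out.writeByte(END_MARKER);   // closes this function literal
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        List<byte[]> literals = Arrays.asList(
                new byte[] {0x00, 0x00, 0x01, 0x78}, // a comment block containing "x"
                new byte[] {0x30, 0x00, 0x00});      // an empty String Literal block
        writeFunctionLiterals(new DataOutputStream(bytes), literals);
        for (byte b : bytes.toByteArray()) {
            System.out.printf("%02X ", b);
        }
        System.out.println(); // 50 00 02 00 00 01 78 FF 30 00 00 FF
    }
}
```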

Variable Name Block

VariableNameBlock  :=  0x60  Count  Index[Count]
Count              :=  uint16
Index              :=  uint16

The Variable Name Block contains indexes into the Global String Table and gives a mapping from local variable names to local variable slots. The variable names are implicitly numbered according to their position within this block.
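Reading this block works like reading the literal index blocks. Turning the indexes into a name-to-slot mapping could look like the sketch below, which assumes that a name's position in this block is its local variable slot, as the implicit numbering suggests. `VariableSlots` is a hypothetical name, and a repeated name simply keeps its last slot.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VariableSlots {
    // Assumption: the position of an index in the Variable Name Block is the slot number.
    static Map<String, Integer> nameToSlot(int[] nameIndexes, String[] stringTable) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        for (int slot = 0; slot < nameIndexes.length; slot++) {
            slots.put(stringTable[nameIndexes[slot]], slot); // a repeated name keeps its last slot
        }
        return slots;
    }

    public static void main(String[] args) {
        String[] table = {"x", "y", "z"};
        System.out.println(nameToSlot(new int[] {2, 0}, table)); // {z=0, x=1}
    }
}
```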

Code Block

ByteCodeBlock       :=  0x80  Locals  Parameters  Flags  Length  Code[Length]
Locals              :=  uint16    // number of local variable slots required (including parameters)
Parameters          :=  uint16    // number of declared parameters
Flags               :=  uint8     // flags, see below
Length              :=  uint16    // number of code bytes
Code                :=  uint8

Flag Bits

Bit    Description
0      If set, the code does not contain closures or with statements, and local variables may be allocated on the stack
1..7   Reserved
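A sketch of reading a Code Block and testing flag bit 0, using hypothetical names (`CodeBlockIO`, `CodeBlock`, `localsOnStack`). The reserved bits 1..7 are read but left uninterpreted.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class CodeBlockIO {
    static final int BYTE_CODE = 0x80;
    static final int FLAG_STACK_LOCALS = 0x01; // bit 0

    static final class CodeBlock {
        final int locals;     // local variable slots, including parameters
        final int parameters; // declared parameters
        final int flags;
        final byte[] code;

        CodeBlock(int locals, int parameters, int flags, byte[] code) {
            this.locals = locals;
            this.parameters = parameters;
            this.flags = flags;
            this.code = code;
        }

        // Bit 0: no closures or with statements, so locals may live on the stack.
        boolean localsOnStack() {
            return (flags & FLAG_STACK_LOCALS) != 0;
        }
    }

    static CodeBlock readCodeBlock(DataInputStream in) throws IOException {
        int type = in.readUnsignedByte();
        if (type != BYTE_CODE) {
            throw new IOException("expected code block, found type " + type);
        }
        int locals = in.readUnsignedShort();
        int parameters = in.readUnsignedShort();
        int flags = in.readUnsignedByte();
        byte[] code = new byte[in.readUnsignedShort()]; // Length
        in.readFully(code);
        return new CodeBlock(locals, parameters, flags, code);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0x80, 0x00, 0x03, 0x00, 0x01, 0x01, 0x00, 0x02, (byte) 0xAA, (byte) 0xBB};
        CodeBlock cb = readCodeBlock(new DataInputStream(new ByteArrayInputStream(data)));
        System.out.println(cb.locals + " locals, " + cb.parameters + " parameter(s), "
                + cb.code.length + " code bytes, stack locals: " + cb.localsOnStack());
        // 3 locals, 1 parameter(s), 2 code bytes, stack locals: true
    }
}
```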

If there is a Variable Name Block corresponding to this Code Block, then the number of local variable slots (Locals) must be at least the number of variable name mappings. It may be larger if the compiler uses extra local variable slots to store intermediate values, e.g. the results of common subexpressions.

The parameters are mapped to the first local variable slots, i.e. slots 0 ... Parameters-1.
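The slot rules above reduce to two checks, sketched here with hypothetical names: the parameters are included in Locals, and Locals must cover every variable name mapping. Passing 0 as the name count when there is no Variable Name Block is an assumption of this sketch.

```java
public class LocalSlotCheck {
    // Returns null when consistent, otherwise a short description of the problem.
    // variableNameCount is 0 when there is no Variable Name Block (assumption).
    static String check(int locals, int parameters, int variableNameCount) {
        if (parameters > locals) {
            return "Locals must include the " + parameters + " parameter slots";
        }
        if (variableNameCount > locals) {
            return "Locals must cover all " + variableNameCount + " variable names";
        }
        return null;
    }

    // Parameter i occupies local variable slot i.
    static boolean isParameterSlot(int slot, int parameters) {
        return slot >= 0 && slot < parameters;
    }

    public static void main(String[] args) {
        System.out.println(check(4, 2, 3));        // null: one spare slot, e.g. for a temporary
        System.out.println(check(2, 2, 3));        // Locals must cover all 3 variable names
        System.out.println(isParameterSlot(1, 2)); // true
    }
}
```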

We should consider a flag that copies the global string table to the local string table (consider a JSON data transfer). -Stefan Haustein 10/17/07 4:47 PM

Line Number Table

LineNumberBlock     := 0xE0  Length  Entry[Length]
Entry               := ProgramCounter  LineNumber
Length              := uint16
ProgramCounter      := uint16
LineNumber          := uint16

The Length field gives the number of (ProgramCounter, LineNumber) pairs in this block. The pairs appear in strictly increasing ProgramCounter order.
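This page does not say how a program counter that falls between two entries maps to a line. The sketch below assumes each pair applies from its ProgramCounter up to the next pair's, and uses a binary search, which the strictly increasing order permits. `LineNumberTable` and `lineFor` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class LineNumberTable {
    final int[] pcs;
    final int[] lines;

    LineNumberTable(int[] pcs, int[] lines) {
        this.pcs = pcs;
        this.lines = lines;
    }

    static LineNumberTable read(DataInputStream in) throws IOException {
        int type = in.readUnsignedByte();
        if (type != 0xE0) {
            throw new IOException("expected line number block, found type " + type);
        }
        int length = in.readUnsignedShort(); // number of pairs
        int[] pcs = new int[length];
        int[] lines = new int[length];
        for (int i = 0; i < length; i++) {
            pcs[i] = in.readUnsignedShort();
            lines[i] = in.readUnsignedShort();
            if (i > 0 && pcs[i] <= pcs[i - 1]) {
                throw new IOException("ProgramCounter values are not strictly increasing");
            }
        }
        return new LineNumberTable(pcs, lines);
    }

    // Line of the last entry whose ProgramCounter is <= pc, or -1 before the first entry.
    int lineFor(int pc) {
        int lo = 0;
        int hi = pcs.length - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (pcs[mid] <= pc) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? -1 : lines[found];
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xE0, 0x00, 0x02, 0x00, 0x00, 0x00, 0x01, 0x00, 0x05, 0x00, 0x02};
        LineNumberTable table = read(new DataInputStream(new ByteArrayInputStream(data)));
        System.out.println(table.lineFor(7)); // 2
    }
}
```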

End Marker

EndMarker  := 0xFF

Marks the end of a function literal or of the file.

Scoping

A MiniJoe binary file has two levels, or scopes: 'program scope' covers blocks at the top level, and 'function scope' covers blocks within a Function Literal. Note that a Function Literal may itself contain a Function Literal Block, which in turn contains more Function Literals.

At each level, no more than one block of each type is allowed. Except where described below, the order of blocks within each scope does not matter, although blocks would normally appear in the order in which they are described above.
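Because block lengths are interpreted per block type, a loader cannot skip an unknown block generically. The sketch below walks a file using the layouts on this page, recurses into each Function Literal as a new function scope, and rejects a second block of the same type within one scope. It is a hypothetical validator (`ScopeWalker`), not MiniJoe's loader: it cannot skip Debug Data (0xF0), whose layout is not defined here, and it does not check which block types are valid at each level.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ScopeWalker {
    static final byte[] MAGIC_PREFIX = {'M', 'i', 'n', 'i', 'J', 'o', 'e'};

    // Checks the magic, then walks the program scope.
    // Returns the number of function literals found at every nesting depth.
    static int walkFile(DataInputStream in) throws IOException {
        byte[] prefix = new byte[MAGIC_PREFIX.length];
        in.readFully(prefix);
        if (!Arrays.equals(prefix, MAGIC_PREFIX)) {
            throw new IOException("not a MiniJoe binary");
        }
        in.readUnsignedByte(); // version, not interpreted here
        return walkScope(in, 0);
    }

    // Reads blocks up to and including the 0xFF End Marker of one scope.
    static int walkScope(DataInputStream in, int depth) throws IOException {
        Set<Integer> seen = new HashSet<>(); // block types already present in this scope
        int functions = 0;
        while (true) {
            int type = in.readUnsignedByte();
            if (type == 0xFF) {
                return functions;
            }
            if (!seen.add(type)) {
                throw new IOException("duplicate block 0x" + Integer.toHexString(type) + " at depth " + depth);
            }
            switch (type) {
                case 0x00: // Comment: uint16 byte count, then data
                    skip(in, in.readUnsignedShort());
                    break;
                case 0x10: { // String Table: Count entries of uint16 length + data
                    int count = in.readUnsignedShort();
                    for (int i = 0; i < count; i++) {
                        skip(in, in.readUnsignedShort());
                    }
                    break;
                }
                case 0x20: // Double Literals: Count x 8 bytes
                    skip(in, 8 * in.readUnsignedShort());
                    break;
                case 0x30: case 0x40: case 0x60: // index blocks: Count x uint16
                    skip(in, 2 * in.readUnsignedShort());
                    break;
                case 0x50: { // Function Literals: each one is a nested scope ending in 0xFF
                    int count = in.readUnsignedShort();
                    functions += count;
                    for (int i = 0; i < count; i++) {
                        functions += walkScope(in, depth + 1);
                    }
                    break;
                }
                case 0x80: // Byte code: Locals, Parameters, Flags (5 bytes), then Length + code
                    skip(in, 5);
                    skip(in, in.readUnsignedShort());
                    break;
                case 0xE0: // Line numbers: Length x (uint16, uint16)
                    skip(in, 4 * in.readUnsignedShort());
                    break;
                default: // includes 0xF0, whose layout this page does not define
                    throw new IOException("cannot skip block type 0x" + Integer.toHexString(type));
            }
        }
    }

    // Consumes exactly n bytes; readFully raises EOFException on truncated input.
    static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.write(MAGIC_PREFIX);
        out.writeByte(1);      // arbitrary example version
        out.writeByte(0x50);
        out.writeShort(1);     // one function literal...
        out.writeByte(0x00);
        out.writeUTF("inner"); // ...containing only a comment block
        out.writeByte(0xFF);   // end of the function literal
        out.writeByte(0xFF);   // end of the file
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(walkFile(in) + " function literal(s)"); // 1 function literal(s)
    }
}
```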

At the program level, the following blocks are valid:

At the function level the following blocks are valid:

