Export to GitHub

jaql - DataModel.wiki


Jaql's types and schema language.

Data Model

Jaql's data model is based on JavaScript Object Notation or JSON. When using literal values in the language (e.g., 42 or "a string" as used in the expressions x = 42 and y == "a string"), the values are specified as JSON. In addition, Jaql has extended JSON with several commonly needed data types. Thus, Jaql accepts valid JSON data but it may not produce valid JSON when non-JSON types are used.

Jaql also includes a powerful schema language to describe data. It includes ideas from JSON Schema, XML schema, and RELAX NG, but tailored to Jaql's syntax. Jaql schema is used as a constraint on data and to improve efficiency where applicable.

Jaql's types are first described, followed by its schema language.

Types

  • Complex Types
    • array
    • record
  • Atomic Types
    • null
    • boolean
    • string
    • Numeric Types
      • long
      • double
      • decfloat
    • binary
    • date
    • schematype
    • function
    • comparator

Complex Types

array

An array is a list of values. It corresponds to JSON's array type.

Examples: ``` // empty array []

// array with three longs [ 1, 2, 3 ]

// array with mixed atomic types [ 1, "a", 3 ]

// array with nested, complex data [ 1, ["a", "b"], [["a", "b"]], ["a", ["b", ["c"]]], {name: "value"}, 2 ] ```

record

A record is a mapping from names to values. It corresponds to JSON's object type.

Field names must be non-null strings.

Examples: ``` // record with one field, whose name is "aName" and whose value is "val" { "aName": "val" }

// jaql permits names to be specified with the double-quotes { aName: "val" }

// a record with mixed atomic types { a: "val", b: 5 }

// a record with a complex type for one of its fields { a: [1,2,3], b: "val" } ```

Atomic Types

null

Just like SQL as well as JSON, jaql's data model includes null.

Examples: ``` // the null value null

// the null value used within a record { a: null, b: 1 } ```

boolean

The literal values for the boolean type are true, false. This is the same as in JSON.

Examples: ``` // the boolean value for TRUE true

// the boolean value for FALSE false

// an array with two boolean values included [ 1, true, 3, false, 4 ] ```

string

Strings are specified much the same way as in JSON. The only exception is that jaql's parser permits single quotes, in addition to the double quotes that are specified by the JSON standard.

Examples: ``` "some string"

'some string'

"some string with an \n embedded newline"

"some string with an embedded \u1234 unicode char" ```

Numeric Types

The numeric types that are supported include long, double and decfloat (e.g., Decimal). The decfloat type corresponds to JSON's numeric type whereas long and double are explicitly supported for performance and convenience.

long

A 64-bit signed integer. If a number can be of type long, then it will be represented as a long by default.

Examples: ``` 1;

-1;

104; ```

double

A 64-bit base-2 floating point value. If a number can be of type double, and is not a long, then it will be represented as a double by default. A number can be coerced to be a double by using a 'd' suffix.

Examples: ``` 1.0;

3.5;

3d;

100e-2; ```

decfloat

A 128-bit base-10 floating point value. A number can be specified to be a decimal only if suffixed by 'm'. The current implementation of decimal handling has lead us to use longs and doubles where possible.

Examples: ``` 1.0m;

3.5m;

3dm;

100e-2m; ```

binary

Binary values are represented as hexadecimal strings and constructed with the hex constructor. Note that binary values are provided as a convenience and are not directly supported by JSON.

Examples: hex('F0A0DDEE');

date

Date values are represented using the following string format: "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'". If an alternative format is needed, the format can be specified as the second argument to date.

Examples: date('2001-07-04T12:08:56.000Z');

schematype

Schemata are represented in Jaql with the schema type. The basic pattern to follow is schema <schema expression>. What follows are several simple examples of schema values; the section on schema language discusses <schema expression> options in more detail:

Examples: ``` schema null;

schema boolean;

schema long|date;

schema [ long, boolean * ];

schema { a: long, b: null };

schema schematype; ```

function

Functions are part of Jaql's data types. They are specified using the following pattern: fn( <param>* ) <body> where <param> is [schema] name[=val]. That is, a param can have an optional schema and an optional default value associated. The <body> is defined by any Jaql expression. Since functions are simply values, they can be assigned to variables which can be invoked.

Examples: ``` fn() 1+2; // creates a function value

(fn() 1+2 )(); // invokes an anonymous function

x = fn() 1+2; // creates a function and assigns it to the variable x

x(); // invokes a function that returns 3

y = fn(a,b) a + b;

y(3,5);

y = fn(schema long a, schema long b) a + b; // specify parameter schema

y = fn(a=1, b=1) a + b; // specify default values

y(); // invocation will use default values

y(2); // bind 2 to a and use b's default value (yields 3)

y(2,3); // override the default values

y(b=2, a=3); // use the parameter name to explicitly bind to parameter value

y = fn(schema long a=1, schema long b=1) a + b; // combine schemas and default values ```

comparator

A comparator is similar to a function except that it is used specifically to construct a comparator. In particular, it is used by sort and top. The built-in function topN explicitly exposes comparators. For the most part, however, comparators are used for Jaql built-in and core operator implementations.

A comparator is specified as follows: cmp( <param> ) [ <body> asc | desc ]. Essentially, a comparator specifies how a single value (param) is to be transformed (<body>) and compared against other transformed values. The asc and desc keywords determine whether the comparator can be used to sort in ascending or descending order.

Examples: ``` cmp(x) [ x desc ]; // compares x values in descending order

cmp(x) [ x.someField asc ]; // assumes x is a record and compares values associated with someField in ascending order ```

Schema Language

Jaql's schema language specifies the type for values. The type can be precise (e.g., x is a long), it can be entirely open (e.g., x is any type), or it can be partially specified (e.g., x is a { a: long, b: any, * }). The example for partial specification can be read as: "x is a record with at least two fields, a and b, plus potentially other fields. Field a is a long and field b is of any type".

Schemas are used as constraints on the data and to optimize how data is processed and stored. In general, Jaql will have more opportunities for optimization when more detailed schema information is provided. However, there are many cases, in particular when exploring new data sets, where partial or no schema specification is more convenient.

The expression schema <schema expression> constructs a schema value. The <schema expression> is defined as follows: <schema expression> ::= <basic> '?'? ('|' <schema expression>)* <basic> ::= <atom> | <array> | <record> | 'nonull' | 'any' <atom> ::= 'string' | 'double' | 'null' | ... <array> ::= '[' ( <schema expression (',' <schema expression>)* '...'?)? ']' <record> ::= '{' ( <field> (',' <field>)*)? '}' <field> ::= (<name> '?'? | '*') (':' <schema expression>)?

Essentially, a schema value is the OR of one or more schema values (e.g., long|string. The ? is used as a short-hand for the null schema. So, long? really translates to long|null. In addition, some schema values support value-based constraints (e.g., long(5). Below, we describe Jaql's schema types in detail:

  • array
  • record
  • null
  • nonnull
  • any
  • boolean
  • string
  • long
  • double
  • decfloat
  • binary
  • date
  • schematype
  • function

Array Schema

An array can be described by constraining the types of its elements and their cardinality. When an array's length is fixed, its called a closed array. Otherwise, the array's length is unbounded, in which case its referred to as an open array.

Example: ``` schema []; // describes only the empty array

schema [*]; // describes any array

schema [ long(value=1), string(value="a"), boolean ]; // describes a closed array, in this case a triple, whose elements are constrained by the given types

schema [ long, boolean * ]; // an open array where the first element is a long and the remaining elements, if present, must be booleans

schema [ long, boolean, * ]; // and open array where the first element is a long, the second is a boolean, and the remaining elements are of any type ```

Record Schema

A record is described by constraining its fields, which in turn are constrained on their name and type. Like arrays, there are open and closed records where the fields are either fixed or unbounded, respectively. In addition, fields can be specified to be optional (e.g., { a? }).

Example: ``` schema {}; // the empty record

schema { a }; // a closed record with a field called "a" of any type

schema { a? }; // the "a" field is optional

schema { a: long }; // the "a" field must be of type long

schema { a: long, b: null, c: nonnull }; // a closed record with 3 fields

schema { * }; // an open record with any number of fields

schema { a: long, * }; // an open record that must contain an "a" field of type long

schema { a: long, *: long }; // a closed record whose second field can be named anything ```

Null Schema

Matches only the null value.

Example: schema null;

Nonnull Schema

Matches any type except for null.

Example: schema nonnull;

Any Schema

Matches any type, including null.

Example: schema any;

Boolean Schema

A value can be constrained to be of type boolean and it can also have its value constrained to either true or false.

Example: ``` schema boolean;

schema boolean(true); ```

String Schema

A string value can be constrained by its length or a specific value.

Example: ``` schema string;

schema string(5); // matches any string of length 5

schema string(value="baab"); ```

Long Schema

Example: ``` schema long;

schema long(5); ```

Double Schema

Example: ``` schema double;

schema double(5d); ```

Decfloat Schema

Example: ``` schema decfloat;

schema decfloat(5m); ```

Binary Schema

Like strings, binary values can be constrained by their length or a specific value.

Example: ``` schema binary;

schema binary(5);

schema binary(value=hex('001122')); ```

Date Schema

Example: ``` schema date;

schema date(date('2000-01-01T12:00:00Z')); ```

Schematype Schema

Example: ``` schema schematype;

schema schematype(value=schema long); ```

Function Schema

Example: schema function;

Examples: ```

```

Finally, schema values can be used in Jaql scripts as constraints on the data.