jaql - issue #9

Semantics of global variables


Posted on Mar 12, 2009 by Grumpy Monkey

Jaql treats the definition of a global variable as just a definition; it does not evaluate the variable's value immediately. The definition is included in every query evaluation, so the variable is re-evaluated for each query. Moreover, the variable can be inlined into a query in a way that causes it to be evaluated multiple times within a single query.
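
To make the cost concrete, here is a rough model of this behaviour in Python rather than Jaql. It is only a sketch under assumptions: `define`, `ref`, `globals_table` and the counter are hypothetical names, not part of Jaql. A definition is stored as an unevaluated expression, and every reference runs it again.

```python
# Toy model of lazy global definitions (Python, not Jaql).
# Every name below is hypothetical and exists only for illustration.
counter = {"evaluations": 0}

def expensive():
    """Stands in for the expression on the right-hand side of a definition."""
    counter["evaluations"] += 1
    return 42

globals_table = {}

def define(name, thunk):
    # Defining a variable only records the expression; nothing runs yet.
    globals_table[name] = thunk

def ref(name):
    # Every reference re-runs the stored expression.
    return globals_table[name]()

define("x", expensive)
after_define = counter["evaluations"]   # definition alone evaluates nothing

# One "query" that references x twice, as if x had been inlined twice.
query_result = ref("x") + ref("x")
after_query = counter["evaluations"]    # one evaluation per reference
```

In this model `after_define` is 0 and `after_query` is 2, which mirrors the complaint that a single query can pay for the same variable more than once.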

The "materialize $var" statement can be used to force the evaluation of a variable. This doesn't seem very clean, but it provides a short-term workaround. One problem with materialization is that the value is not stored in a map/reducible location (e.g., HDFS), so we do not get map/reduce over the variable's result. This is a general problem with all variables. Moreover, it is unclear when we should materialize into a distributed location (which makes sense for large results) versus store in memory (for small results). Currently, the user has to handle this with an explicit write.
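
The effect of forcing evaluation can be modelled in Python as caching. This is a sketch, not Jaql's implementation: the `Var` class and its methods are hypothetical. Note that the cached value lives only in the memory of the process, which corresponds to the concern above that a materialized value is not in a map/reducible location.

```python
# Toy model of "materialize" (Python, not Jaql); all names are hypothetical.
counter = {"calls": 0}

def expensive():
    """Stands in for the variable's defining expression."""
    counter["calls"] += 1
    return [1, 2, 3]

class Var:
    def __init__(self, thunk):
        self.thunk = thunk
        self.cached = False
        self.value = None

    def get(self):
        # Without materialization, each read re-runs the expression.
        if self.cached:
            return self.value
        return self.thunk()

    def materialize(self):
        # Force evaluation once and keep the result in local memory.
        self.value = self.thunk()
        self.cached = True

v = Var(expensive)
v.get()
v.get()
calls_before = counter["calls"]   # two reads, two evaluations

v.materialize()                   # one more evaluation, result cached
v.get()
v.get()
calls_after = counter["calls"]    # later reads add no evaluations
```

Here `calls_before` is 2 and `calls_after` is 3: after materialization, further reads no longer trigger evaluation.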

Another issue is that global variables are never redefined; instead, a new variable is created that hides the old one, and existing references still point to the old variable. This makes variable definitions feel as if they are evaluated immediately even though evaluation is lazy, but it causes unexpected results in the case of functions. Consider two examples:

$x = 1;
$y = $x + 1;
$x = 2;
$y; // produces 2, which seems right.

$f = fn() 1;
$g = fn() $f() + 1;
$f = fn() 2; // one might incorrectly think this redefines $f() and therefore affects $g
$g(); // produces 2, which seems wrong
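
The hiding behaviour in both examples can be imitated in Python. This is a sketch under assumptions: the `Env` class is hypothetical and only models the semantics described in this issue. Each definition captures a snapshot of the bindings visible at that moment, so a later definition of the same name hides the old one without affecting existing references. For contrast, plain Python functions resolve names at call time, which is the behaviour a user would probably expect for `$g`.

```python
# Toy model of shadowing (Python, not Jaql); all names are hypothetical.
class Env:
    def __init__(self):
        self.bindings = {}   # name -> most recent lazy definition

    def define(self, name, make_thunk):
        # make_thunk receives a snapshot of the bindings visible right now,
        # so references inside the body are fixed at definition time.
        snapshot = dict(self.bindings)
        self.bindings[name] = make_thunk(snapshot)

    def evaluate(self, name):
        return self.bindings[name]()

env = Env()

# Models: $x = 1; $y = $x + 1; $x = 2; $y;
env.define("x", lambda s: lambda: 1)
env.define("y", lambda s: lambda: s["x"]() + 1)
env.define("x", lambda s: lambda: 2)        # hides the old x, y is unaffected
y_value = env.evaluate("y")

# Models: $f = fn() 1; $g = fn() $f() + 1; $f = fn() 2; $g();
env.define("f", lambda s: lambda: (lambda: 1))
env.define("g", lambda s: lambda: (lambda: s["f"]()() + 1))
env.define("f", lambda s: lambda: (lambda: 2))   # hides the old f
g_value = env.evaluate("g")()

# Contrast: ordinary Python looks up f when g is called (late binding).
def f():
    return 1

def g():
    return f() + 1

def f():
    return 2

late_bound = g()
```

In the model, `y_value` and `g_value` are both 2, matching the results reported above, while `late_bound` is 3 because the second `f` really does replace the first.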

We need some more thinking here.

Status: Accepted

Labels:
Type-Task Priority-Medium