Export to GitHub

esprima - issue #78

Optionally preserve the raw/verbatim literal


Posted on Dec 9, 2011 by Quick Rabbit

This is still an idea.

Currently the code like '1.0e2' will produce the syntax node:

{
    type: 'Literal',
    value: 100
}

For the purpose of code formatter and rewriter, it is useful to obtain the original numeric literal before it gets converted to a number. This can be achieved if the syntax node is extended to contain extra info:

{
    type: 'Literal',
    value: 100,
    raw: '1.0e2'
}

I'm undecided about the option and property name, 'raw' is good and unambiguous, 'verbatim' is also clear.

Comment #1

Posted on Feb 3, 2012 by Happy Hippo

I created a patch that does exactly this for all Syntax.Literal nodes, i.e. a property 'raw' is added which contains the raw source text for the literal.

This has been particularly useful for 're-generating' source code with the exact same text for numeric literals (scientific notation, hex, decimal) and strings (single-quoted vs. double-quoted).

Tests have been updated, but I did not change the compatibility tests, as Reflect.parse does not return raw literals. Instead, I updated the test runner to ignore the 'raw' property in the comparison.

I'm aware of the fact that this is still an idea. I'm interested to hear what your thoughts are on my implementation.

Patch in the form of a pull request: https://github.com/ariya/esprima/pull/15

Comment #2

Posted on Feb 4, 2012 by Quick Rabbit

The implementation looks good.

Now the question is whether this has to be optional or not. Ideally we stick as compatible as possible to Mozilla Reflect. The raw property can be enabled by passing an option, e.g. raw: true when calling parse() function.

The challenge is to find the place to inject the logic that adds the raw property into the literal nodes.

Comment #3

Posted on Feb 7, 2012 by Quick Rabbit

Can you see if it is possible to make this optional? I'd like to be as compatible as possible with Reflect API.

On the other hand, if making it optional is too much hassle and/or yields too complicated code, we just go ahead and diverge a little bit from the official Reflect compatible output.

Comment #4

Posted on Feb 7, 2012 by Happy Hippo

Adding additional properties (such as 'raw') does not make it incompatible for consumers of the API, so you could argue that we are still compatible with Reflect.parse :)

The pull request at https://github.com/ariya/esprima/pull/15 has been updated with a new commit that makes raw literals optional. The main change is that all Syntax.Literal nodes must now be created through a 'createLiteral' function which has a 'createRawLiteral' variant that includes the raw literal string. This means we now have 1 extra function call in all 8 places where literals are created. I followed the approach of dynamically choosing a function (patch/unpatch) based on the given options.

What do you think? Is this a good way to make raw literals optional without having too much of an impact on the performance and the design of the parser code?

Comment #5

Posted on Feb 7, 2012 by Quick Rabbit

Thanks for looking into this. I think the extra code to make it optional is not that crazy and still maintainable.

As for the performance hit, it will be unavoidable. Have you done any measurement on e.g. Chrome and Firefox to see quantify the speed difference?

Comment #6

Posted on Feb 7, 2012 by Happy Hippo

I ran the full benchmark suite with Node.js, both before and after the changes:

  • Before: 528.2 ms
  • Forced raw literals: 538.4 ms (after my first commmit)
  • Optional raw literals (disabled): 542.2 ms (after my second commmit)

Benchmark results were varying a bit when running the benchmark suite multiple times, but I think this gives a pretty good indication.

Comment #7

Posted on Feb 7, 2012 by Quick Rabbit

Thanks for the effort, it's really appreciated.

If the performance degrades that much, then I guess making it optional (and becomes slower a little bit more) does not harm. We definitely need to look into gaining more speed again, but I'll merge your patches as is for now.

Maybe something like https://github.com/ariya/esprima/commit/4f9af77ddc would be possible?

Comment #8

Posted on Feb 7, 2012 by Happy Hippo

Thanks for the info on the property-access optimization. To see what the impact of such a change would be on the performance, I created a patch against the current master that removes all member accesses in the main parser code so that objects are initialized in one single step. This only occurred in parseObjectInitialiser and parseConditionalExpression.

My patch: https://github.com/joostwim/esprima/commit/b0628522fbe41f5c5a9565a88de47eabd4f4c51e

Unfortunately, I'm unable to reliably determine if this improves performance or not. The benchmark results seem to be fluctuating too much.

I'd appreciate it if you would inspect the patch and maybe even run the benchmark yourself (both before and after the commit). If you have any indication that this patch improves performance, I can merge this with my previous pull request in order to compensate for some of the performance loss of the raw literal addition.

Comment #9

Posted on Feb 8, 2012 by Quick Rabbit

I'll try it myself and let you know how it turns out.

Comment #10

Posted on Feb 9, 2012 by Quick Rabbit

The single commit which optimizes property access consistently shows the speed improvement in my testing using the brand new Chrome 17. Running time for the full benchmarks suite goes down from 393.5 ms to 413.8 ms.

Based on this, I'd say combine all your three commits together. I can test it again once you have it handy.

Comment #11

Posted on Feb 9, 2012 by Happy Hippo

Thanks for your tests! I assume you accidentally swapped the times when you wrote "goes down from 393.5 ms to 413.8 ms".

I closed the old pull request and created a new one at https://github.com/ariya/esprima/pull/18

The new pull request contains two commits: one for the property access optimization, the other one for the optional raw literals. Everything has been updated to work against the current master.

Comment #12

Posted on Feb 9, 2012 by Quick Rabbit

Landed in: https://github.com/ariya/esprima/commit/618656261e https://github.com/ariya/esprima/commit/0fe81b279a

Thanks a lot!

Comment #13

Posted on Feb 10, 2012 by Helpful Elephant

Raw options great! Awesome work!

And I suggest adding raw property to Identifier. Because,

¥u0061

is Identifier 'a', but information '¥¥u0061' is lost. This is related to issue 6 #25

Comment #14

Posted on Feb 10, 2012 by Quick Rabbit

Somehow this breaks IE <= 7, for the same reason as in issue 22.

Comment #15

Posted on Feb 10, 2012 by Quick Rabbit

Minor cleanup: https://github.com/ariya/esprima/commit/20c6aa93e1.

Specialization for raw 'extraction' can be used to solve IE <= 7 problem.

Comment #16

Posted on Feb 10, 2012 by Happy Hippo

I now have access to a VM with IE6 to test this. Apparently, the 'source' variable is an array in IE6, instead of a string. This causes source.slice() to also return an array.

Doing this brings the number of failures down from 107 to 7:

function createRawLiteral(token) {
    var raw = source.slice(token.range[0], token.range[1]);

    if (typeof raw === 'object' && raw instanceof Array) {
        raw = raw.join('');
    }

    return {
        type: Syntax.Literal,
        value: token.value,
        raw: raw
    };
}

The remaining 7 failures seem to occur in the code generator.

Comment #17

Posted on Feb 10, 2012 by Helpful Elephant

Do you think about patching extractSource function like this,

function extractStringSource(from, to) { return source.slice(from, to); }

function extractArraySource(from, to) { return source.slice(from, to).join(''); }

... extractSource = extractStringSource; // when stringToArray is used ctx, extractSource = extractArraySource;

and use 'extractSource' instead of source.slice

Comment #18

Posted on Feb 10, 2012 by Happy Hippo

That sounds like a reasonable approach.

Do you have access to a machine with an older IE version? Do you know what causes some of the code generation tests to fail, even with this fix applied?

Comment #19

Posted on Feb 10, 2012 by Helpful Elephant

I checked it in Win7 IE6, then I found bug in it, and created patch for this :-)

Attachments

Comment #20

Posted on Feb 11, 2012 by Quick Rabbit

IE problem is fixed in https://github.com/ariya/esprima/commit/7443627e03. Thanks!

Status: Fixed

Labels:
Type-Enhancement Priority-Medium Milestone-Release1.0 Component-Parser