This is still an idea.
Currently the code like '1.0e2' will produce the syntax node:
{
type: 'Literal',
value: 100
}
For the purpose of code formatter and rewriter, it is useful to obtain the original numeric literal before it gets converted to a number. This can be achieved if the syntax node is extended to contain extra info:
{
type: 'Literal',
value: 100,
raw: '1.0e2'
}
I'm undecided about the option and property name, 'raw' is good and unambiguous, 'verbatim' is also clear.
Comment #1
Posted on Feb 3, 2012 by Happy HippoI created a patch that does exactly this for all Syntax.Literal nodes, i.e. a property 'raw' is added which contains the raw source text for the literal.
This has been particularly useful for 're-generating' source code with the exact same text for numeric literals (scientific notation, hex, decimal) and strings (single-quoted vs. double-quoted).
Tests have been updated, but I did not change the compatibility tests, as Reflect.parse does not return raw literals. Instead, I updated the test runner to ignore the 'raw' property in the comparison.
I'm aware of the fact that this is still an idea. I'm interested to hear what your thoughts are on my implementation.
Patch in the form of a pull request: https://github.com/ariya/esprima/pull/15
Comment #2
Posted on Feb 4, 2012 by Quick RabbitThe implementation looks good.
Now the question is whether this has to be optional or not. Ideally we stick as compatible as possible to Mozilla Reflect. The raw property can be enabled by passing an option, e.g. raw: true when calling parse() function.
The challenge is to find the place to inject the logic that adds the raw property into the literal nodes.
Comment #3
Posted on Feb 7, 2012 by Quick RabbitCan you see if it is possible to make this optional? I'd like to be as compatible as possible with Reflect API.
On the other hand, if making it optional is too much hassle and/or yields too complicated code, we just go ahead and diverge a little bit from the official Reflect compatible output.
Comment #4
Posted on Feb 7, 2012 by Happy HippoAdding additional properties (such as 'raw') does not make it incompatible for consumers of the API, so you could argue that we are still compatible with Reflect.parse :)
The pull request at https://github.com/ariya/esprima/pull/15 has been updated with a new commit that makes raw literals optional. The main change is that all Syntax.Literal nodes must now be created through a 'createLiteral' function which has a 'createRawLiteral' variant that includes the raw literal string. This means we now have 1 extra function call in all 8 places where literals are created. I followed the approach of dynamically choosing a function (patch/unpatch) based on the given options.
What do you think? Is this a good way to make raw literals optional without having too much of an impact on the performance and the design of the parser code?
Comment #5
Posted on Feb 7, 2012 by Quick RabbitThanks for looking into this. I think the extra code to make it optional is not that crazy and still maintainable.
As for the performance hit, it will be unavoidable. Have you done any measurement on e.g. Chrome and Firefox to see quantify the speed difference?
Comment #6
Posted on Feb 7, 2012 by Happy HippoI ran the full benchmark suite with Node.js, both before and after the changes:
- Before: 528.2 ms
- Forced raw literals: 538.4 ms (after my first commmit)
- Optional raw literals (disabled): 542.2 ms (after my second commmit)
Benchmark results were varying a bit when running the benchmark suite multiple times, but I think this gives a pretty good indication.
Comment #7
Posted on Feb 7, 2012 by Quick RabbitThanks for the effort, it's really appreciated.
If the performance degrades that much, then I guess making it optional (and becomes slower a little bit more) does not harm. We definitely need to look into gaining more speed again, but I'll merge your patches as is for now.
Maybe something like https://github.com/ariya/esprima/commit/4f9af77ddc would be possible?
Comment #8
Posted on Feb 7, 2012 by Happy HippoThanks for the info on the property-access optimization. To see what the impact of such a change would be on the performance, I created a patch against the current master that removes all member accesses in the main parser code so that objects are initialized in one single step. This only occurred in parseObjectInitialiser and parseConditionalExpression.
My patch: https://github.com/joostwim/esprima/commit/b0628522fbe41f5c5a9565a88de47eabd4f4c51e
Unfortunately, I'm unable to reliably determine if this improves performance or not. The benchmark results seem to be fluctuating too much.
I'd appreciate it if you would inspect the patch and maybe even run the benchmark yourself (both before and after the commit). If you have any indication that this patch improves performance, I can merge this with my previous pull request in order to compensate for some of the performance loss of the raw literal addition.
Comment #9
Posted on Feb 8, 2012 by Quick RabbitI'll try it myself and let you know how it turns out.
Comment #10
Posted on Feb 9, 2012 by Quick RabbitThe single commit which optimizes property access consistently shows the speed improvement in my testing using the brand new Chrome 17. Running time for the full benchmarks suite goes down from 393.5 ms to 413.8 ms.
Based on this, I'd say combine all your three commits together. I can test it again once you have it handy.
Comment #11
Posted on Feb 9, 2012 by Happy HippoThanks for your tests! I assume you accidentally swapped the times when you wrote "goes down from 393.5 ms to 413.8 ms".
I closed the old pull request and created a new one at https://github.com/ariya/esprima/pull/18
The new pull request contains two commits: one for the property access optimization, the other one for the optional raw literals. Everything has been updated to work against the current master.
Comment #12
Posted on Feb 9, 2012 by Quick RabbitLanded in: https://github.com/ariya/esprima/commit/618656261e https://github.com/ariya/esprima/commit/0fe81b279a
Thanks a lot!
Comment #13
Posted on Feb 10, 2012 by Helpful ElephantRaw options great! Awesome work!
And I suggest adding raw property to Identifier. Because,
¥u0061
is Identifier 'a', but information '¥¥u0061' is lost. This is related to issue 6 #25
Comment #14
Posted on Feb 10, 2012 by Quick RabbitSomehow this breaks IE <= 7, for the same reason as in issue 22.
Comment #15
Posted on Feb 10, 2012 by Quick RabbitMinor cleanup: https://github.com/ariya/esprima/commit/20c6aa93e1.
Specialization for raw 'extraction' can be used to solve IE <= 7 problem.
Comment #16
Posted on Feb 10, 2012 by Happy HippoI now have access to a VM with IE6 to test this. Apparently, the 'source' variable is an array in IE6, instead of a string. This causes source.slice() to also return an array.
Doing this brings the number of failures down from 107 to 7:
function createRawLiteral(token) {
var raw = source.slice(token.range[0], token.range[1]);
if (typeof raw === 'object' && raw instanceof Array) {
raw = raw.join('');
}
return {
type: Syntax.Literal,
value: token.value,
raw: raw
};
}
The remaining 7 failures seem to occur in the code generator.
Comment #17
Posted on Feb 10, 2012 by Helpful ElephantDo you think about patching extractSource function like this,
function extractStringSource(from, to) { return source.slice(from, to); }
function extractArraySource(from, to) { return source.slice(from, to).join(''); }
... extractSource = extractStringSource; // when stringToArray is used ctx, extractSource = extractArraySource;
and use 'extractSource' instead of source.slice
Comment #18
Posted on Feb 10, 2012 by Happy HippoThat sounds like a reasonable approach.
Do you have access to a machine with an older IE version? Do you know what causes some of the code generation tests to fail, even with this fix applied?
Comment #19
Posted on Feb 10, 2012 by Helpful ElephantI checked it in Win7 IE6, then I found bug in it, and created patch for this :-)
- ie6.patch 2.41KB
Comment #20
Posted on Feb 11, 2012 by Quick RabbitIE problem is fixed in https://github.com/ariya/esprima/commit/7443627e03. Thanks!
Status: Fixed
Labels:
Type-Enhancement
Priority-Medium
Milestone-Release1.0
Component-Parser