My favorites | Sign in
Project Home Downloads Wiki Issues Source
New issue   Search
for
  Advanced search   Search tips   Subscriptions
Issue 96: Complete PPC methodjit (with type inference)
1 person starred this issue and may be notified of changes. Back to list
Status:  Verified
Owner:  classi...@floodgap.com
Closed:  Feb 2012


Sign in to add a comment
 
Project Member Reported by classi...@floodgap.com, Oct 12, 2011
Tracejit is dying, see

https://bugzilla.mozilla.org/show_bug.cgi?id=693815

Expected timeframe for disabling is 10; likely to be ended completely in 11. We need to get methodjit completed in a testable form at least when we release the 9 beta, which will be by November.
Oct 25, 2011
Project Member #1 classi...@floodgap.com
Mozilla has given us until 11.
Labels: -Priority-Critical -Milestone-NextBeta9 Priority-High
Nov 3, 2011
Project Member #2 classi...@floodgap.com
Reaching out to Andrew Paprocki based on Wes Garland's suggestion to see if he is working on it in parallel.
Nov 5, 2011
Project Member #3 classi...@floodgap.com
Andrew is interested. Ben Stuhl has sent me a great, compilable list of opcodes. We're going to start working with that.

MIPS mjit is landing:
https://bugzilla.mozilla.org/show_bug.cgi?id=678154
We will base ours on that and SPARC:
https://bugzilla.mozilla.org/show_bug.cgi?id=610323
Nov 5, 2011
Project Member #4 classi...@floodgap.com
attaching ben's WIP for posterity
powerpc-methodjit-WIP.patch
98.3 KB   View   Download
Nov 7, 2011
Project Member #5 classi...@floodgap.com
(No comment was entered for this change.)
Blocking: 106
Nov 7, 2011
Project Member #6 classi...@floodgap.com
Trampoline needs to be PIC
Nov 11, 2011
Project Member #7 classi...@floodgap.com
TestMain passes all tests.
Basic regexes appear to function, but we're not properly relinking jumps.
Nov 19, 2011
Project Member #8 classi...@floodgap.com
Regexes now work.
Repurposing  issue 101  for that.
Blockedon: 101
Dec 1, 2011
Project Member #9 classi...@floodgap.com
(No comment was entered for this change.)
Blocking: 114
Dec 11, 2011
Project Member #10 Tobias.N...@gmail.com
I started to backport the PPC macroassembler to WebKit.
So far I had to remove all that JaegerSpew stuff and to bring it up to date with the recent MacroAssembler changes in the WebKit sources.
The functions that aren't implemented yet could simply be removed (until now at least). I had to implement one new function to do a load8() from a BaseAddress which I implemented similar to the same version of load32(); this was needed to compile YarrJIT. I enabled just Yarr JIT so far and left the other ones (JS and DFG) disabled.
The only thing missing for first tests is the trampoline. But those trampolines are somewhat more complex than the ones for methodjit.

However I came across one thing in the PPC (Macro)Assembler which might not work as intended (note that the enum 'Condition' had to be split into 'RelationalCondition' and 'ResultCondition'):

    enum RelationalCondition {
        Equal = PPCAssembler::ConditionEQ,
        NotEqual = PPCAssembler::ConditionNE,

        // Unsigned comparisons. We must select the proper cmp-family
        // instruction if we get one of these.
        Above = PPCAssembler::ConditionGT,
        AboveOrEqual = PPCAssembler::ConditionGE,
        Below = PPCAssembler::ConditionLT,
        BelowOrEqual = PPCAssembler::ConditionLE,

#define PPC_USE_UNSIGNED_COMPARE(x) (x == Above || x == AboveOrEqual \
    || x == Below || x == BelowOrEqual)

        GreaterThan = PPCAssembler::ConditionGT,
        GreaterThanOrEqual = PPCAssembler::ConditionGE,
        LessThan = PPCAssembler::ConditionLT,
        LessThanOrEqual = PPCAssembler::ConditionLE
    };

Now I don't see any way the compiler could distinguish between Above(OrEqual)/Below(OrEqual) and GreaterThan(OrEqual)/LessThan(OrEqual). So for me PPC_USE_UNSIGNED_COMPARE() should be true for all eight of them.
I had to look into this because the more recent MacroAssembler does a 'switch case' with that enumeration as cases. That obviously had to fail.
Dec 11, 2011
Project Member #11 classi...@floodgap.com
Yes, it's already corrected in 9.0-final changesets. The methodjit now fails only six tests, but I'm not making much progress on those. The assemblers have changed rather dramatically since the beta changesets, so you might want to wait before further work (Fx9 is scheduled for final signoff this week).
Dec 11, 2011
Project Member #12 Tobias.N...@gmail.com
That sounds great!

From what I've learned already there should not be any problem merging the progress made for TFF into WebKit. And if mozilla chose to decide to bring their assembler into sync with JavaScriptCore once again that work would already have mostly been done for the PPC part.
Dec 16, 2011
Project Member #13 classi...@floodgap.com
Methodjit now passes all tests.

Type inference doesn't work yet, though.
Summary: Complete PPC methodjit (with type inference)
Dec 17, 2011
Project Member #14 classi...@floodgap.com
Let's try for 10
Labels: Milestone-NextBeta10
Dec 19, 2011
Project Member #16 classi...@floodgap.com
MIPS passes all of its tests with no special changes, so we must be doing something wrong. First theory: we need to save more of the linkage area with stub calls.
Dec 20, 2011
Project Member #17 classi...@floodgap.com
Oh, crap. Type inference *requires* a square root instruction. G3 and G4 don't have one.

This might work as a (slow) replacement. It only needs to load two FP constants.

https://code.google.com/p/drewthaler/source/browse/trunk/codesize/fsqrt.ppc.S?r=2
Dec 21, 2011
Project Member #18 classi...@floodgap.com
Thinking it over, a better solution might be to refuse to let type inference promote Math.sqrt to sqrtDouble() unless the processor is a G5.

Tobias, are you able to make builds again? Can you make a build of 10.4Fx 9 with your gcc and see if it is able to run V8-crypto? That test does not have sqrt and consistently fails on 9 with JM+TI. The command line to run V8-crypto from the terminal after building would be

../../../obj-ff-dbg/dist/TenFourFox.app/Contents/MacOS/js -m -n -p -f run-crypto.js

(in the js/src/v8 directory)

If Crypto says it failed, then we're screwed. If Crypto can pass, then the JSStackFrame problem is a compiler bug, and it's time for gcc-4.2.
Dec 21, 2011
Project Member #19 Tobias.N...@gmail.com
First I tried the js executable that resulted from the test build with the plain Xcode-3.1 toolchain on 10.5 .
It crashes with a bus error; I attached the crash report.

I'll now update my tree and apply the latest changeset and try again, and also with gcc-4.2 .
js_run-crypto_Bus_error.log
9.2 KB   View   Download
Dec 21, 2011
Project Member #20 classi...@floodgap.com
It looks like a regalloc failure, which is expected since we didn't have our register set properly mapped in the beta.

What I'm looking at is the JSStackFrame management -- this isn't part of methodjit/, it's part of js/src/vm/. We're not properly allocating frames and since that code only gets turned on for TI, I'm suspicious we're unmasking a compiler bug. Both SPARC and MIPS can run this fine and they're both big-RISC like us.
Dec 21, 2011
Project Member #21 Tobias.N...@gmail.com
js built with gcc-4.2.1 (build 5666.3) running run-crypto.js results in
 "Crypto: Error: Crypto operation failed"

So it doesn't seem to be a compiler bug.
Dec 21, 2011
Project Member #22 classi...@floodgap.com
We're doomed.

I'm going to keep fiddling with it, but the basic idea is that nesting->activeFrames does not get updated properly in vm/Stack-inl.h. I'm not sure if this is because script()->nesting() is wrong, or script() is wrong, or what. I asked some people at Mozilla but no reply, probably because of the holidays.
Dec 21, 2011
Project Member #23 classi...@floodgap.com
This forces the update, but behaves wrongly.

diff --git a/js/src/vm/Stack-inl.h b/js/src/vm/Stack-inl.h
--- a/js/src/vm/Stack-inl.h
+++ b/js/src/vm/Stack-inl.h
@@ -437,17 +437,27 @@ StackFrame::markFunctionEpilogueDone()
     }

     /*
      * For outer/inner function frames, undo the active frame balancing so that
      * when we redo it in the epilogue we get the right final value. The other
      * nesting epilogue changes (update active args/vars) are idempotent.
      */
     if (maintainNestingState())
+{
+        types::TypeScriptNesting *n = script()->nesting();
+        fprintf(stderr, "script()->nesting() = %08x\n",
+(uint32_t)(n));
         script()->nesting()->activeFrames++;
+        n->activeFrames++;
+        fprintf(stderr, "script()->nesting() = %08x\n",
+(uint32_t)(script()->nesting()));
+fprintf(stderr, "=+ nesting->activeFrames: %i\n",
+script()->nesting()->activeFrames);
+}
 }

Dec 22, 2011
#24 jma...@gmail.com
This one may be a good generic sqrt candidate: http://en.wikipedia.org/wiki/Fast_inverse_square_root (after replacing the "return y;" with "return 1.0F/y"). You can also create a version for double using the constant 0x5fe6ec85e7de30da, as mentioned in the "History and investigation" section of that page.

Compiling that C-code with -mcpu=7450 also gives slightly different results than when using -mcpu=750 and -mcpu=7400
Dec 22, 2011
Project Member #25 classi...@floodgap.com
Tobias' testing shows that 750 and 7400 optimization differs only slightly, and the big difference is 7450. We'd probably just have a generic _VMX version that handled 7450, non _VMX for G3, and _PPC970_ that was fsqrt(). This might screw 7400 a little bit, but we already know performance is going to be poor on this short of G5 anyway. :-/

We'd have to watch constant use (we don't have a constant pool), though. With 1.0F/y, I count four.

Still, I think it would work "sufficiently" if we can solve the larger problem of JM+TI and the JSStack.
Dec 22, 2011
Project Member #26 magef...@gmail.com
I have a theory: I don't see anywhere in the VMFrame (or push()/pop()) which preserves space for the ABI Parameter area? (c.f. http://developer.apple.com/library/mac/#documentation/DeveloperTools/Conceptual/LowLevelABI/100-32-bit_PowerPC_Function_Calling_Conventions/32bitPowerPC.html)

Since stub calls take two parameters, I think they are free to clobber the 8 bytes after the linkage area? (which is VMFrame::args or the last two values push()ed)

Or am I misunderstanding the OS X ABI?
Dec 22, 2011
Project Member #27 classi...@floodgap.com
Our VMFrame is documented in js/src/methodjit/TrampolinePPCOSX.s. All parameters are in regs.

When a stub call is made, the veneer is loaded into r0 and the actual location of the call is loaded into r12. The generated code then calls JaegerStubVeneer using mtctr r0/bctrl. The trick is to keep the same VMFrame on the stack because the stub routine expects it to be located at r1, so the stub call can't make a new linkage area. Instead, it stuffs lr into veneer_lr (so that if the stub call has to throw, it can change where the link register points), leaving the existing linkage area free for the stub call to use. It then calls the stub, pulls the LR back out of veneer_lr (which may have been modified by the stub) and blrs back into execution.

JaegerStubVeneer is an intentionally non-ABI compliant routine (it is on all architectures; see the ARM and SPARC versions). However, we still have a linkage area available and it doesn't need to be preserved -- I did do a test run where the Veneer copied the linkage area into a safe zone and restored it, and there was no difference. That said, if you've found a gap, I'm all ears because I'm running out of ideas. (See my comment in  issue 120 , btw.)

Dec 22, 2011
Project Member #28 classi...@floodgap.com
I should also point out that stub calls are used in *both* JM and JM+TI, so unless there were a subtle problem, a structural issue with stub calls should cause problems in both forms of methodjit -- unless there is a stub call doing something it shouldn't be doing, but I have no evidence of that.
Dec 22, 2011
Project Member #29 classi...@floodgap.com
(clarification: *Veneer* can't make a new linkage area because the stub routine expects it to be at r1, so it leaves the existing linkage area free)
Dec 22, 2011
#30 jma...@gmail.com
The -mcpu=750 and -mcpu=7400 code is indeed identical for that routine. I just noticed one extra requirement though: we'd have to check for 0 to avoid division by 0 (or more likely a plain wrong result). And possibly also for negative numbers and throw an error in case one is passed in (I don't know the required Javascript semantics).

Apart from the 0.0, there are only 3 floating point constants though (0.5, 1.5 and 1.0). The 0x5f3759df is an integer constant and can be loaded directly into a register.
Dec 22, 2011
Project Member #31 classi...@floodgap.com
All the other sqrt routines don't have that kind of checking. I suspect it's being done on a subsequent branch32 that looks to see if there was an unordered result. For example, ARM, SPARC and x86 all just call the square root instruction directly, as does our G5 version.
Dec 22, 2011
Project Member #32 Tobias.N...@gmail.com
In case a smaller number of floating point constant would be helpful, the constants could be constructed using a single 0.5 constant:
 1.0 = 0.5 + 0.5
 1.5 = 1.0 + 0.5 (taking the resulting 1.0 from above)
 0.0 = 0.5 - 0.5 .
Dec 22, 2011
Project Member #33 Tobias.N...@gmail.com
I built js using gcc-4.6 with debugging switched on and optimization off, did a full stack backtrace of the Crypto error and attached it to this comment.

The following seems interesting to me:

#5  0x003745ec in js::mjit::Recompiler::expandInlineFrames (compartment=0x3010200, fp=0xbfffd4c0, inlined=0x2c81d8, next=0x0, f=0xbfffd43c) at /Volumes/Mac_OS_9/mozilla-beta/js/src/methodjit/Retcon.cpp:295
	codeStart = (uint8 *) 0x0
	innerpc = (jsbytecode *) 0x6b6e70ff <Address 0x6b6e70ff out of bounds>

Now I don't understand how the pointer innerpc can be out of bounds.
Dec 22, 2011
Project Member #34 Tobias.N...@gmail.com
Attached the wrong file... here comes the right one.
js_run-crypto_full_stack_backtrace.txt
20.2 KB   View   Download
Dec 22, 2011
Project Member #35 classi...@floodgap.com
Just making sure: this was with the RC changesets, right? The mozilla-beta/ threw me a bit.

I'm trying to find a simple test case that fails JM+TI but passes JM. I can find lots of complex ones but that isn't helpful :P
Dec 22, 2011
Project Member #36 Tobias.N...@gmail.com
Actually it's the following what fails:

uint8* codeStart = (uint8 *)fp->jit()->code.m_code.executableAddress();
Dec 22, 2011
Project Member #37 Tobias.N...@gmail.com
This is with release.
Dec 22, 2011
Project Member #38 classi...@floodgap.com
Where did you find that?
Dec 22, 2011
Project Member #39 classi...@floodgap.com
I'm going to start simplifying this first set of failures.

FAILURES:
    -m -n -p /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jit-test/tests/basic/bug532568.js
    -m -n -p /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jit-test/tests/basic/bug653153.js
    -m -n -p /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jit-test/tests/basic/bug679977.js
    -m -n -p /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jit-test/tests/basic/doMath.js
    -m -n -p /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jit-test/tests/basic/math-jit-tests.js
    -m -n -p /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jit-test/tests/basic/test586387.js
    -m -n -p /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jit-test/tests/basic/testBranchingUnstableLoop.js
    -m -n -p /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jit-test/tests/basic/testCallApply.js

Dec 22, 2011
Project Member #40 Tobias.N...@gmail.com
I did a run without my breakpoint (which I used to get a backtrace in the optimized builds) and surprisingly it did assert. Then I simply followed the advice that is embedded in the code to "return" and then "continue".
Look at the attached file. At the end you see the steps I took.
js_run-crypto_full_stack_backtrace_2.txt
23.0 KB   View   Download
Dec 22, 2011
Project Member #41 classi...@floodgap.com
Interesting, though I bet we were hosed earlier. The final line is part of repatchCall() and we know that works since JM alone uses it.

So here's as small a test case I could make that fails JM+TI, but passes JM and TM:

function x()
{
    var s = 0;
    for (var i = 0; i < 2; i++) {
        s = i+0.5;
        print(s);
    }
    return s;
}
print(x());

% ../../../obj-ff-dbg/dist/TenFourFoxDebug.app/Contents/MacOS/js -m -n -p -a -f xtest.js
0.5
1.5
NaN <<<<<<<<<<<<<<<<< WRONG!
% ../../../obj-ff-dbg/dist/TenFourFoxDebug.app/Contents/MacOS/js -m -p -a -f xtest.js
0.5
1.5
1.5
% ../../../obj-ff-dbg/dist/TenFourFoxDebug.app/Contents/MacOS/js -j -p -a -f xtest.js
0.5
1.5
1.5

The loop is needed; the function call is needed; the loop needs at least two iterations; the convert to double is needed. If any of these are removed, the test will pass.

You can get a dump of the instructions that are being generated by setting JMFLAGS to full and running js (warning: lots of output). I'm going to start debugging this myself, but if anyone gets to it first, holler.
Dec 22, 2011
Project Member #42 classi...@floodgap.com
*And*, this *passes*:

function x()
{
    var s = 0;
    for (var i = 0; i < 2; i++) {
        s = i+0.5;
        print(s);
    }
    print(s); ################
    return s;
}
print(x());

Dec 22, 2011
Project Member #43 Tobias.N...@gmail.com
Results are different here using the same parameters to js as you do:

0.5
1.5
0.6168502750680849

But additionally passing "-d" (methodjit debugging) to js gives the correct results!

Dec 22, 2011
Project Member #44 classi...@floodgap.com
You are correct, sir, -d does fix it.

Now I'm wondering if we have a cache coherency issue.
Dec 22, 2011
Project Member #45 classi...@floodgap.com
-d does not let us inline calls. This doesn't fix some of the other crashes, but it does fix this one. So perhaps our calls are not being patched in correctly.
Dec 22, 2011
Project Member #46 Tobias.N...@gmail.com
Regarding my last backtrace:
That last line is from js::mjit::Recompiler::expandInlineFrames (js/src/methodjit/Retcon.cpp) which appears at the top of the backtrace.
Dec 22, 2011
Project Member #47 classi...@floodgap.com
So here's what it looks like between 1.5 and [wrong answer]:

||||||||||||||>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1.5

0x006ab5b0 in JaegerStubVeneer () at jstracer.h:1246
1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab5b0 <JaegerStubVeneer+16>:     lwz     r0,124(r1)
Value returned is $2 = (void *) 0x0
(gdb) si
0x006ab5b4      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab5b4 <JaegerStubVeneer+20>:     mtlr    r0
(gdb)
0x006ab5b8      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab5b8 <JaegerStubVeneer+24>:     blr
(gdb)
0x00b4bcc8 in ?? ()
1: x/i $pc  0xb4bcc8:   lwz     r13,52(r1)
(gdb)
0x00b4bccc in ?? ()
1: x/i $pc  0xb4bccc:   lis     r0,180
(gdb)
0x00b4bcd0 in ?? ()
1: x/i $pc  0xb4bcd0:   ori     r0,r0,48368
(gdb)
0x00b4bcd4 in ?? ()
1: x/i $pc  0xb4bcd4:   mtctr   r0
(gdb)
0x00b4bcd8 in ?? ()
1: x/i $pc  0xb4bcd8:   bctr
(gdb)
0x00b4bcf0 in ?? ()
1: x/i $pc  0xb4bcf0:   lwz     r7,76(r13)
(gdb)
0x00b4bcf4 in ?? ()
1: x/i $pc  0xb4bcf4:   lis     r0,180
(gdb)
0x00b4bcf8 in ?? ()
1: x/i $pc  0xb4bcf8:   ori     r0,r0,46872
(gdb)
0x00b4bcfc in ?? ()
1: x/i $pc  0xb4bcfc:   mtctr   r0
(gdb)
0x00b4bd00 in ?? ()
1: x/i $pc  0xb4bd00:   bctr
(gdb)
0x00b4b718 in ?? ()
1: x/i $pc  0xb4b718:   lwz     r23,68(r13)
(gdb)
0x00b4b71c in ?? ()
1: x/i $pc  0xb4b71c:   li      r0,-1
(gdb)
0x00b4b720 in ?? ()
1: x/i $pc  0xb4b720:   subfo.  r23,r0,r23 <<<<<<<<<<< increment loop variable with overflow check
(gdb)
0x00b4b724 in ?? ()
1: x/i $pc  0xb4b724:   lis     r0,180
(gdb)
0x00b4b728 in ?? ()
1: x/i $pc  0xb4b728:   ori     r0,r0,48540
(gdb)
0x00b4b72c in ?? ()
1: x/i $pc  0xb4b72c:   mtctr   r0
(gdb)
0x00b4b730 in ?? ()
1: x/i $pc  0xb4b730:   mfxer   r12
(gdb)
0x00b4b734 in ?? ()
1: x/i $pc  0xb4b734:   rotlwi  r12,r12,4
(gdb)
0x00b4b738 in ?? ()
1: x/i $pc  0xb4b738:   mtcrf   1,r12
(gdb)
0x00b4b73c in ?? ()
1: x/i $pc  0xb4b73c:   rlwinm  r12,r12,28,3,31
(gdb)
0x00b4b740 in ?? ()
1: x/i $pc  0xb4b740:   mtxer   r12
(gdb)
0x00b4b744 in ?? ()
1: x/i $pc  0xb4b744:   bgtctr  cr7
(gdb)
0x00b4b748 in ?? ()
1: x/i $pc  0xb4b748:   stw     r23,68(r13)
(gdb)
0x00b4b74c in ?? ()
1: x/i $pc  0xb4b74c:   mr      r22,r23
(gdb)
0x00b4b750 in ?? ()
1: x/i $pc  0xb4b750:   cmpwi   r22,2
(gdb)
0x00b4b754 in ?? ()
1: x/i $pc  0xb4b754:   lis     r0,180
(gdb)
0x00b4b758 in ?? ()
1: x/i $pc  0xb4b758:   ori     r0,r0,46472
(gdb)
0x00b4b75c in ?? ()
1: x/i $pc  0xb4b75c:   mtctr   r0
(gdb)
0x00b4b760 in ?? ()
1: x/i $pc  0xb4b760:   nop
(gdb)
0x00b4b764 in ?? ()
1: x/i $pc  0xb4b764:   nop
(gdb)
0x00b4b768 in ?? ()
1: x/i $pc  0xb4b768:   nop
(gdb)
0x00b4b76c in ?? ()
1: x/i $pc  0xb4b76c:   nop
(gdb)
0x00b4b770 in ?? ()
1: x/i $pc  0xb4b770:   nop
(gdb)
0x00b4b774 in ?? ()
1: x/i $pc  0xb4b774:   bltctr
(gdb)
0x00b4b778 in ?? ()
1: x/i $pc  0xb4b778:   lis     r0,180
(gdb) i reg r22
r22            0x2      2
(gdb) si
0x00b4b77c in ?? ()
1: x/i $pc  0xb4b77c:   ori     r0,r0,47044
(gdb)
0x00b4b780 in ?? ()
1: x/i $pc  0xb4b780:   mtctr   r0
(gdb)
0x00b4b784 in ?? ()
1: x/i $pc  0xb4b784:   bctr
(gdb)
0x00b4b7c4 in ?? ()
1: x/i $pc  0xb4b7c4:   lwz     r12,0(r13)
(gdb)
0x00b4b7c8 in ?? ()
1: x/i $pc  0xb4b7c8:   andi.   r0,r12,49152
(gdb)
0x00b4b7cc in ?? ()
1: x/i $pc  0xb4b7cc:   lis     r0,180
(gdb)
0x00b4b7d0 in ?? ()
1: x/i $pc  0xb4b7d0:   ori     r0,r0,49164
(gdb)
0x00b4b7d4 in ?? ()
1: x/i $pc  0xb4b7d4:   mtctr   r0
(gdb)
0x00b4b7d8 in ?? ()
1: x/i $pc  0xb4b7d8:   nop
(gdb)
0x00b4b7dc in ?? ()
1: x/i $pc  0xb4b7dc:   nop
(gdb)
0x00b4b7e0 in ?? ()
1: x/i $pc  0xb4b7e0:   nop
(gdb)
0x00b4b7e4 in ?? ()
1: x/i $pc  0xb4b7e4:   nop
(gdb)
0x00b4b7e8 in ?? ()
1: x/i $pc  0xb4b7e8:   nop
(gdb)
0x00b4b7ec in ?? ()
1: x/i $pc  0xb4b7ec:   bnectr
(gdb)
0x00b4b7f0 in ?? ()
1: x/i $pc  0xb4b7f0:   lfd     f7,56(r13)
(gdb)
0x00b4b7f4 in ?? ()
1: x/i $pc  0xb4b7f4:   stfd    f0,16(r1)
(gdb)
0x00b4b7f8 in ?? ()
1: x/i $pc  0xb4b7f8:   lwz     r8,16(r1)
(gdb)
0x00b4b7fc in ?? ()
1: x/i $pc  0xb4b7fc:   lwz     r7,20(r1)
(gdb)
0x00b4b800 in ?? ()
1: x/i $pc  0xb4b800:   lwz     r3,20(r13)
(gdb)
0x00b4b804 in ?? ()
1: x/i $pc  0xb4b804:   mtctr   r3
(gdb)
0x00b4b808 in ?? ()
1: x/i $pc  0xb4b808:   bctr
(gdb)
0x006ab520 in JaegerTrampolineReturn () at jstracer.h:1246
1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab520 <JaegerTrampolineReturn>:  stw     r7,28(r13)
(gdb)
0x006ab524      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab524 <JaegerTrampolineReturn+4>:        stw     r8,24(r13)
(gdb)
0x006ab528      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab528 <JaegerTrampolineReturn+8>:        mr      r3,r1
(gdb)
0x006ab52c      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab52c <JaegerTrampolineReturn+12>:       bl      0x608800 <PopActiveVMFrame>
(gdb)
PopActiveVMFrame (f=@0xbfffbd80) at /Volumes/BruceDeuce/src/mozilla-9.0/js/src/methodjit/MethodJIT.cpp:141
141     PopActiveVMFrame(VMFrame &f)
1: x/i $pc  0x608800 <PopActiveVMFrame>:        mflr    r0
(gdb) fin
Run till exit from #0  PopActiveVMFrame (f=@0xbfffbd80) at /Volumes/BruceDeuce/src/mozilla-9.0/js/src/methodjit/MethodJIT.cpp:141
0x006ab530 in JaegerTrampolineReturn () at jstracer.h:1246
1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab530 <JaegerTrampolineReturn+16>:       li      r3,1
(gdb) si
0x006ab534 in exitthejaeger () at jstracer.h:1246
1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab534 <exitthejaeger>:   lwz     r1,0(r1)
(gdb)
0x006ab538 in exitthejaeger () at jstracer.h:1246
1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab538 <exitthejaeger+4>: lwz     r0,8(r1)
(gdb)
0x006ab53c      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab53c <exitthejaeger+8>: mtlr    r0
(gdb)
0x006ab540      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab540 <exitthejaeger+12>:        lwz     r24,-8(r1)
(gdb)
0x006ab544      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab544 <exitthejaeger+16>:        lwz     r23,-12(r1)
(gdb)
0x006ab548      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab548 <exitthejaeger+20>:        lwz     r22,-16(r1)
(gdb)
0x006ab54c      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab54c <exitthejaeger+24>:        lwz     r21,-20(r1)
(gdb)
0x006ab550      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab550 <exitthejaeger+28>:        lwz     r20,-24(r1)
(gdb)
0x006ab554      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab554 <exitthejaeger+32>:        lwz     r19,-28(r1)
(gdb)
0x006ab558      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab558 <exitthejaeger+36>:        lwz     r18,-32(r1)
(gdb)
0x006ab55c      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab55c <exitthejaeger+40>:        lwz     r17,-36(r1)
(gdb)
0x006ab560      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab560 <exitthejaeger+44>:        lwz     r16,-40(r1)
(gdb)
0x006ab564      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab564 <exitthejaeger+48>:        lwz     r15,-44(r1)
(gdb)
0x006ab568      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab568 <exitthejaeger+52>:        lwz     r14,-48(r1)
(gdb)
0x006ab56c      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab56c <exitthejaeger+56>:        lwz     r13,-52(r1)
(gdb)
0x006ab570      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab570 <exitthejaeger+60>:        lwz     r0,4(r1)
(gdb)
0x006ab574      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab574 <exitthejaeger+64>:        mtcr    r0
(gdb)
0x006ab578      1246                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6ab578 <exitthejaeger+68>:        blr
(gdb)
js::mjit::EnterMethodJIT (cx=0xc04f30, fp=0x1808078, code=0xb4bf64, stackLimit=0x1be8000, partial=true) at jscntxt.h:2047
2047        ~JSAutoResolveFlags() { mContext->resolveFlags = mSaved; }
1: x/i $pc  0x2de064 <_ZN2js4mjit14EnterMethodJITEP9JSContextPNS_10StackFrameEPvPN2JS5ValueEb+660>:     lwz     r0,72(r30)
(gdb) fin
Run till exit from #0  js::mjit::EnterMethodJIT (cx=0xc04f30, fp=0x1808078, code=0xb4bf64, stackLimit=0x1be8000, partial=true) at jscntxt.h:2047
[jaeger] Prof     script run took 1110470 ms
0x002de71c in js::mjit::JaegerShotAtSafePoint (cx=0xc04f30, safePoint=0xb4bf64, partial=true) at /Volumes/BruceDeuce/src/mozilla-9.0/js/src/methodjit/MethodJIT.cpp:947
947         return EnterMethodJIT(cx, fp, code, stackLimit, partial);
1: x/i $pc  0x2de71c <_ZN2js4mjit21JaegerShotAtSafePointEP9JSContextPvb+1004>:  b       0x2de770 <_ZN2js4mjit21JaegerShotAtSafePointEP9JSContextPvb+1088>
Value returned is $3 = js::mjit::Jaeger_Returned
(gdb) si
975     }
1: x/i $pc  0x2de770 <_ZN2js4mjit21JaegerShotAtSafePointEP9JSContextPvb+1088>:  lwz     r1,0(r1)
(gdb) fin
Run till exit from #0  js::mjit::JaegerShotAtSafePoint (cx=0xc04f30, safePoint=0xb4bf64, partial=true) at /Volumes/BruceDeuce/src/mozilla-9.0/js/src/methodjit/MethodJIT.cpp:975
js::Interpret (cx=0xc04f30, entryFrame=0x1808020, interpMode=js::JSINTERP_NORMAL) at /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jsinterp.cpp:2204
2204            CHECK_PARTIAL_METHODJIT(status);
1: x/i $pc  0xd3d7c <_ZN2js9InterpretEP9JSContextPNS_10StackFrameENS_10InterpModeE+9804>:       cmpwi   cr7,r3,2
Value returned is $4 = js::mjit::Jaeger_Returned
(gdb) fin
Run till exit from #0  js::Interpret (cx=0xc04f30, entryFrame=0x1808020, interpMode=js::JSINTERP_NORMAL) at /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jsinterp.cpp:2204

NaN <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

js::mjit::EnterMethodJIT (cx=0xc04f30, fp=0x1808020, code=<value temporarily unavailable, due to optimizations>, stackLimit=0x1be8000, partial=false) at /Volumes/BruceDeuce/src/mozilla-9.0/js/src/methodjit/MethodJIT.cpp:918
918             return ok ? Jaeger_Returned : Jaeger_Throwing;
1: x/i $pc  0x2de1d8 <_ZN2js4mjit14EnterMethodJITEP9JSContextPNS_10StackFrameEPvPN2JS5ValueEb+1032>:    cmpwi   cr7,r3,0
Value returned is $5 = true
(gdb) cont


Program exited normally.

Dec 22, 2011
Project Member #48 classi...@floodgap.com
Well, I'm an idiot:

0x00b4b7f0 in ?? ()
1: x/i $pc  0xb4b7f0:   lfd     f7,56(r13)
(gdb)
0x00b4b7f4 in ?? ()
1: x/i $pc  0xb4b7f4:   stfd    f0,16(r1)
(gdb)

That should obviously be stfd f7,16(r1). I'm looking for other errors.
Dec 22, 2011
Project Member #49 classi...@floodgap.com
Patch to breakDoubleTo32 and the test passes.

-        m_assembler.stfd(fpTempRegister, stackPointerRegister, d);
+        m_assembler.stfd(srcDest, stackPointerRegister, d);

Man, that was dumb. Now to look at some of the other failures.
Dec 22, 2011
Project Member #50 classi...@floodgap.com
var actual = "";

function f() {
  var x = 10;

  var g = function(p) {
    for (var i = 0; i < 10; ++i)
      x = p + i;
  }

  for (var i = 0; i < 10; ++i) {
    g(100 * i + x);
    actual += x + ',';
  }
}

f();

print(actual);

If run with -m -n -p -f, the assertion is

Assertion failure: nesting->activeFrames != 0, at /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jsinfer.cpp:5329

which is our old friend above. But if we force JIT on *always* with -a, it passes.
Dec 22, 2011
Project Member #51 magef...@gmail.com
That test case actually seems to pass here with the fix to breakDouble32 -- as does v8/crypto. The only additional change I have is

diff --git a/js/src/assembler/assembler/MacroAssemblerPPC.h b/js/src/assembler/assembler/MacroAssemblerPPC.h
index a2fe4e6..9c1894a 100644
--- a/js/src/assembler/assembler/MacroAssemblerPPC.h
+++ b/js/src/assembler/assembler/MacroAssemblerPPC.h
@@ -1475,7 +1475,7 @@ js::JaegerSpew(js::JSpew_Insns, ISPFX "== branchMul32(Condition cond, Imm32 imm,
     {
 js::JaegerSpew(js::JSpew_Insns, ISPFX "== branchSub32(Condition cond, RegisterID src, RegisterID dest) ==\n");
         ASSERT((cond == Overflow) || (cond == Signed) || (cond == Zero) || (cond == NonZero));
-        m_assembler.subf_rc(dest, src, dest);
+        m_assembler.subfo_rc(dest, src, dest);
         return Jump(m_assembler.m_branch(cond));
     }


which I spotted by chance.
Dec 22, 2011
Project Member #52 classi...@floodgap.com
Even with -m -n -p -f (no -a)? What compiler are you using?

I'll take that other change too, thanks.
Dec 22, 2011
Project Member #53 magef...@gmail.com
Yup, no -a. This is a release build hacked to enable JS_METHODJIT_SPEW, though, so assertions aren't enabled. (DEBUG builds take ~8 hours on this machine, because it hits swap compiling several of the files.)

ZomBookG4:build-spew ben$ ./js -m -n -p -f testcase.js 
19,128,337,646,1055,1564,2173,2882,3691,4600,

ZomBookG4:build-spew ben$ cd ../v8
ZomBookG4:v8 ben$ ../build-spew/js -m -n -p -f run-crypto.js 
Crypto: 799
----
Score (version 6): 799

This is with gcc-4.2 (Xcode 3.1.4):

ZomBookG4:v8 ben$ gcc-4.2 --version
powerpc-apple-darwin9-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5577)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Dec 22, 2011
Project Member #54 classi...@floodgap.com
No, crypto works, but I'm hitting that assert still. I want to make sure there isn't a good reason for it before I comment it out again. Tobias, since you're also with 4.2, does the test case pass for you (with asserts on)?
Dec 22, 2011
Project Member #55 classi...@floodgap.com
Here's a "better" test case that does crash with -a.

var d = 1;
function heavy(x) {
    eval(x);
    return function lite() {
        var s = 0;
        for (var i = 0; i < 2; i++)
            s+=d;
        return s;
    };
}

print(heavy("1")());


Starting program: /Volumes/BruceDeuce/src/mozilla-9.0/obj-ff-dbg/dist/TenFourFoxDebug.app/Contents/MacOS/js -m -n -p -a -f test.js
Reading symbols for shared libraries ....................+++ done
Assertion failure: nesting->activeFrames != 0, at /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jsinfer.cpp:5329

Dec 22, 2011
Project Member #56 classi...@floodgap.com
Like before, a strategically placed print() will allow it to pass:

var d = 1;
function heavy(x) {
    eval(x);
    return function lite() {
        var s = 0;
        for (var i = 0; i < 2; i++)
            s+=d;
        print(s);
        return s;
    };
}

print(heavy("1")());

2
2

But -d does not fix this one.
Dec 22, 2011
Project Member #57 classi...@floodgap.com
This time with a little extra output and a trap after the mtctr r24 in JaegerTrampoline:

var d = 1;
function heavy(x) {
    eval(x);
    return function lite() {
        var s = 0;
        for (var i = 0; i < 2; i++) {
            s+=d;
            print(s);
        }
        return s;
    };
}

Starting program: /Volumes/BruceDeuce/src/mozilla-9.0/obj-ff-dbg/dist/TenFourFoxDebug.app/Contents/MacOS/js -m -n -p -a -f test.js
Reading symbols for shared libraries ....................+++ done

Program received signal SIGTRAP, Trace/breakpoint trap.
0x006ab548 in JaegerTrampoline () at jstracer.h:1246
1246                checkForGlobalObjectReallocationHelper();
(gdb) set $pc+=4
(gdb) cont
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x006ab548 in JaegerTrampoline () at jstracer.h:1246
1246                checkForGlobalObjectReallocationHelper();
(gdb) set $pc+=4
(gdb) cont
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x006ab548 in JaegerTrampoline () at jstracer.h:1246
1246                checkForGlobalObjectReallocationHelper();
(gdb) set $pc+=4
(gdb) cont
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x006ab548 in JaegerTrampoline () at jstracer.h:1246
1246                checkForGlobalObjectReallocationHelper();
(gdb) set $pc+=4
(gdb) cont
Continuing.
1

Program received signal SIGTRAP, Trace/breakpoint trap.
0x006ab548 in JaegerTrampoline () at jstracer.h:1246
1246                checkForGlobalObjectReallocationHelper();
(gdb) set $pc+=4
(gdb) cont
Continuing.
2
Assertion failure: nesting->activeFrames != 0, at /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jsinfer.cpp:5329

Dec 22, 2011
Project Member #58 classi...@floodgap.com
I'm going to debug js::Interpret in more detail when I get back from dinner.
trace.txt
37.8 KB   View   Download
Dec 22, 2011
Project Member #59 Tobias.N...@gmail.com
In order to rebuild just js you just need to reconfigure (gmake -f client.mk configure) and then simply gmake in "{build directory}/nsprbub" and "{build directory}/js/src" . This does take far less than one hour.
Dec 22, 2011
Project Member #60 Tobias.N...@gmail.com
I'm using gcc-4.6 currently.

test case from comment 50 asserts here in the debug build but completes in the optimized build, backtrace attached. Interesting the comments in ~AutoEnterTypeInference from jsinferinlines.h which says "TODO ...".

test case from comment 57 completes under any circumstances here.

test case from comment 55 asserts with -a but not otherwise. (it's always ~AutoEnterTypeInference calling processPendingRecompiles which fails in the end)
js_run-crypto_full_stack_backtrace_3.txt
16.1 KB   View   Download
Dec 23, 2011
Project Member #61 magef...@gmail.com
Since I don't have an ASSERT-enabled build, does the following testcase assert for you, or is it a different bug:


arr = [1e0, 5e1, 9e19, 0.1e20, 1.3e20, 1e20, 9e20, 9.99e20, 0.1e21,
       1e21, 1.0e21, 2e21, 2e20, 2.1e22, 9e21, 0.1e22, 1e22, 3e46, 3e23, 3e100, 3.4e200, 7e1000,
       1e21, 1e21+65537, 1e21+65536, 1e21-65536, 1e21-65537]; 

for (var i = 0; i < 2; i++) {
    arr.push(1e19 + i*1e19);
}

print(arr);


I get total corruption of the array with -m -n -a -f, but it passes if I change the loop to 1 iteration. (This is a reduced testcase from jit-test/tests/basic/bug653153.js.)
Dec 23, 2011
Project Member #63 classi...@floodgap.com
This looks way wrong, and is the only stfd that would do this:

1: x/i $pc  0xb4e1c8:   lwz     r22,32(r23)
(gdb) 
0x00b4e1cc in ?? ()
1: x/i $pc  0xb4e1cc:   rlwinm  r12,r21,3,0,28
(gdb) 
0x00b4e1d0 in ?? ()

1: x/i $pc  0xb4e1d0:   addi    r12,r12,0
(gdb) 
0x00b4e1d4 in ?? ()
1: x/i $pc  0xb4e1d4:   stfd    f6,12(r22)
(gdb) 


That stfd probably is supposed to be r12 + r22. We should check all the store routines to make sure that we're not making this mistake in other places.
Dec 23, 2011
Project Member #64 classi...@floodgap.com
Patch to storeDouble(FPRegisterID, BaseIndex):

         checkStackPointer(address);
         if(PPC_OFFS_OK(address)) {
             m_assembler.x_slwi(addressTempRegister, address.index,
                                 address.scale);
             m_assembler.addi(addressTempRegister, addressTempRegister,
                                 address.offset);
-            m_assembler.stfd(src, address.base, addressTempRegister);
+            m_assembler.stfdx(src, address.base, addressTempRegister);
         } else {
             // Can't use r0 as a displacement, so we have to swap things up.
             m_assembler.x_slwi(tempRegister, address.index, address.scale);
             m_assembler.x_li32(addressTempRegister, address.offset);
             m_assembler.add(addressTempRegister, addressTempRegister,
                                 tempRegister);
-            m_assembler.stfd(src, address.base, addressTempRegister);
+            m_assembler.stfdx(src, address.base, addressTempRegister);

This fixes Ben's test case.
Dec 24, 2011
Project Member #65 classi...@floodgap.com
Back to the test case in comment 55. After a couple cycles of type inference, the instruction stream seems to get munged and I start seeing illogical code like this:

0x00b4cec0:     lwz     r22,20(r22)
0x00b4cec4:     and.    r0,r22,r22  <<< ???
0x00b4cec8:     lis     r0,180
0x00b4cecc:     ori     r0,r0,53064
0x00b4ced0:     mtctr   r0


1: x/i $pc  0xb4c4cc:   lis     r12,192
(gdb) 
0x00b4c4d0 in ?? ()
1: x/i $pc  0xb4c4d0:   ori     r12,r12,38200
(gdb) 
0x00b4c4d4 in ?? ()
1: x/i $pc  0xb4c4d4:   lwz     r0,0(r12)
(gdb) 
0x00b4c4d8 in ?? ()
1: x/i $pc  0xb4c4d8:   li      r0,-1 <<< huh?

The second one alarms me more because this implies that r0 is computed and then thrown away instead of stored somewhere.
Dec 24, 2011
Project Member #66 classi...@floodgap.com
I'm thinking the assertion is due to the JIT code corrupting the JSStackFrame. If it was always due to the nested structured nature, then it should do it with or without the print().
Dec 24, 2011
Project Member #67 classi...@floodgap.com
This looks like the generated code in the second example (mnemonics may not be trustworthy):

[jaeger] Insns            == sub32(TrustedImm32 imm, AbsoluteAddress address) ==
[jaeger] Insns            == load32(void* address, RegisterID dest) ==
[jaeger] Insns            lis r12,192 (0xc00000)
[jaeger] Insns            ori r12,r12,38200 (0x9538)
[jaeger] Insns            lwz r0,0(r12)
[jaeger] Insns            == sub32(TrustedImm32 imm, RegisterID dest) ==
[jaeger] Insns            subi r0,r0,1

Dec 24, 2011
Project Member #68 classi...@floodgap.com
So, the trick here is that we used r0 in an addi, and addi to r0 is turned into li. We're going to need to audit for that also.
Dec 24, 2011
Project Member #69 magef...@gmail.com
Yup, good catch! add32(TrustedImm32 imm, [Absolute]Address address) are both buggy in that they load the Address to r0 and then try to add an immediate to it -- which doesn't work.
Dec 24, 2011
Project Member #70 classi...@floodgap.com
This repairs the assert and this test case, and does not regress methodjit alone:

     void add32(TrustedImm32 imm, RegisterID src, RegisterID dest)
     {
 js::JaegerSpew(js::JSpew_Insns, ISPFX "== add32(TrustedImm32 imm, RegisterID sr
c, RegisterID dest) ==\n");
-        if (PPC_IMM_OK_S(imm))
+        if (PPC_IMM_OK_S(imm) && src != tempRegister) {
             m_assembler.addi(dest, src, int16_t(imm.m_value & 0xffff));
-        else {
-            move(imm, tempRegister);
-            m_assembler.add(dest, src, tempRegister);
+        } else {
+            ASSERT(src != addressTempRegister);
+            move(imm, addressTempRegister);
+            m_assembler.add(dest, src, addressTempRegister);
         }
     }
 
     void add32(TrustedImm32 imm, Address address)
     {
 js::JaegerSpew(js::JSpew_Insns, ISPFX "== add32(TrustedImm32 imm, Address addre
ss) ==\n");
-        load32(address, tempRegister);
+        load32(address, addressTempRegister);
         // We do this so often a macro might be nice one day.
         if (PPC_IMM_OK_S(imm))
-            m_assembler.addi(tempRegister, tempRegister, int16_t(imm.m_value & 
0xffff));
+            m_assembler.addi(addressTempRegister, addressTempRegister, int16_t(
imm.m_value & 0xffff));
         else {
-            move(imm, addressTempRegister);
-            m_assembler.add(tempRegister, tempRegister, addressTempRegister);
+            move(imm, tempRegister);
+            m_assembler.add(addressTempRegister, tempRegister, addressTempRegis
ter);
         }
-        store32(tempRegister, address);
+        store32(addressTempRegister, address);
     }
 
     void add32(Address src, RegisterID dest)
     {
 js::JaegerSpew(js::JSpew_Insns, ISPFX "== add32(Address src, RegisterID dest) =
=\n");
         load32(src, tempRegister);
         add32(tempRegister, dest);
     }
 
     void add32(TrustedImm32 imm, AbsoluteAddress address)
     {
 js::JaegerSpew(js::JSpew_Insns, ISPFX "== add32(TrustedImm32 imm, AbsoluteAddre
ss address) ==\n");
-        load32(address.m_ptr, tempRegister);
+        load32(address.m_ptr, addressTempRegister);
         if (PPC_IMM_OK_S(imm))
-            m_assembler.addi(tempRegister, tempRegister, int16_t(imm.m_value & 
0xffff));
+            m_assembler.addi(addressTempRegister, addressTempRegister, int16_t(
imm.m_value & 0xffff));
         else {
-            move(imm, addressTempRegister);
-            m_assembler.add(tempRegister, tempRegister, addressTempRegister);
+            move(imm, tempRegister);
+            m_assembler.add(addressTempRegister, tempRegister, addressTempRegis
ter);
         }
-        store32(tempRegister, address.m_ptr);
+        store32(addressTempRegister, address.m_ptr);
     }

and

     void sub32(TrustedImm32 imm, RegisterID dest)
     {
 js::JaegerSpew(js::JSpew_Insns, ISPFX "== sub32(TrustedImm32 imm, RegisterID de
st) ==\n");
         // We have no reg - imm in PPC, only subfic, which is imm - reg.
         // However, we can emulate it with addi.
-        if(PPC_IMM_OK_U(imm))
+        if(PPC_IMM_OK_U(imm) && dest != tempRegister) {
             m_assembler.x_subi(dest, dest, uint16_t(imm.m_value & 0xffff)); // 
addi operand order
-        else {
-            move(imm, tempRegister);
-            m_assembler.subf(dest, tempRegister, dest);
+        } else {
+            ASSERT(dest != addressTempRegister);
+            move(imm, addressTempRegister);
+            m_assembler.subf(dest, addressTempRegister, dest);
         }
     }

Basically this is forcing everything into addressTempRegister if we use an immediate instruction. I think I got them all, let me know if this looks wrong to anybody.
Dec 24, 2011
Project Member #71 classi...@floodgap.com
That got it, it's passing all the previously failed tests. Setting up a full test run.
Dec 24, 2011
Project Member #72 classi...@floodgap.com
No more asserts, but we have three failure states left:

[1772|   3|1775] 100% ===============================================>|  274.3s
FAILURES:
    -m -n -p /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jit-test/tests/basic/testTypedArrayOutOfBounds.js
    -m -n -p /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jit-test/tests/basic/testTypedArrays.js
    -m -n -p /Volumes/BruceDeuce/src/mozilla-9.0/js/src/jit-test/tests/sunspider/check-math-partial-sums.js

Dec 24, 2011
Project Member #73 classi...@floodgap.com
Distilled from sunspider/check-math-partial-sums.js:

for(var i=3; i<5; i++) {
        print(Math.pow(i, -0.5));
}

% ../../../obj-ff-dbg/dist/TenFourFoxDebug.app/Contents/MacOS/js -j -a -p -f test.js
0.5773502691896258
0.5
% ../../../obj-ff-dbg/dist/TenFourFoxDebug.app/Contents/MacOS/js -m -n -a -p -f test.js
0.5773502691896258
NaN

Dec 24, 2011
Project Member #74 classi...@floodgap.com
It's zeroDouble. I think we should just make that a constant load instead of trying to be funny with fsub on itself.
Dec 24, 2011
Project Member #75 magef...@gmail.com
Maybe a candidate for a constant register?

The basic/testTypedArrayOutOfBounds.js test can be simplified to

var a = new Int32Array(50);
for (var i = 0; i < 100; i++) {
  assertEq(a.length, 50);
}
Dec 24, 2011
Project Member #76 magef...@gmail.com
And I think the typedarray.length problem is actually a bug in methodjit/Compiler.cpp that little-endian architectures happen to escape because for them NunboxAssembler::PAYLOAD_OFFSET is 0. The following fixes it:

diff --git a/js/src/methodjit/Compiler.cpp b/js/src/methodjit/Compiler.cpp
index b0393ce..827644d 100644
--- a/js/src/methodjit/Compiler.cpp
+++ b/js/src/methodjit/Compiler.cpp
@@ -4318,7 +4318,7 @@ mjit::Compiler::jsop_getprop(JSAtom *atom, JSValueType knownType,
             }
             RegisterID reg = frame.copyDataIntoReg(top);
             frame.pop();
-            masm.loadPayload(Address(reg, TypedArray::lengthOffset()), reg);
+            masm.load32(Address(reg, TypedArray::lengthOffset()), reg);
             frame.pushTypedPayload(JSVAL_TYPE_INT32, reg);
             if (!isObject)
                 stubcc.rejoin(Changes(1));

Dec 24, 2011
Project Member #77 classi...@floodgap.com
Maybe. It could be constructed off a zero-constant int register, though, since it's just 64 0-bits to represent 0.
Dec 24, 2011
Project Member #78 classi...@floodgap.com
W/r/t your Compiler.cpp fix, we probably want to actually take bug 697014.
Dec 24, 2011
Project Member #79 magef...@gmail.com
Yup, I agree.
Dec 24, 2011
Project Member #80 classi...@floodgap.com
     void zeroDouble(FPRegisterID srcDest)
     {
 js::JaegerSpew(js::JSpew_Insns, ISPFX "== zeroDouble(FPRegisterID srcDest) ==\n
");
-        // This seems the simplest way. TODO: Find a better one.
-        m_assembler.fsub(srcDest, srcDest, srcDest);
-        m_assembler.fabs(srcDest, srcDest); // account for -0 (not needed?)
+        // FPreg zero is just 64 zero bits, so we dump 64 zero bits in the
+        // linkage area.
+
+        const int d = 16; // Use scratch in outgoing linkage area.
+
+        m_assembler.xor_(tempRegister, tempRegister, tempRegister);
+        m_assembler.stw(tempRegister, stackPointerRegister, d);
+        m_assembler.stw(tempRegister, stackPointerRegister, d+4);
+#ifdef _PPC970_
+        // G5 and POWER4+ do better if the lfd and the stws aren't in the
+        // same dispatch group.
+        m_assembler.x_nop();
+#endif
+        m_assembler.lfd(srcDest, stackPointerRegister, d);
     }

This fixes the SunSpider failure. Now to look at your Compiler.cpp patch.
Dec 24, 2011
Project Member #81 classi...@floodgap.com
% ./jit_test.py --jitflags=mp ../../../obj-ff-dbg/dist/TenFourFoxDebug.app/Contents/MacOS/js
[1775|   0|1775] 100% ===============================================>|  257.0s
PASSED ALL
% ./jit_test.py --jitflags=mnp ../../../obj-ff-dbg/dist/TenFourFoxDebug.app/Contents/MacOS/js
[1775|   0|1775] 100% ===============================================>|  270.0s
PASSED ALL

Merry Christmas to us!
The last step is to figure out what we're going to do about sqrt for G3/G4. I'm thinking a first pass is simply to prevent the engine from optimizing the Math.sqrt stub call.
Dec 24, 2011
Project Member #82 magef...@gmail.com
Hurrah!

There's the Math.sqrt stub, but there are also the Math.pow(x, +/-0.5) calls (in methodjit/Compiler.cpp) which get optimized down to a square root as well. Can we do something clever with frsqrte, which IIUC the G3 and G4 do have? (I.e. something like

frsqrte dest, src
fmul dest, dest, src
/* and then a few (3?) rounds of Newton's method */

)
Dec 24, 2011
Project Member #83 classi...@floodgap.com
Here's such a routine:

http://www.cygwin.com/ml/libc-alpha/2006-12/msg00116.html

However, this *should* let us get off the ground -- let me know if it works for you. It just short circuits the optimizer in FastBuiltins.cpp.

diff --git a/js/src/methodjit/FastBuiltins.cpp b/js/src/methodjit/FastBuiltins.cpp
--- a/js/src/methodjit/FastBuiltins.cpp
+++ b/js/src/methodjit/FastBuiltins.cpp
@@ -287,26 +287,28 @@ mjit::Compiler::compileMathPowSimple(Fra
     stubcc.linkExit(isNegInfinity, Uses(4));
 
     /* Convert -0 to +0. */
     masm.zeroDouble(fpResultReg);
     masm.moveDouble(fpReg, fpScratchReg);
     masm.addDouble(fpResultReg, fpScratchReg);
 
     double y = arg2->getValue().toDouble();
+#ifdef _PPC970_
     if (y == 0.5) {
         /* pow(x, 0.5) => sqrt(x) */
         masm.sqrtDouble(fpScratchReg, fpResultReg);
 
     } else if (y == -0.5) {
         /* pow(x, -0.5) => 1/sqrt(x) */
         masm.sqrtDouble(fpScratchReg, fpScratchReg);
         masm.slowLoadConstantDouble(1, fpResultReg);
         masm.divDouble(fpScratchReg, fpResultReg);
     }
+#endif
 
     frame.freeReg(fpScratchReg);
 
     if (allocate)
         frame.freeReg(fpReg);
 
     stubcc.leave();
     stubcc.masm.move(Imm32(2), Registers::ArgReg1);
@@ -707,20 +709,22 @@ mjit::Compiler::inlineNativeFunction(uin
         if (native == js_math_floor && argType == JSVAL_TYPE_DOUBLE &&
             type == JSVAL_TYPE_INT32) {
             return compileRound(arg, Floor);
         }
         if (native == js_math_round && argType == JSVAL_TYPE_DOUBLE &&
             type == JSVAL_TYPE_INT32) {
             return compileRound(arg, Round);
         }
+#ifdef _PPC970_
         if (native == js_math_sqrt && type == JSVAL_TYPE_DOUBLE &&
             (argType == JSVAL_TYPE_INT32 || argType == JSVAL_TYPE_DOUBLE)) {
             return compileMathSqrt(arg);
         }
+#endif
         if (native == js_str_charCodeAt && argType == JSVAL_TYPE_INT32 &&
             thisType == JSVAL_TYPE_STRING && type == JSVAL_TYPE_INT32) {
             return compileGetChar(thisValue, arg, GetCharCode);
         }
         if (native == js_str_charAt && argType == JSVAL_TYPE_INT32 &&
             thisType == JSVAL_TYPE_STRING && type == JSVAL_TYPE_STRING) {
             return compileGetChar(thisValue, arg, GetChar);
         }

Dec 24, 2011
Project Member #84 classi...@floodgap.com
Actually, I take that back, it should be

diff --git a/js/src/methodjit/FastBuiltins.cpp b/js/src/methodjit/FastBuiltins.cpp
--- a/js/src/methodjit/FastBuiltins.cpp
+++ b/js/src/methodjit/FastBuiltins.cpp
@@ -707,20 +707,22 @@ mjit::Compiler::inlineNativeFunction(uin
         if (native == js_math_floor && argType == JSVAL_TYPE_DOUBLE &&
             type == JSVAL_TYPE_INT32) {
             return compileRound(arg, Floor);
         }
         if (native == js_math_round && argType == JSVAL_TYPE_DOUBLE &&
             type == JSVAL_TYPE_INT32) {
             return compileRound(arg, Round);
         }
+#ifdef _PPC970_
         if (native == js_math_sqrt && type == JSVAL_TYPE_DOUBLE &&
             (argType == JSVAL_TYPE_INT32 || argType == JSVAL_TYPE_DOUBLE)) {
             return compileMathSqrt(arg);
         }
+#endif
         if (native == js_str_charCodeAt && argType == JSVAL_TYPE_INT32 &&
             thisType == JSVAL_TYPE_STRING && type == JSVAL_TYPE_INT32) {
             return compileGetChar(thisValue, arg, GetCharCode);
         }
         if (native == js_str_charAt && argType == JSVAL_TYPE_INT32 &&
             thisType == JSVAL_TYPE_STRING && type == JSVAL_TYPE_STRING) {
             return compileGetChar(thisValue, arg, GetChar);
         }
@@ -742,18 +744,20 @@ mjit::Compiler::inlineNativeFunction(uin
         JSValueType arg1Type = arg1->isTypeKnown() ? arg1->getKnownType() : JSVAL_TYPE_UNKNOWN;
         JSValueType arg2Type = arg2->isTypeKnown() ? arg2->getKnownType() : JSVAL_TYPE_UNKNOWN;
 
         if (native == js_math_pow && type == JSVAL_TYPE_DOUBLE &&
             (arg1Type == JSVAL_TYPE_DOUBLE || arg1Type == JSVAL_TYPE_INT32) &&
             arg2Type == JSVAL_TYPE_DOUBLE && arg2->isConstant())
         {
             Value arg2Value = arg2->getValue();
+#ifdef _PPC970_
             if (arg2Value.toDouble() == -0.5 || arg2Value.toDouble() == 0.5)
                 return compileMathPowSimple(arg1, arg2);
+#endif
         }
         if ((native == js_math_min || native == js_math_max)) {
             if (arg1Type == JSVAL_TYPE_INT32 && arg2Type == JSVAL_TYPE_INT32 &&
                 type == JSVAL_TYPE_INT32) {
                 return compileMathMinMaxInt(arg1, arg2, 
                         native == js_math_min ? Assembler::LessThan : Assembler::GreaterThan);
             }
             if ((arg1Type == JSVAL_TYPE_INT32 || arg1Type == JSVAL_TYPE_DOUBLE) &&

Dec 24, 2011
Project Member #85 classi...@floodgap.com
In fact, better still might be to just have a masm.supportsFloatingPointSqrt() rather than conditional compiles. Mozilla might take such a patch for other arches.
Dec 26, 2011
Project Member #87 classi...@floodgap.com
(No comment was entered for this change.)
Blocking: 119
Dec 26, 2011
Project Member #88 classi...@floodgap.com
JM+TI is operational on my G5 test build. Browser stands up, bows, asks for money. Benchmarks: SS 1050 (same as tracejit), Dromaeo 134 r/s (up from 122). Changeset attached; see if it works good for you (about:config, turn off tracejit for both chrome and content, turn on methodjit for both chrome and content, turn on type inference, restart browser). I'm going to run off a 7450 build next.
79035
16.3 KB   View   Download
Dec 26, 2011
Project Member #89 classi...@floodgap.com
(No comment was entered for this change.)
Status: Started
Dec 26, 2011
Project Member #90 classi...@floodgap.com
Tested on 1GHz G4. Sunspider 2750, Dromaeo 38 r/s. No regression there either from TM. We're going to build the pre's.
Jan 2, 2012
Project Member #91 classi...@floodgap.com
10.4Fx 10 mostly works with JM+TI except that our typed array tests started failing again.
Jan 2, 2012
Project Member #92 classi...@floodgap.com
... and the reason is because M697014 is in Fx10, so I backed out Ben's patch in comment 76 and all tests pass.

I think we can close this bug shortly unless there are any other known issues apart from optimization.
Jan 2, 2012
Project Member #93 classi...@floodgap.com
Actually, we still have some failures, under gmake check jstests:

FAILURES:
    -a -m /Volumes/BruceDeuce/src/mozilla-10b/js/src/jit-test/tests/basic/testCrossCompartmentTransparency.js
    -a -m /Volumes/BruceDeuce/src/mozilla-10b/js/src/jit-test/tests/jaeger/recompile/bug659639.js

(-j fasta is known to fail)

The first test asserts with p >= slotsBegin(), at /Volumes/BruceDeuce/src/mozilla-10b/js/src/vm/Stack.cpp:283 (it also fails in 9.0). It does pass with type inference and jit debugging.

The second test also asserts but with spoff == js_ReconstructStackDepth(cx_, fp_->script(), pc_), at /Volumes/BruceDeuce/src/mozilla-10b/js/src/vm/Stack.cpp:1051. It also passes with type inference and jit debugging. It does pass 9.0 in opt builds, but I don't have a debug build of 9 handy, so it may well assert there too.


Jan 2, 2012
Project Member #94 classi...@floodgap.com
This is the simplified version of testCrossComparmentTransparency, although it crashes instead of asserts.

var g1 = newGlobal('same-compartment');
var g2 = newGlobal('new-compartment');

function test(str, f) {
    f(g1.eval(str));
    f(g2.eval(str));
    f(g1.eval("new Object()"));
}

test("new Boolean(true)", function(b) { Boolean.prototype.toString.call(b); });

It does not crash 9.
Jan 2, 2012
Project Member #95 classi...@floodgap.com
And the simplified bug659639:

function iso(d) {
new Date(d).toISOString()
}
function dd(x) { }

        iso(20092353211)
        iso(2009)
        iso(0)
        iso(dd("10278"))


We'll start with this one first since it's easier to debug.
Jan 2, 2012
Project Member #96 classi...@floodgap.com
The crash is actually within the stub (at the point where it throws).

1: x/i $pc  0xb6b5d8:   li      r0,6
(gdb) 
0x00b6b5dc in ?? ()
1: x/i $pc  0xb6b5dc:   stw     r0,72(r1)
(gdb) 
0x00b6b5e0 in ?? ()
1: x/i $pc  0xb6b5e0:   addi    r10,r13,72
(gdb) 
0x00b6b5e4 in ?? ()
1: x/i $pc  0xb6b5e4:   stw     r10,40(r1)
(gdb) 
0x00b6b5e8 in ?? ()
1: x/i $pc  0xb6b5e8:   stw     r13,52(r1)
(gdb) 
0x00b6b5ec in ?? ()
1: x/i $pc  0xb6b5ec:   lis     r0,192
(gdb) 
0x00b6b5f0 in ?? ()
1: x/i $pc  0xb6b5f0:   ori     r0,r0,36277
(gdb) 
0x00b6b5f4 in ?? ()
1: x/i $pc  0xb6b5f4:   stw     r0,44(r1)
(gdb) 
0x00b6b5f8 in ?? ()
1: x/i $pc  0xb6b5f8:   lwz     r3,56(r1)
(gdb) 
0x00b6b5fc in ?? ()
1: x/i $pc  0xb6b5fc:   addi    r5,r13,56
(gdb) 
0x00b6b600 in ?? ()
1: x/i $pc  0xb6b600:   li      r4,0
(gdb) 
0x00b6b604 in ?? ()
1: x/i $pc  0xb6b604:   lis     r12,72
(gdb) 
0x00b6b608 in ?? ()
1: x/i $pc  0xb6b608:   ori     r12,r12,63328
(gdb) 
0x00b6b60c in ?? ()
1: x/i $pc  0xb6b60c:   lis     r0,106
(gdb) 
0x00b6b610 in ?? ()
1: x/i $pc  0xb6b610:   ori     r0,r0,43520
(gdb) 
0x00b6b614 in ?? ()
1: x/i $pc  0xb6b614:   mtctr   r0
(gdb) 
0x00b6b618 in ?? ()
1: x/i $pc  0xb6b618:   bctrl
(gdb) 
0x006aaa00 in JaegerStubVeneer () at jstracer.h:1245
1245                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6aaa00 <JaegerStubVeneer>:        mflr    r0
(gdb) 
0x006aaa04      1245                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6aaa04 <JaegerStubVeneer+4>:      stw     r0,124(r1)
(gdb) 
0x006aaa08      1245                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6aaa08 <JaegerStubVeneer+8>:      mtctr   r12
(gdb) 
0x006aaa0c      1245                checkForGlobalObjectReallocationHelper();
1: x/i $pc  0x6aaa0c <JaegerStubVeneer+12>:     bctrl
(gdb) 
date_toISOString (cx=0xc04f90, argc=0, vp=0x18080a8) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsdate.cpp:2121
2121    date_toISOString(JSContext *cx, uintN argc, Value *vp)
1: x/i $pc  0x48f760 <date_toISOString>:        mflr    r0
(gdb) fin
Run till exit from #0  date_toISOString (cx=0xc04f90, argc=0, vp=0x18080a8) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsdate.cpp:2121
Assertion failure: spoff == js_ReconstructStackDepth(cx_, fp_->script(), pc_), at /Volumes/BruceDeuce/src/mozilla-10b/js/src/vm/Stack.cpp:1051

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
CrashInJS () at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsutil.cpp:92
92          *((volatile int *) NULL) = 123;  /* To continue from here in GDB: "return" then "continue". */
1: x/i $pc  0x157624 <CrashInJS+20>:    stw     r0,0(r2)
(gdb) bt
#0  CrashInJS () at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsutil.cpp:92
#1  0x001943f4 in js::StackIter::settleOnNewState (this=0xbfffec18) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/vm/Stack.cpp:1050
#2  0x0002c314 in PopulateReportBlame (cx=0xc04f90, report=0xbfffecac) at vm/Stack.h:1570
#3  0x0002dcc8 in js_ReportErrorNumberVA (cx=0xc04f90, flags=0, callback=0x29470 <js_GetErrorMessage(void*, char const*, unsigned int)>, userRef=0x0, errorNumber=24, charArgs=1, ap=0xbfffed98 "") at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jscntxt.cpp:981
#4  0x00010230 in JS_ReportErrorNumber (cx=<value temporarily unavailable, due to optimizations>, errorCallback=<value temporarily unavailable, due to optimizations>, userRef=<value temporarily unavailable, due to optimizations>, errorNumber=<value temporarily unavailable, due to optimizations>) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsapi.cpp:5891
#5  0x0048f6d4 in date_utc_format (cx=0xc04f90, native=0x48f760 <date_toISOString>, args={<js::CallReceiver> = {usedRval_ = false, argv_ = 0x18080b8}, argc_ = 0}, printFunc=0x487f60 <print_iso_string>) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsdate.cpp:2098
#6  0x006aaa10 in JaegerStubVeneer () at jstracer.h:1245
#7  0x002dc874 in js::mjit::EnterMethodJIT (cx=0xc04f90, fp=0xf1b150, code=0xffffff87, stackLimit=0xb43794, partial=false) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/methodjit/MethodJIT.cpp:1066
#8  0x002de38c in js::mjit::JaegerShot (cx=0xc04f90, partial=false) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/methodjit/MethodJIT.cpp:1127
#9  0x000e07ec in js::RunScript (cx=0xc04f90, script=0xf06258, fp=0x1808020) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsinterp.cpp:581
#10 0x000e1254 in js::ExecuteKernel (cx=0xc04f90, script=<value temporarily unavailable, due to optimizations>, scopeChain=@0xf00040, thisv=@0xbffff208, type=<value temporarily unavailable, due to optimizations>, evalInFrame=<value temporarily unavailable, due to optimizations>, result=0x0) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsinterp.cpp:783
#11 0x004b531c in js::Execute (cx=0xc04f90, script=0xf06258, scopeChainArg=<value temporarily unavailable, due to optimizations>, rval=0x0) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsinterp.cpp:822
#12 0x00441030 in JS_ExecuteScript (cx=0xc04f90, obj=0xf00040, script=0xf06258, rval=0x0) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsapi.cpp:5060
#13 0x0040270c in Process (cx=0xc04f90, obj=0xf00040, filename=0xbffff83c "test2.js", forceTTY=<value temporarily unavailable, due to optimizations>) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/shell/js.cpp:488
#14 0x0040343c in Shell (cx=0xc04f90, op=0xbffff55c, envp=<value temporarily unavailable, due to optimizations>) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/shell/js.cpp:5113
#15 0x00403d90 in main (argc=4, argv=0xbffff6fc, envp=0xbffff710) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/shell/js.cpp:5419
(gdb)

cx and vp look correct (they were used in other calls uneventfully), so let's look at argc (r4).
Jan 2, 2012
Project Member #97 classi...@floodgap.com
(gdb) bt full
#0  CrashInJS () at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsutil.cpp:92
No locals.
#1  0x001943f4 in js::StackIter::settleOnNewState (this=0xbfffec18) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/vm/Stack.cpp:1050
        argc = 0
        vp = <value temporarily unavailable, due to optimizations>
        containsFrame = <value temporarily unavailable, due to optimizations>
        containsCall = <value temporarily unavailable, due to optimizations>
#2  0x0002c314 in PopulateReportBlame (cx=0xc04f90, report=0xbfffecac) at vm/Stack.h:1570
        iter = {
  iter_ = {
    cx_ = 0xc04f90, 
    savedOption_ = js::StackIter::STOP_AT_SAVED, 
    state_ = 25198624, 
    fp_ = 0x1808070, 
    calls_ = 0x0, 
    seg_ = 0x1808000, 
    sp_ = 0x0, 
    pc_ = 0xc08db5 ":", 
    args_ = {
      <js::CallReceiver> = {
        usedRval_ = 1288484, 
        argv_ = 0x0
      }, 
      members of js::CallArgs: 
      argc_ = 0
    }
  }
}
#3  0x0002dcc8 in js_ReportErrorNumberVA (cx=0xc04f90, flags=0, callback=0x29470 <js_GetErrorMessage(void*, char const*, unsigned int)>, userRef=0x0, errorNumber=24, charArgs=1, ap=0xbfffed98 "") at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jscntxt.cpp:981
        report = {
  filename = 0x0, 
  lineno = 0, 
  linebuf = 0x0, 
  tokenptr = 0x0, 
  uclinebuf = 0x0, 
  uctokenptr = 0x0, 
  flags = 0, 
  errorNumber = 24, 
  ucmessage = 0x0, 
  messageArgs = 0x0
}
        message = 0x48a110 "|w\033x?W"
        warning = 0
#4  0x00010230 in JS_ReportErrorNumber (cx=<value temporarily unavailable, due to optimizations>, errorCallback=<value temporarily unavailable, due to optimizations>, userRef=<value temporarily unavailable, due to optimizations>, errorNumber=<value temporarily unavailable, due to optimizations>) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsapi.cpp:5891
        ap = 0xbfffed98 ""
#5  0x0048f6d4 in date_utc_format (cx=0xc04f90, native=0x48f760 <date_toISOString>, args={<js::CallReceiver> = {usedRval_ = false, argv_ = 0x18080b8}, argc_ = 0}, printFunc=0x487f60 <print_iso_string>) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsdate.cpp:2098
        utctime = <value temporarily unavailable, due to optimizations>
        buf = "\000\000\000\000????\000?r?????\000?r?\000?r@\000?r@\000???\025\020??????\000?0???p\001??\000?\000\000\000\000\000\000???D\002$$?\000\000\000\000\000\000???", '\0' <repeats 12 times>, "????\020"
        str = <value temporarily unavailable, due to optimizations>
#6  0x006aaa10 in JaegerStubVeneer () at jstracer.h:1245
No symbol table info available.
#7  0x002dc874 in js::mjit::EnterMethodJIT (cx=0xc04f90, fp=0xf1b150, code=0xffffff87, stackLimit=0xb43794, partial=false) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/methodjit/MethodJIT.cpp:1066
        pcc = {
  cx = 0xc04f90, 
  oldCompartment = 0x2009c00, 
  _mCheckNotUsedAsTemporary = {
    mStatementDone = true
  }
}
        rf = {
  mContext = 0xc04f90, 
  mSaved = 0, 
  _mCheckNotUsedAsTemporary = {
    mStatementDone = true
  }
}
#8  0x002de38c in js::mjit::JaegerShot (cx=0xc04f90, partial=false) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/methodjit/MethodJIT.cpp:1127

Jan 2, 2012
Project Member #98 classi...@floodgap.com
[jaeger] Insns            == store32(TrustedImm32 imm, ImplicitAddress address) 
==
[jaeger] Insns            li r0,6 (0x6)
[jaeger] Insns            stw r0,72(sp)
[jaeger] Insns            == add32(TrustedImm32 imm, RegisterID src, RegisterID 
dest) ==
[jaeger] Insns            addi r10,JSFP,72 (0x48)
[jaeger] Insns            == store32(RegisterID src, ImplicitAddress address) ==
[jaeger] Insns            stw r10,40(sp)
[jaeger] Insns            == store32(RegisterID src, ImplicitAddress address) ==
[jaeger] Insns            stw JSFP,52(sp)
[jaeger] Insns            == store32(TrustedImm32 imm, ImplicitAddress address) 
==
[jaeger] Insns            lis r0,224 (0xe00000)
[jaeger] Insns            ori r0,r0,36277 (0x8db5)
[jaeger] Insns            stw r0,44(sp)
[jaeger] Insns            == load32(ImplicitAddress address, RegisterID dest) ==
[jaeger] Insns            lwz r3,56(sp)
[jaeger] Insns            == add32(TrustedImm32 imm, RegisterID src, RegisterID 
dest) ==
[jaeger] Insns            addi r5,JSFP,56 (0x38)
[jaeger] Insns            == move(TrustedImmPtr imm, RegisterID dest) ==
[jaeger] Insns            == move(TrustedImm32 imm, RegisterID dest) ==
[jaeger] Insns            li r4,0 (0x0)
[jaeger] Insns            == moveWithPatch(TrustedImm32 imm, RegisterID dest) ==
[jaeger] Insns            #label     ((92))
[jaeger] Insns            lis r12,72 (0x480000)
[jaeger] Insns            ori r12,r12,63328 (0xf760)
[jaeger] Insns            == call() ==
[jaeger] Insns            nop
[jaeger] Insns            trap
[jaeger] Insns            mtspr ctr, r0
[jaeger] Insns            bctrl

Jan 3, 2012
Project Member #99 classi...@floodgap.com
The test case in comment 94 also breaks down the same way, so we have a problem with throwing from inside of a nested construct. Regular throwing works, so there must be something just pathologic enough about this.

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xffffff97
js::mjit::ExpandInlineFrames (compartment=0x2009c00) at vm/Stack.h:494
494             return prev_;
1: x/i $pc  0x3b65e0 <_ZN2js4mjit18ExpandInlineFramesEP13JSCompartment+96>:     lwz     r26,16(r2)
(gdb) bt
#0  js::mjit::ExpandInlineFrames (compartment=0x2009c00) at vm/Stack.h:494
#1  0x00194abc in js::StackIter::StackIter (this=0xbfffec48, cx=0xc04f90, savedOption=<value temporarily unavailable, due to optimizations>) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/vm/Stack.cpp:1097
#2  0x0002c314 in PopulateReportBlame (cx=0xc04f90, report=0xbfffecdc) at vm/Stack.h:1570
#3  0x0002dcc8 in js_ReportErrorNumberVA (cx=0xc04f90, flags=0, callback=0x29470 <js_GetErrorMessage(void*, char const*, unsigned int)>, userRef=0x0, errorNumber=8, charArgs=1, ap=0xbfffedc8 "") at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jscntxt.cpp:981
#4  0x00010230 in JS_ReportErrorNumber (cx=<value temporarily unavailable, due to optimizations>, errorCallback=<value temporarily unavailable, due to optimizations>, userRef=<value temporarily unavailable, due to optimizations>, errorNumber=<value temporarily unavailable, due to optimizations>) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsapi.cpp:5891
#5  0x004e0a20 in js::ReportIncompatibleMethod (cx=0xc04f90, call={usedRval_ = false, argv_ = 0x1808118}, clasp=0x716ec8) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsobj.cpp:7095
#6  0x004e0bfc in js::HandleNonGenericMethodClassMismatch (cx=0xc04f90, args={<js::CallReceiver> = {usedRval_ = false, argv_ = 0x1808118}, argc_ = 0}, native=0x46e290 <bool_toString>, clasp=0x716ec8) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsobj.cpp:7109
#7  0x006aaa10 in JaegerStubVeneer () at jstracer.h:1245
#8  0x002dc874 in js::mjit::EnterMethodJIT (cx=0xc04f90, fp=0xf22880, code=0xffffff87, stackLimit=0xf05a10, partial=4294967175) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/methodjit/MethodJIT.cpp:1066
#9  0x002de38c in js::mjit::JaegerShot (cx=0xc04f90, partial=false) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/methodjit/MethodJIT.cpp:1127
#10 0x000e07ec in js::RunScript (cx=0xc04f90, script=0xf06258, fp=0x1808020) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsinterp.cpp:581
#11 0x000e1254 in js::ExecuteKernel (cx=0xc04f90, script=<value temporarily unavailable, due to optimizations>, scopeChain=@0xf00040, thisv=@0xbffff268, type=<value temporarily unavailable, due to optimizations>, evalInFrame=<value temporarily unavailable, due to optimizations>, result=0x0) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsinterp.cpp:783
#12 0x004b531c in js::Execute (cx=0xc04f90, script=0xf06258, scopeChainArg=<value temporarily unavailable, due to optimizations>, rval=0x0) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsinterp.cpp:822
#13 0x00441030 in JS_ExecuteScript (cx=0xc04f90, obj=0xf00040, script=0xf06258, rval=0x0) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/jsapi.cpp:5060
#14 0x0040270c in Process (cx=0xc04f90, obj=0xf100d0, filename=0xbffff89c "test.js", forceTTY=<value temporarily unavailable, due to optimizations>) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/shell/js.cpp:488
#15 0x0040343c in Shell (cx=0xc04f90, op=0xbffff5bc, envp=<value temporarily unavailable, due to optimizations>) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/shell/js.cpp:5113
#16 0x00403d90 in main (argc=4, argv=0xbffff764, envp=0xbffff778) at /Volumes/BruceDeuce/src/mozilla-10b/js/src/shell/js.cpp:5419
(gdb)

Jan 3, 2012
Project Member #100 classi...@floodgap.com
Both use these macroops (that are not used in -d). In fact, the set is identical for both test cases, so the bug is probably the same.

< [jaeger] Insns            == add32(TrustedImm32 imm, RegisterID dest) ==
3d1
< [jaeger] Insns            == and32(Imm32 imm, RegisterID dest) ==
10d7
< [jaeger] Insns            == branchPtrWithPatch(Condition cond, RegisterID lef
t, DataLabelPtr& dataLabel, ImmPtr initialRightValue) ==
15d11
< [jaeger] Insns            == compare32unsigned(RegisterID left, TrustedImm32 r
ight) ==
19d14
< [jaeger] Insns            == load16(ImplicitAddress address, RegisterID dest)
==

Jan 3, 2012
Project Member #101 classi...@floodgap.com
load16, and32 and compare32unsigned all appear grouped together in an idiomatic construction. Perhaps we are handling halfwords wrong.
Jan 3, 2012
Project Member #102 classi...@floodgap.com
It's something to do with inlining calls. This will fix both test cases:

diff --git a/js/src/methodjit/MonoIC.cpp b/js/src/methodjit/MonoIC.cpp
--- a/js/src/methodjit/MonoIC.cpp
+++ b/js/src/methodjit/MonoIC.cpp
@@ -947,16 +947,19 @@ class CallCompiler : public BaseCompiler
          * inlining the parent frame in the first place, so mark the immediate
          * caller as uninlineable.
          */
         if (f.script()->hasFunction) {
             f.script()->uninlineable = true;
             MarkTypeObjectFlags(cx, f.script()->function(), types::OBJECT_FLAG_UNINLINEABLE);
         }

+// XXX 96
+return true;
+
         /* Don't touch the IC if the call triggered a recompilation. */
         if (monitor.recompiled())
             return true;

         JS_ASSERT(!f.regs.inlined());

         /* Right now, take slow-path for IC misses or multiple stubs. */
         if (ic.fastGuardedNative || ic.hasJsFunCheck)


Expanding ShadowStackSpace to 224 and baseStackSpace to 128 made no difference, so I don't think it's a problem with linkage areas or the red zone.
Jan 3, 2012
Project Member #103 classi...@floodgap.com
simplified test 2 even further:

function iso(d) {
new Date(d).toISOString()
}

        iso(2009)
        iso(0)
        iso() 


The inline call shortcircuit is unsuitable. Many tests run it normally with no problems. Maybe if we made it dependent on type inference, which is a nasty wallpaper, but would work.
Jan 4, 2012
Project Member #104 classi...@floodgap.com
The problem only happens when a native call throws. If we have iso() in first or second position, it works fine (and throws normally). When the call is converted to native, throwing is fatal, but normal calls work (you can have iso(2009) as many times as you want, but the minute you throw once the call is native, it dies).

So we are doing something wrong with native calls. Stub calls work fine and throw normally, which is why -d works, because -d prevents stub calls from becoming natives.
Jan 4, 2012
Project Member #105 classi...@floodgap.com
The Throwpoline gets called for stubs (and works), but does NOT get called for natives. This is bad.
Jan 4, 2012
Project Member #106 classi...@floodgap.com
Well, I solved it, but this is ugly. Suggest a better idea. This works, however.

In MonoIC.cpp:generateNativeStub, we wrap the call to the native routine by pulling down a second dummy linkage area (subi r1, r1, 32; call; addi r1, r1, 32). This prevents the native from wrecking the stack frame (the cx and vp are already computed by then, so we don't need to preserve the stack pointer for the call). This gets us to Throwpoline, but then our current stack frame may be unreliable if the routine threw. To fix that, we rely on the fact that the stack frame is a constant 128 bytes, and just add that on to sp no matter what instead of the gentler lwz r1,0(r1) which is usually done. This will unwind our stack even if it is munged, and we can exit intact. This now passes everything.
Jan 4, 2012
Project Member #107 magef...@gmail.com
I think this is actually the Parameter Area coming to bite us! Looking at the disassembly of date_parseString (mangled as __ZL16date_parseStringP14JSLinearStringPdP9JSContext), I see 

  [...]
  stwu r1,-416(r1)
  [...]
  stw r4,444(r1)
  stw r5,448(r1)
  [...]

so gcc is actually spilling the arguments to the parameter area in the caller's frame. Unfortunately, since we didn't make space for that, it's actually corrupting our VMFrame. Fortunately, the fix is simple if perhaps not easy: add space for the parameter area to the VMFrame, right after the linkage area. Since AFAIK the most arguments used by any JIT-code call is 3, I think we only need to add 3 more words to the VMFrame size.
Jan 5, 2012
#108 tw...@pacbell.net
For the G3/G4 inline square root I propose the attached (in MPW PPCAsm format).
It handles all special cases per ECMA-262 and uses one constant (1/2).
I found 4 Newton-Raphson iterations to be necessary on the 7450, where the worst-case relative error of the frsqrte instruction is 1/59 (about 6 bits).
NRSqrt.s
1.3 KB   View   Download
Jan 5, 2012
Project Member #109 classi...@floodgap.com
Yeah, that does look like it's writing into that space, even though the call should have all its parameters in regs. I'm resisting expanding the VMFrame because we know ordinary stub calls work without one, so this causes all calls to increase base stack frame size even if the parameter area is never written to (stub calls have at most two arguments and they always fit in registers, and the stub routines don't spill). We could dynamically change the space being pulled down in the native call if calls required more parameters in a future JavaScript without having to adjust the VMFrame each time, too.

But I agree with your basic analysis, so the effect we're actually having is just dynamically widening our stack frame to be absolutely ABI compliant. However, I'll capitulate and rewrite the VMFrame if it can be proven that a stub call can munge the argument area also (we'll know this because the argument area is part of the JavaScript parameter space, so the JS engine will assert almost immediately when it tries to unwind its call, as it did in this case). Since the stub is calling the native for us, the native uses the stub's argument area, and everything is fine. I can't find any case where this is true, and gmake check jstests passes everything now (except -j fasta which is a known failure, and is the last stand of tracejit). It might be true in IonMonkey, though, so we should build a sufficiently large argument area into whatever we construct for that.

twisk, thank you for the code. I checked the register allocation and it seems to match fine with volatiles and the (painfully) small subset we are limited to in the methodjit, and a single constant is excellent. It seems to expect to be called as a subroutine, though, not as code that can actually be inlined (blr/bclr). I'd like to avoid this so that we can assume all code we generate is "straight line" (which is how all other code is generated). Can you flatten it out so that termination, rather than jumping to lr, simply jumps to the end of the routine?
Jan 5, 2012
Project Member #110 classi...@floodgap.com
As a followup, since I have crazy insomnia right now, I pulled the ABI docs and it looks like the minimum parameter area size is 8 words regardless of number of arguments (so 32 bytes). I'll need to pull down probably 64 bytes instead of 32 in MonoIC.cpp to meet that number, making a bogus linkage area as well. If we hardcoded that in the VMFrame, that would require making our stack frames 160 bytes which seems wasteful since it appears to be unused by everything but natives. Still, if there are stub calls that spill, we have no choice; I'm just trying to be cheap :)
Jan 5, 2012
Project Member #111 magef...@gmail.com
Here's a perhaps saner way to do it: have _two_ veneers, one which constructs a proper ABI-compliant stack frame to wrap the call (FYI, looking through gcc's generated assembler, I've never seen a frame smaller than 80 (!) bytes), and one which doesn't. Then, in BaseAssembler::getFallibleCallTarget(), use the one builds a frame for callConvetion == NormalCall and use the current frameless veneer for callConvention == FastCall.

I've attached a perl script that you can run against the assembly for a source file (e.g. from 'make StubCalls.s') and it will try to find functions that actually write to their caller's parameter area. It looks like all of the stub calls are safe, but only barely: a few stubs (stubs::Name, stubs::CallName, stubs::GetGlobalName) call a function (NameOp) which needs a parameter area -- but all three stubs build their own stack frame for the call. If at some point in the future gcc decides to inline the call, though...
check-argument-spills.pl
982 bytes   View   Download
Jan 5, 2012
Project Member #112 classi...@floodgap.com
Yup, they're safe (I checked them all during my fit of sleeplessness). If we bet wrong, though, the twin veneer idea sounds workable.
Jan 6, 2012
#113 tw...@pacbell.net
Flattening out the G3/G4 Sqrt code is no problem. However, I find the need for more control over the floating point environment. So: What assumptions can inline code make about the initial floating point state? And what rules apply to the final state?

Are the usual registers (r0, r3-r10, fp0-fp13, cr0-cr1, cr5-cr7) considered volatile in methodjit, or are there further restrictions?
Jan 7, 2012
Project Member #114 classi...@floodgap.com
You can use r0, r12 (and in a pinch r11); the allocator will not use these registers. The allocator *may* use r3-r10, so you shouldn't use any other GPRs; methodjit limits us to 32 total GPRs+FPRs, so r24-r31 are offlimits (actually, the trampoline uses r24).

As far as FPRs, I must have been on crack when I read your code, because there's a problem; the methodjit is only aware of f0-f7 (f0 and f1 are okay for use; the allocator can use f2-f7, so you can't use those). If we inline, we have to keep FPR usage to f0-f1, or save and reload from the red zone. If you need another FPR, we could probably steal one more from the allocator without significantly impacting runtime.

If we don't inline, then it becomes a stub call, essentially and there are no restrictions. That could be done; it just needs some infrastructure to support it.

You can use any of the condregs; we save them all, and none are in use at the time of the call. The only thing that I just realized is that it should try to simulate what fsqrt would do to CR1 just in case we have to change fsqrt to fsqrt. for subsequent condition testing.

Sorry, I blame the fatigue. 
Jan 7, 2012
Project Member #115 classi...@floodgap.com
And of course you can always spill and restore any register the allocator may have in use, as long as you put it back, of course. That would still be cheaper than having to inline the call, although we can assume this routine is a leaf, so even if we didn't inline it would not need a new stack frame. (We should mark it OS X ABI only in that case.)
Jan 10, 2012
#116 tw...@pacbell.net
I don't see any way to avoid using 2 nonvolatile FPRs: one for the constant and one to build up the correction term. So will have to go to the red zone. Can I assume that locations 0(r1) and up are safe to use?

Are all FPSCR status bits considered volatile? To test for the special cases I would like to reset some status bits, call frsqrte, then check (and reset) the status bits (possibly not all that changed).
Jan 11, 2012
Project Member #117 classi...@floodgap.com
Yes, you can do whatever you want with the FPSCR. For the red zone, stick to -4(r1) and up inclusive, just in case.
Jan 11, 2012
Project Member #118 classi...@floodgap.com
(and by up I mean -8, -12, etc)
Jan 18, 2012
#119 tw...@pacbell.net
OK, here is version 2 of the G3/G4 sqrt() code. It's set up for inlining (no blr, uses red zone), respects the nonvolatile regs you mentioned. I haven't bothered to emulate the CR1 effects of the G5 fsqrt. instruction yet, my cross that bridge later.
NRSqrtV2.s
1.7 KB   View   Download
Jan 18, 2012
Project Member #120 classi...@floodgap.com
This looks pretty good, cursorily. Ben or Tobias, any objections?

Looking at the code, let's go ahead and leave the CR1 portion off for now (for reference, according to IBM's docs, it should be:
fsqrt. 	Condition Register Field 1 	FX,FEX,VX,OX
). We should already trap this in the macroops by testing for an unordered result, and frsqrte. should leave f0 unordered if an exception occurs, so this should "just work."

I'll include your work with Ben's branch overhaul when that is complete. How shall I credit you?


Jan 22, 2012
#121 tw...@pacbell.net
Please use this version for sqrt() instead. I modified the last iteration of Newton-Raphson to get the least significant bit correct. In particular, taking the square root of a perfect square now gives the exact integer result (checked for all squares less than 2^64).

My real name is David Kilbridge. Is it customary to use a "handle" for FOSS work?

NRSqrtV3.s
1.8 KB   View   Download
Feb 4, 2012
Project Member #122 classi...@floodgap.com
No, not really. I'll credit you as such. I altered your routine to consolidate some of the initial checks, since JavaScript turns out not to care about the level of exception precision the FPU offers. However, it seems to work good with informal testing. Conformance testing is next.
Feb 4, 2012
Project Member #123 classi...@floodgap.com
Passes all tests.
Feb 14, 2012
Project Member #124 classi...@floodgap.com
So, now that 10.0.2pre is out, are we happy with it? Can we close this as done?
Feb 17, 2012
Project Member #125 classi...@floodgap.com
Shipp'd!

Further performance recommendations, file as new issues or in  issue 118 .
Status: Verified
Dec 15, 2012
Project Member #126 classi...@floodgap.com
(No comment was entered for this change.)
Blockedon: -tenfourfox:101 tenfourfox:101
Blocking: -tenfourfox:106 -tenfourfox:114 -tenfourfox:119 tenfourfox:106 tenfourfox:114 tenfourfox:119 tenfourfox:106
Sign in to add a comment

Powered by Google Project Hosting