Compiling a specific line of code that seems harmless, whose structure is in fact:
  case something of res::int
    = complex expression if res == 1; // works fine
    = something else;
    = same complex expression as before if res == 5; // this is the triggering line
  end;
keeps giving me a segfault, but only when I'm "using dict,system". This is bizarre: the trigger code (a line just added to trees23.pure) has nothing to do with the system library, it compiles just fine earlier in the case expression, and the same structure is repeated several times in the file, where the other occurrences all compile fine. Doing "using system,dict" instead doesn't crash at that point, but then the interpreter crashes the first time it tries to display any result.
I don't know how to produce a minimal test case that doesn't rely on the details of the library files I'm using right now. But I'm hoping that from a backtrace and the error message you might be able to identify the issue, without having the specific setup to reproduce it.
Here's as far as I got towards finding a minimal case: in the context, the following line compiles fine:
  .. with
    last [] = if r1===nil then (bin l1 y1 c1) 0 () else (1);
    // other clauses for last
  end;
but this crashes:
  .. with
    last [] = if r1===nil then (bin l1 y1 c1) 0 () else (1 when ten = 10; end);
    // other clauses for last
  end;
Bizarre. As I said, many other occurrences of essentially the same pattern in this file give no trouble. I did retype the line from scratch several times in case there was some stray invisible character.
I reconfigured and rebuilt Pure using this configure line:
  ../configure --with-libgmp-prefix=/usr/local --enable-debug --without-elisp --prefix=/opt CC=clang CXX=clang++
on FreeBSD 9.ish, x86_64.
Then with the following environment:
  PURELIB=/usr/home/jim/repo/remote/pure-lang/pure/BUILD/../lib
  PURE_STACK=24000
  PURE_INCLUDE=/home/jim/dev/pure/unspoiled
  PURE_ESCAPE=:
running:
  gdb /usr/home/jim/repo/remote/pure-lang/pure/BUILD/pure
Then at the gdb command line, typing
  (gdb) run -w -i -g --enable=list-opt --enable=trees23
gets me to the Pure interpreter. At the prompt I type:
  > using dict, system;
that fails with the error:
  Assertion failed: (act_env().xmap.find(xmap_key(tag, idx)) != act_env().xmap.end()), function vref, file ../interpreter.cc, line 15302.
Program received signal SIGABRT, Aborted.
(When I ran with the non-debug build, I was instead getting: Segmentation fault: 11 (core dumped) but running gdb on the pure.core file didn't seem to show anything useful.)
At this point I did a bt in gdb. The result is attached.
- pure.backtrace 14.38KB
Comment #1
Posted on Aug 6, 2012 by Happy Bear
It looks like some stack is just getting too deep or something. I tried switching the order of this triggering code block (call it Block2) with another one that was compiling fine earlier (call it Block1). The result is that Block2 now compiles fine, but now it's Block1 that has to be commented out to prevent the crash.
Comment #2
Posted on Aug 6, 2012 by Massive Panda
First, the third rule in your case statement should never be used anyway if the second rule doesn't have a guard on it.
Second, PURE_STACK=24000 won't help you much because the value is most likely much larger than your system's default C stack size. Try a smaller value like 4096 and see whether that gives you an orderly stack_fault exception.
Third, I see that you used clang to compile Pure. Can you reproduce the bug when compiling Pure with gcc?
Other than that, I'd really need a reasonably small test case so that I can reproduce this bug. The gdb backtrace looks like there might be a bug in the code generator somewhere, but it's hard to tell without knowing exactly which Pure code triggers this. If you can't come up with a small witness, I'd at least need the set of library scripts that you're using and detailed instructions and/or a test script showing how to trigger the bug.
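The second point in this comment (PURE_STACK versus the C stack limit) can be checked with a short shell sketch. It assumes PURE_STACK is given in kilobytes, the same unit `ulimit -s` reports in common shells (`-s` is a widespread extension, not strictly POSIX). The function name `check_stack` is made up for illustration, and the default of 24000 is simply the value from the original report.

```shell
#!/bin/sh
# check_stack PURE_STACK_KB LIMIT_KB
# Hypothetical helper: compares a PURE_STACK value (assumed to be in KB)
# with the C stack limit as printed by `ulimit -s` (KB or "unlimited").
check_stack() {
    pure_kb=$1
    limit=$2
    if [ "$limit" = "unlimited" ]; then
        echo "ok: C stack is unlimited"
    elif [ "$pure_kb" -ge "$limit" ]; then
        echo "too large: PURE_STACK=$pure_kb KB >= C stack limit $limit KB"
    else
        echo "ok: PURE_STACK=$pure_kb KB < C stack limit $limit KB"
    fi
}

# Use the current environment; fall back to the 24000 from the report.
check_stack "${PURE_STACK:-24000}" "$(ulimit -s)"
```

If this reports "too large", the interpreter's own stack check probably cannot fire before the C stack overflows, which is the situation the comment warns about.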
Comment #3
Posted on Aug 6, 2012 by Happy Bear
Guards are present on all the real clauses. Also none of this is ever getting executed, it crashes just on compilation. Stack value of 4096 doesn't change the behavior at all. (Is that setting also honored while compiling, by the way?)
Results building with gcc 4.2.1 with --enable-debug seem to be the same. The backtrace looks similar:
Assertion failed: (act_env().xmap.find(xmap_key(tag, idx)) != act_env().xmap.end()), function vref, file ../interpreter.cc, line 15302.
Program received signal SIGABRT, Aborted.
[Switching to Thread 804407400 (LWP 103190/pure)]
0x0000000803c1133c in thr_kill () from /lib/libc.so.7
(gdb) bt
#0  0x0000000803c1133c in thr_kill () from /lib/libc.so.7
#1  0x0000000803ca624b in abort () from /lib/libc.so.7
#2  0x0000000803c8f9d5 in __assert () from /lib/libc.so.7
#3  0x00000008009d6734 in interpreter::vref (this=0x7fffffffb8f0, tag=909, idx=2 '\002', p=@0x7ffffffe0920) at ../interpreter.cc:15302
#4  0x00000008009f8558 in interpreter::codegen (this=0x7fffffffb8f0, x=@0x7ffffffe1720, quote=false) at ../interpreter.cc:14670
#5  0x00000008009fa378 in interpreter::codegen (this=0x7fffffffb8f0, x=@0x7ffffffe21e0, quote=false) at ../interpreter.cc:14829
#6  0x00000008009fa312 in interpreter::codegen (this=0x7fffffffb8f0, x=@0x7ffffffe2cb0, quote=false) at ../interpreter.cc:14829
#7  0x00000008009fa312 in interpreter::codegen (this=0x7fffffffb8f0, x=@0x7ffffffe3780, quote=false) at ../interpreter.cc:14829
#8  0x00000008009fa312 in interpreter::codegen (this=0x7fffffffb8f0, x=@0x7ffffffe4250, quote=false) at ../interpreter.cc:14829
#9  0x00000008009fa312 in interpreter::codegen (this=0x7fffffffb8f0, x=@0x7ffffffe4d20, quote=false) at ../interpreter.cc:14829
#10 0x00000008009fa312 in interpreter::codegen (this=0x7fffffffb8f0, x=@0x7ffffffe5250, quote=false) at ../interpreter.cc:14829
#11 0x0000000800a00495 in interpreter::toplevel_codegen (this=0x7fffffffb8f0, x=@0x7ffffffe55f0, rp=0x806c26340) at ../interpreter.cc:14197
#12 0x0000000800a01380 in interpreter::try_rules (this=0x7fffffffb8f0, pm=0x806c689c0, s=0x806c81670, failedbb=0x8080de420, reduced=@0x7ffffffe7ca0,
    tmps=@0x7ffffffe7a80) at ../interpreter.cc:16741
...
I'll work on trying to get a minimal test case.
Comment #4
Posted on Aug 6, 2012 by Massive Panda
Yeah, it really looks like it's a bug in the code generator. A minimal test case will be very helpful, but I understand that it may be difficult to produce in this case. It should be good enough to have the set of library scripts that you used along with instructions on how to reproduce the bug.
Comment #5
Posted on Aug 6, 2012 by Happy Bear
Here's a pretty minimal test case. Running the following in the interpreter, using the libraries from current hg tip (so none of my other local library changes), still gives me the segfault, before ever returning to the prompt. Commenting out either of block1 or block2 makes the code compile fine. Note that when I have this in a separate file and load it with "using", the crash isn't triggered until the interpreter tries to display some result. So I was doing
  using badfile; 0;
to test for the crash.
This dummy code isn't supposed to make sense. But it should be legal, and even if it weren't it shouldn't crash, right?
public kons g1 g2;
bar y = case y of res::int
  // call this block1
  = foo y with foo _ = snag when kons snag = g1 y; end; end if res == 0;
  // call this block2
  = foo y with foo _ = snag when kons snag = g2 y; end; end if res == 1;
end;
Comment #6
Posted on Aug 6, 2012 by Happy Bear
Also when entering that code directly into the interpreter, it seems I need to add a new line "0;" at the end to trigger the crash.
Comment #7
Posted on Aug 6, 2012 by Happy Bear
Even more minimal:
$ ./run-pure --norc -n
Pure 0.56 (x86_64-unknown-freebsd9.0) Copyright (c) 2008-2012 by Albert Graef
(Type 'help' for help, 'help copying' for license information.)
bar y = case y of
  0 = foo y with foo y = x when [x] = y end end;
  1 = baz y with baz y = x when [x] = y end end
end;
0;
Segmentation fault: 11 (core dumped)
Comment #8
Posted on Aug 6, 2012 by Massive Panda
Cool, many thanks for the short example! I can reproduce this, and I'll have a look at it asap.
Comment #9
Posted on Aug 6, 2012 by Happy Bear
In case it's useful: appending a "when dummy = 0 end" to the end of either of the case rules (or both) makes everything good again. That is, this doesn't crash:
bar y = case y of
  0 = foo y with foo y = x when [x] = y end end when dummy = 0 end;
  1 = baz y with baz y = x when [x] = y end end
end;
Comment #10
Posted on Aug 14, 2012 by Massive Panda
Just for the record, I now boiled the bug witness down to:
bar y = case y of
  0 = a with a = 0 end;
  1 = b with b = 1 end
end;
0;
This still craps out with the same assertion.
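For anyone who wants to re-check the reduced witness against a given build, here is a small shell sketch. It assumes a `pure` executable is on PATH and that it accepts the definitions on standard input; the `--norc -n` flags are copied from the session in comment #7. The function name `witness` is made up for this example.

```shell
#!/bin/sh
# Print the reduced witness from comment #10, including the trailing "0;"
# that is needed to trigger the crash.
witness() {
    cat <<'EOF'
bar y = case y of 0 = a with a = 0 end; 1 = b with b = 1 end end;
0;
EOF
}

# Feed the witness to the interpreter only if one is installed
# (assumption: the binary is called `pure` and reads code from stdin).
if command -v pure >/dev/null 2>&1; then
    witness | pure --norc -n
    echo "pure exited with status $?"
else
    echo "pure not found on PATH; skipping"
fi
```

An exit status above 128 usually means the process was killed by a signal (134 for SIGABRT, 139 for SIGSEGV), which would match the assertion failure or segfault described in this report.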
This is a tough one. It seems that the 'case' environment is to blame here. The problem is not really in the code generator but already in the frontend, more precisely in the FMap data structures needed to handle all the lambda lifting stuff.
Right now 'case' is handled analogously to a lambda there, so it lacks the subenvironments necessary to tell apart the local function bindings of the different case rules. (Wrapping up the function environments in an extra 'when' clause works around this, which explains the behaviour you mentioned in comment #9.) I'll probably have to handle the 'case' environment in a fashion similar to 'with' clauses to repair this defect.
Comment #11
Posted on Aug 14, 2012 by Massive Panda
Well, the bug is in the code generator after all. The FMaps are created properly all right, but it seems that the code generator clobbers some of the traversal pointers during code generation, so that the traversal of the sub-FMaps in a 'case' environment gets messed up. This code has become a real mess, maybe I need to rewrite it.
Comment #12
Posted on Aug 14, 2012 by Massive Panda
Scratch that, it seems that my original suspicion from comment #10 was right. For the code generator to work properly, each 'case' rule needs its own root in the FMap forest so that the call to FMap::select() in try_rules() does the right thing. This should be easy to fix, so please stay tuned...
Comment #13
Posted on Aug 14, 2012 by Massive Panda
This issue was closed by revision a898740681a8.
Comment #14
Posted on Aug 14, 2012 by Massive Panda
Ok, this should be fixed now, can you please give it a whirl?
Comment #15
Posted on Aug 16, 2012 by Happy Bear
Hi Albert, thanks for fixing this. The real code that was triggering this is now working fine, no problems. Looks like you got it.
Comment #16
Posted on Aug 16, 2012 by Massive Panda
(No comment was entered for this change.)
Status: Verified
Labels:
Type-Defect
Priority-Medium