runtime: code text inspection confused by gdb breakpoints #6776
This occurs when you set a breakpoint on a function that will be preempted or need to grow the stack. My guess is that the use of gdb causes a delay such that when the program starts running again, the next function sees a preemption check. That's how the program ends up in morestack. rewindmorestack assumes that it can look at the program code to understand how the stack will be unwound. The goal there is to simulate the next instruction, which is either a 1-byte or 4-byte jump. rewindmorestack is unhappy because that next instruction has been overwritten with a breakpoint instruction, which it did not expect and cannot handle. We could probably rearrange the code to avoid the instruction decode here, but that would have other implications. In particular, we have been talking about changing the way stack traces in general work, to recognize the beginning and end of a function by pattern matching the code instead of recording explicit tables. That code would have similar confusion if it saw breakpoints. In general, we assume we can look at code to see what is going on, and gdb is breaking that assumption by editing the code to implement breakpoints and letting the executing program see them. I am not sure how to resolve this tension. Only certain instructions are problematic in this way; perhaps there is a way to tell gdb not to set breakpoints on those instructions, ever. This won't be fixed for Go 1.2, which is imminent. One workaround is not to use gdb to single-step through functions. (Most of us debug by print statements, which is why we didn't run into this earlier.) I have marked this bug Go1.2.1, so that we are sure to consider it when issuing Go 1.2.1. However, depending on what we decide the answer is, the fix may be delayed further, whether to a later point release or to Go 1.3. Labels changed: added priority-later, go1.2.1, removed priority-triage. Status changed to Thinking. |
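The instruction decode described in this comment can be sketched roughly as follows. This is a minimal Go model, not the runtime's C code: `simulateJump` and `errMisuse` are hypothetical names, and the byte slice is made-up data standing in for program text. It shows the two jump encodings being simulated and what happens when the opcode byte has been replaced by a breakpoint.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errMisuse stands in for the runtime's fatal error; the name is hypothetical.
var errMisuse = errors.New("unexpected instruction at pc")

// simulateJump decodes the x86 jmp at code[pc] and returns the offset it
// would transfer control to. Only the two encodings named above are handled.
func simulateJump(code []byte, pc int) (int, error) {
	switch code[pc] {
	case 0xe9: // jmp rel32: target is relative to the end of the 5-byte instruction
		rel := int32(binary.LittleEndian.Uint32(code[pc+1 : pc+5]))
		return pc + 5 + int(rel), nil
	case 0xeb: // jmp rel8: target is relative to the end of the 2-byte instruction
		rel := int8(code[pc+1])
		return pc + 2 + int(rel), nil
	}
	return pc, errMisuse
}

func main() {
	// Made-up function body: filler bytes, then a short jmp at offset 0x13
	// that goes back to offset 0 (rel8 = -21, encoded as 0xeb).
	code := make([]byte, 0x15)
	code[0x13], code[0x14] = 0xeb, 0xeb

	target, err := simulateJump(code, 0x13)
	fmt.Println(target, err)

	// A debugger's software breakpoint overwrites the opcode byte with int3.
	code[0x13] = 0xcc
	target, err = simulateJump(code, 0x13)
	fmt.Println(target, err)
}
```

The second call fails because 0xcc matches neither jump opcode, which is the situation the comment describes.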
Rsc - Thanks for the insightful discussion about the tradeoffs being weighed. In the meantime, here is a suggested temporary workaround: If I change go1.2rc3/go/src/pkg/runtime/sys_x86.c:36 from if(pc[0] == 0xeb) { // jmp 1-byte offset to if(pc[0] == 0xeb || pc[0] == 0xcc) { // jmp 1-byte offset then the gdb experience is much better. The step-in 's' and step-over 'n' commands sometimes behave like continue 'c' instead of stepping and stopping, but the whole program no longer panics in a work-halting crash. With this patch in place, it simply means I have to add additional breakpoints further along in the code. This is similar to what I have experienced in the past when using gdb on some C++ code. Not horrible, and much better than panicking. - Jason |
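A rough Go model of why this one-line change computes a usable target, assuming the breakpoint replaces only the opcode byte and leaves the following offset byte intact. `patchedRewind` is a hypothetical name and the bytes are made-up data, not taken from a real binary.

```go
package main

import "fmt"

// patchedRewind models the changed condition: an int3 byte (0xcc) is treated
// like the opcode of a jmp with a 1-byte offset.
func patchedRewind(code []byte, pc int) (int, bool) {
	if code[pc] == 0xeb || code[pc] == 0xcc {
		return pc + 2 + int(int8(code[pc+1])), true
	}
	return pc, false
}

func main() {
	// Hypothetical bytes: a short jmp at offset 0x13 back to offset 0.
	code := make([]byte, 0x15)
	code[0x13], code[0x14] = 0xeb, 0xeb

	before, _ := patchedRewind(code, 0x13)
	code[0x13] = 0xcc // breakpoint planted over the opcode; offset byte untouched
	after, ok := patchedRewind(code, 0x13)
	fmt.Println(before, after, ok)
}
```

Under the same assumption, the computed target would be wrong if the hidden instruction were the 4-byte-offset form, since the model always reads a single offset byte.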
Leaving this ticket for long-term issue consideration. For short-term: Opened #6834 https://golang.org/issue/6834 to invite the application of the above simple fix to sys_x86.c:36 |
Assuming that the patch from https://golang.org/issue/6834 has been applied, then we see the runaway behavior under gdb in this minimal program. package main import "fmt" func sub() { fmt.Printf("subroutine sub called.\n") } func main() { sub() // line 9; put gdb breakpoint here and then run; once stopped in gdb: 's', then 'n' fmt.Printf("done.\n") } I'm using go1.2rc3 under linux/amd64. |
I was able to use gdb's 'si' command repeatedly (after 's' into sub()) to get a function call trace of exactly what is happening in the runaway. It appears that a bunch of runtime stack-management routines and then the scheduler are involved. (gdb) s main.sub () at /home/jaten/go/gdbproblem/gdbprob.go:5 runtime.morestack00 () at /usr/cn/go1.2rc3/go/src/pkg/runtime/asm_amd64.s:387 runtime.morestack () at /usr/cn/go1.2rc3/go/src/pkg/runtime/asm_amd64.s:197 runtime.morestack () at /usr/cn/go1.2rc3/go/src/pkg/runtime/asm_amd64.s:225 runtime.newstack () at /usr/cn/go1.2rc3/go/src/pkg/runtime/stack.c:196 runtime.rewindmorestack (gobuf=void) at /usr/cn/go1.2rc3/go/src/pkg/runtime/sys_x86.c:27 0x000000000041b0db in runtime.rewindmorestack (gobuf=<error reading variable: can't compute CFA for this frame>) at /usr/cn/go1.2rc3/go/src/pkg/runtime/sys_x86.c:39 0x0000000000419cd3 in runtime.newstack () at /usr/cn/go1.2rc3/go/src/pkg/runtime/stack.c:230 2 runnable runtime.exitsyscall runtime.gosched0 (gp=void) at /usr/cn/go1.2rc3/go/src/pkg/runtime/proc.c:1373 runtime.lock (l=void) at /usr/cn/go1.2rc3/go/src/pkg/runtime/lock_futex.c:38 runtime.gosched0 (gp=void) at /usr/cn/go1.2rc3/go/src/pkg/runtime/proc.c:1379 schedule () at /usr/cn/go1.2rc3/go/src/pkg/runtime/proc.c:1286 runqget (p=void) at /usr/cn/go1.2rc3/go/src/pkg/runtime/proc.c:2728 0x000000000041777f in runqget (p=<error reading variable: can't compute CFA for this frame>) at /usr/cn/go1.2rc3/go/src/pkg/runtime/proc.c:2734 schedule () at /usr/cn/go1.2rc3/go/src/pkg/runtime/proc.c:1316 (continue) subroutine sub called. done. [Inferior 1 (process 17294) exited normally] (gdb) |
Continuing the study of the minimal program from comments #10 and #11, I instrumented gdb to see where it is trying to set the failing breakpoint that is getting ignored during runaway. The disassembly for the sub() subroutine is shown here, and gdb upon 'n' is trying to set a breakpoint at 0x0400c13, but this is never hit. (gdb) disas Dump of assembler code for function main.sub: => 0x0000000000400c00 <+0>: mov %fs:0xfffffffffffffff0,%rcx 0x0000000000400c09 <+9>: cmp (%rcx),%rsp 0x0000000000400c0c <+12>: ja 0x400c15 <main.sub+21> 0x0000000000400c0e <+14>: callq 0x420be0 <runtime.morestack00> 0x0000000000400c13 <+19>: jmp 0x400c00 <main.sub> ///*** gdb sets breakpoint here that is never hit. 0x0000000000400c15 <+21>: sub $0x40,%rsp 0x0000000000400c19 <+25>: lea 0x4bad40,%rbx 0x0000000000400c21 <+33>: lea (%rsp),%rbp 0x0000000000400c25 <+37>: mov %rbp,%rdi 0x0000000000400c28 <+40>: mov %rbx,%rsi 0x0000000000400c2b <+43>: movsq %ds:(%rsi),%es:(%rdi) 0x0000000000400c2d <+45>: movsq %ds:(%rsi),%es:(%rdi) 0x0000000000400c2f <+47>: lea 0x10(%rsp),%rdi 0x0000000000400c34 <+52>: xor %rax,%rax 0x0000000000400c37 <+55>: stos %rax,%es:(%rdi) 0x0000000000400c39 <+57>: stos %rax,%es:(%rdi) 0x0000000000400c3b <+59>: stos %rax,%es:(%rdi) 0x0000000000400c3d <+61>: callq 0x425130 <fmt.Printf> 0x0000000000400c42 <+66>: add $0x40,%rsp 0x0000000000400c46 <+70>: retq End of assembler dump. (gdb) n internal gdb trace: target_insert_breakpoint called: 0x400c13. subroutine sub called. done. [Inferior 1 (process 10943) exited normally] (gdb) transcript of full session: Current directory is /home/jaten/go/gdbproblem/ GNU gdb (GDB) 7.6.2 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>; This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-unknown-linux-gnu". 
For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>;... Reading symbols from /home/jaten/go/gdbproblem/gdbprob...done. Loading Go Runtime support. (gdb) break 9 Breakpoint 1 at 0x400c69: file /home/jaten/go/gdbproblem/gdbprob.go, line 9. (gdb) run Starting program: /home/jaten/go/gdbproblem/gdbprob warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffd000 target_insert_breakpoint called: 0x400c69. target_remove_breakpoint called: 0x400c69. Breakpoint 1, main.main () at /home/jaten/go/gdbproblem/gdbprob.go:9 (gdb) del Delete all breakpoints? (y or n) y (gdb) display/i $pc 1: x/i $pc => 0x400c69 <main.main+25>: callq 0x400c00 <main.sub> (gdb) disas Dump of assembler code for function main.main: 0x0000000000400c50 <+0>: mov %fs:0xfffffffffffffff0,%rcx 0x0000000000400c59 <+9>: cmp (%rcx),%rsp 0x0000000000400c5c <+12>: ja 0x400c65 <main.main+21> 0x0000000000400c5e <+14>: callq 0x420be0 <runtime.morestack00> 0x0000000000400c63 <+19>: jmp 0x400c50 <main.main> 0x0000000000400c65 <+21>: sub $0x40,%rsp => 0x0000000000400c69 <+25>: callq 0x400c00 <main.sub> 0x0000000000400c6e <+30>: lea 0x4adfd0,%rbx 0x0000000000400c76 <+38>: lea (%rsp),%rbp 0x0000000000400c7a <+42>: mov %rbp,%rdi 0x0000000000400c7d <+45>: mov %rbx,%rsi 0x0000000000400c80 <+48>: movsq %ds:(%rsi),%es:(%rdi) 0x0000000000400c82 <+50>: movsq %ds:(%rsi),%es:(%rdi) 0x0000000000400c84 <+52>: lea 0x10(%rsp),%rdi 0x0000000000400c89 <+57>: xor %rax,%rax 0x0000000000400c8c <+60>: stos %rax,%es:(%rdi) 0x0000000000400c8e <+62>: stos %rax,%es:(%rdi) 0x0000000000400c90 <+64>: stos %rax,%es:(%rdi) 0x0000000000400c92 <+66>: callq 0x425130 <fmt.Printf> 0x0000000000400c97 <+71>: add $0x40,%rsp 0x0000000000400c9b <+75>: retq End of assembler dump. 
(gdb) s main.sub () at /home/jaten/go/gdbproblem/gdbprob.go:5 1: x/i $pc => 0x400c00 <main.sub>: mov %fs:0xfffffffffffffff0,%rcx (gdb) disas Dump of assembler code for function main.sub: => 0x0000000000400c00 <+0>: mov %fs:0xfffffffffffffff0,%rcx 0x0000000000400c09 <+9>: cmp (%rcx),%rsp 0x0000000000400c0c <+12>: ja 0x400c15 <main.sub+21> 0x0000000000400c0e <+14>: callq 0x420be0 <runtime.morestack00> 0x0000000000400c13 <+19>: jmp 0x400c00 <main.sub> 0x0000000000400c15 <+21>: sub $0x40,%rsp 0x0000000000400c19 <+25>: lea 0x4bad40,%rbx 0x0000000000400c21 <+33>: lea (%rsp),%rbp 0x0000000000400c25 <+37>: mov %rbp,%rdi 0x0000000000400c28 <+40>: mov %rbx,%rsi 0x0000000000400c2b <+43>: movsq %ds:(%rsi),%es:(%rdi) 0x0000000000400c2d <+45>: movsq %ds:(%rsi),%es:(%rdi) 0x0000000000400c2f <+47>: lea 0x10(%rsp),%rdi 0x0000000000400c34 <+52>: xor %rax,%rax 0x0000000000400c37 <+55>: stos %rax,%es:(%rdi) 0x0000000000400c39 <+57>: stos %rax,%es:(%rdi) 0x0000000000400c3b <+59>: stos %rax,%es:(%rdi) 0x0000000000400c3d <+61>: callq 0x425130 <fmt.Printf> 0x0000000000400c42 <+66>: add $0x40,%rsp 0x0000000000400c46 <+70>: retq End of assembler dump. (gdb) n internal gdb trace: target_insert_breakpoint called: 0x400c13. subroutine sub called. done. [Inferior 1 (process 10943) exited normally] (gdb) |
Continuing the line of exploration from comments #10, #11, #12, the current goal -- in fixing this ever-present problem when debugging -- is to understand why the 'n' stepping breakpoint is set at 400c13, and then why it is ignored upon return from runtime.morestack. The gdb 7.6.2 source file gdb/infrun.c at line 146 contains a debugging flag (debug_infrun) which, when set to 1 (followed by rebuilding and reinstalling gdb), begins to reveal some of what gdb is thinking when it chooses a stepping breakpoint within the range from the beginning of sub() to the first source statement within sub(), i.e. the range 400c00 to 400c19 inclusive. The infrun.c code seems to be selecting 400c13 out of the range 400c00 - 400c19: infrun: resume (step=1, signal=0), trap_expected=0, current thread [LWP 970] at 0x400c00 infrun: wait_for_inferior () infrun: target_wait (-1, status) = infrun: 970 [LWP 970], infrun: status->kind = stopped, signal = SIGTRAP infrun: infwait_normal_state infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x400c09 infrun: stepping inside range [0x400c00-0x400c19] infrun: resume (step=1, signal=0), trap_expected=0, current thread [LWP 970] at 0x400c09 infrun: prepare_to_wait infrun: target_wait (-1, status) = infrun: 970 [LWP 970], infrun: status->kind = stopped, signal = SIGTRAP infrun: infwait_normal_state infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x400c0c infrun: stepping inside range [0x400c00-0x400c19] infrun: resume (step=1, signal=0), trap_expected=0, current thread [LWP 970] at 0x400c0c infrun: prepare_to_wait infrun: target_wait (-1, status) = infrun: 970 [LWP 970], infrun: status->kind = stopped, signal = SIGTRAP infrun: infwait_normal_state infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x400c0e infrun: stepping inside range [0x400c00-0x400c19] infrun: resume (step=1, signal=0), trap_expected=0, current thread [LWP 970] at 0x400c0e infrun: prepare_to_wait infrun: target_wait (-1, status) = infrun: 970 [LWP 970], infrun: status->kind = stopped, signal = SIGTRAP infrun: 
infwait_normal_state infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x420c30 infrun: stepped into subroutine infrun: inserting step-resume breakpoint at 0x400c13 after which, from additional instrumentation, we see that breakpoint.c/insert_bp_location() is called with bl->address: 0x400c13. (full stack trace of gdb for reference; obtained by running gdb on itself, and halting when the 0x400c13 breakpoint is inserted): #0 0x000000000059b005 in target_insert_breakpoint () #1 0x00000000004ef661 in bkpt_insert_location () #2 0x00000000004dc82d in insert_bp_location () #3 0x00000000004dd450 in insert_breakpoint_locations () #4 0x00000000004dd0d4 in insert_breakpoints () #5 0x0000000000555301 in proceed () #6 0x000000000054e2eb in step_once () #7 0x000000000054df4f in step_1 () #8 0x000000000054dd13 in next_command () #9 0x000000000048510b in do_cfunc () #10 0x0000000000488134 in cmd_func () #11 0x000000000066a72d in execute_command () #12 0x0000000000577247 in command_handler () #13 0x000000000057782a in command_line_handler () #14 0x00000000006c31ca in rl_callback_read_char () #15 0x0000000000576d7d in rl_callback_read_char_wrapper () #16 0x000000000057715e in stdin_event_handler () #17 0x0000000000575d02 in handle_file_event () #18 0x00000000005751ab in process_event () #19 0x000000000057524d in gdb_do_one_event () #20 0x00000000005752c3 in start_event_loop () #21 0x0000000000576da7 in cli_command_loop () #22 0x000000000056d21c in current_interp_command_loop () #23 0x000000000056e2d6 in captured_command_loop () #24 0x000000000056bfc6 in catch_errors () #25 0x000000000056f697 in captured_main () #26 0x000000000056bfc6 in catch_errors () #27 0x000000000056f6cd in gdb_main () #28 0x000000000040735a in main () |
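One reading of the trace, offered as an assumption rather than something checked against the gdb source: once gdb sees that it has stepped into a subroutine, it places the step-resume breakpoint at the address where the caller will resume, which is the instruction right after the call. In the main.sub listing the call sits at 0x400c0e and the next instruction at 0x400c13, so the call is 5 bytes long. The arithmetic, with hypothetical names:

```go
package main

import "fmt"

// callRel32Len is the size of an x86-64 call with a 32-bit relative
// operand: one opcode byte (0xe8) plus four offset bytes.
const callRel32Len = 5

// returnAddress gives the address execution resumes at after the call returns.
func returnAddress(callAddr uint64) uint64 {
	return callAddr + callRel32Len
}

func main() {
	// 0x400c0e is the callq to runtime.morestack00 in the main.sub listing.
	fmt.Printf("%#x\n", returnAddress(0x400c0e))
}
```

That address holds the jmp back to the function entry, which is the instruction rewindmorestack tries to decode.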
I'll mention, for those anxiously looking to get this fixed as soon as possible (I am in this camp), that a fix is readily available, even though this might not be the last word on this topic since the rationale for rewindmorestack() is still unclear to me (and the mailing list discussion on-going; https://groups.google.com/forum/#!topic/golang-nuts/MIfJgl5SZmI). Anyway: after much further investigation and helpful replies from Dmitry and minux on the mailing list, the solution to this gdb problem -- for me, on linux/amd64 -- was to simply comment out the entire inner contents of the runtime·rewindmorestack(Gobuf *gobuf) function body (go/src/pkg/runtime/sys_x86.c:28), effectively turning the rewindmorestack() into a no-op. Instead of having rewindmorestack() simulate a jmp in software that hardware can do perfectly fine, we use the hardware, and gdb works again. |
Does this patch to (the current) pkg/runtime/sys_x86.c help? diff -r db15aed35700 src/pkg/runtime/sys_x86.c --- a/src/pkg/runtime/sys_x86.c Wed Jan 08 12:41:26 2014 -0800 +++ b/src/pkg/runtime/sys_x86.c Wed Jan 08 17:47:13 2014 -0800 @@ -27,7 +27,6 @@ runtime·rewindmorestack(Gobuf *gobuf) { byte *pc; - Func *f; pc = (byte*)gobuf->pc; if(pc[0] == 0xe9) { // jmp 4-byte offset @@ -38,12 +37,18 @@ gobuf->pc = gobuf->pc + 2 + *(int8*)(pc+1); return; } - if(pc[0] == 0xcc) { // breakpoint inserted by gdb - f = runtime·findfunc(gobuf->pc); - if(f != nil) { - gobuf->pc = f->entry; - return; - } + if(pc[0] == 0xcc) { + // This is a breakpoint inserted by gdb. We could use + // runtime·findfunc to find the function. But if we + // do that, then we will continue execution at the + // function entry point, and we will not hit the gdb + // breakpoint. So for this case we don't change + // gobuf->pc, so that when we return we will execute + // the jump instruction and carry on. This means that + // stack unwinding may not work entirely correctly + // (http://golang.org/issue/5723) but the user is + // running under gdb anyhow. + return; } runtime·printf("runtime: pc=%p %x %x %x %x %x\n", pc, pc[0], pc[1], pc[2], pc[3], pc[4]); runtime·throw("runtime: misuse of rewindmorestack"); |
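The reasoning in the new comment of this patch can be illustrated with a small Go model. It is a sketch under simplifying assumptions: only the prologue of main.sub is modeled, the stack check is assumed to pass after morestack, and `pathFrom` and `hits` are hypothetical helpers. It compares resuming at the function entry (the removed findfunc approach) with leaving the pc unchanged.

```go
package main

import "fmt"

// Addresses taken from the main.sub disassembly earlier in this thread.
const (
	entry   = 0x400c00 // first instruction of the stack check
	cmpInsn = 0x400c09
	jaInsn  = 0x400c0c
	jmpBack = 0x400c13 // jmp to entry; gdb's step-resume breakpoint sits here
	body    = 0x400c15 // first instruction after the prologue
)

// pathFrom lists the prologue addresses executed when morestack resumes the
// goroutine at resumePC, assuming the new stack passes the check.
func pathFrom(resumePC uint64) []uint64 {
	var path []uint64
	for pc := resumePC; pc != body; {
		path = append(path, pc)
		switch pc {
		case jmpBack:
			pc = entry
		case entry:
			pc = cmpInsn
		case cmpInsn:
			pc = jaInsn
		case jaInsn:
			pc = body // ja taken: enough stack
		default:
			return path // address outside this simplified model
		}
	}
	return path
}

// hits reports whether the path executes the instruction at addr.
func hits(path []uint64, addr uint64) bool {
	for _, pc := range path {
		if pc == addr {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hits(pathFrom(entry), jmpBack))   // pc rewritten to function entry
	fmt.Println(hits(pathFrom(jmpBack), jmpBack)) // pc left unchanged
}
```

In this model, resuming at the entry never executes the instruction at 0x400c13, so the breakpoint there is skipped, while leaving the pc alone executes it first.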
Hi minux, yes, it appears so, but with quite a bit more work. For example, Apple contributed Objective-C specific routines to tell gdb not to set breakpoints on trampoline code, but to get the address of the next dispatched function and break there instead. gdb/infrun.c has the logic for deciding where stepping breakpoints will go (see the infrun debug trace in comment #13 and grep for those messages), and gdb/objc-lang.c :: objc_skip_trampoline() is an example of such language-specific logic. |
I think there must be something in DWARF that tells gdb where to set function breakpoints, because for normal C/C++ functions, if you set a breakpoint on them, gdb normally skips the function prologue (push ebp, mov esp, ebp, sub esp, $xxx); however, if you are debugging a program without DWARF debugging info, gdb will just break on the first instruction of the function. If that's the case, I'd prefer that we fix the DWARF to make sure gdb doesn't break on that jmp, and revert the changes to the runtime. (I think changing the runtime is just a workaround that does not fix the root cause.) |
DWARF 3 and later permit the line table to record the end of the prologue, but as far as I know gdb does not look for it. gdb has a set of heuristics to find the end of the prologue, including looking at the instructions that are marked as being on the first line of the function. We should make those changes, but I don't think that obviates the need for the change to runtime·rewindmorestack, as gdb can insert breakpoints for many reasons. |
This issue was closed by revision 92b4741. Status changed to Fixed. |
rsc added a commit that referenced this issue on May 11, 2015
… change the PC ««« CL 49580044 / 38cd458b1dfe runtime: if traceback sees a breakpoint, don't change the PC Changing the PC confuses gdb, because execution does not continue where gdb expects it. Not changing the PC has the potential to confuse a stack dump, but when running under gdb it seems better to confuse a stack dump than to confuse gdb. Fixes #6776. LGTM=rsc R=golang-codereviews, dvyukov, rsc CC=golang-codereviews https://golang.org/cl/49580044 »»» LGTM=r R=golang-codereviews, r CC=golang-dev https://golang.org/cl/69800043
This issue was closed.