Syscall/channel select hang on Darwin OSX? #5287

Closed
quarnster opened this issue Apr 13, 2013 · 16 comments

@quarnster

What steps will reproduce the problem?
If possible, include a link to a program on play.golang.org.
1. http://play.golang.org/p/QSuUU42ggi (It's a minimalish repro based on termbox-go)
2. go run on darwin amd64


What is the expected output?

Do nothing and it should time out after 5 seconds. Spam the keyboard and it should print
out the keys. Hit ctrl+q to quit. This is what it does if I 
a) Use go version 1.0.3 or
b) Don't call any variant of Py_Initialize or
c) Run on another OS
d) Use Python 2.7 rather than Python 3.3. The 3.3 version is a universal version
compiled from source in case that matters, but IIRC I saw the same thing with "brew
install python3".

What do you see instead?

Sometimes it gets stuck at the very first "Waiting on signal" log message and
neither times out nor responds to keyboard input. I can usually spam the
keyboard for a second and, if it hasn't hung, hit ctrl+q and try again. CPU usage on one
core appears to be at 100%.


Which compiler are you using (5g, 6g, 8g, gccgo)?

gc

Which operating system are you using?

OSX 10.8.3 amd64


Which version are you using?  (run 'go version')

21:18 ~/code/3rdparty/termbox/build $ go version
go version devel +b27b1ff18f39 Wed Apr 10 07:15:49 2013 +0200 darwin/amd64

Please provide any additional information below.

21:19 ~/code/3rdparty/termbox/build $ cc --version
Apple LLVM version 4.2 (clang-425.0.24) (based on LLVM 3.2svn)
Target: x86_64-apple-darwin12.3.0
Thread model: posix

21:19 ~/code/3rdparty/termbox/build $ gcc --version
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM
build 2336.11.00)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.




If I attach to the hung process with gdb:

(gdb) t a a bt

Thread 5 (Thread 0x1b03 of process 92745):
#0  0x000000000402230b in runtime.mach_semaphore_wait ()
#1  0x0000000004012d2e in runtime.mach_semacquire ()
#2  0x0000000000001d03 in ?? ()
#3  0x0000000000010000 in ?? ()
#4  0x000000000400aff9 in runtime.unlock ()
#5  0x000000c20003c000 in ?? ()
#6  0x0000000000000001 in ?? ()
#7  0x0000000004012618 in runtime.semasleep ()
#8  0x0000000000001d03 in ?? ()
#9  0xffffffffffffffff in ?? ()
#10 0x00000000041174a0 in empty_value ()
#11 0x000000000400b19a in runtime.notesleep ()
#12 0xffffffffffffffff in ?? ()
#13 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x1a03 of process 92745):
#0  0x000000000402230b in runtime.mach_semaphore_wait ()
#1  0x0000000004012d2e in runtime.mach_semacquire ()
#2  0x1301010100001b03 in ?? ()
#3  0x000000000400aff9 in runtime.unlock ()
#4  0x000000c20003c000 in ?? ()
#5  0x0000000000000001 in ?? ()
#6  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x1903 of process 92745):
#0  0x000000000402230b in runtime.mach_semaphore_wait ()
#1  0x0000000004012d2e in runtime.mach_semacquire ()
#2  0x1301010100001803 in ?? ()
#3  0x0000000000001000 in ?? ()
#4  0x0000001fb0103e20 in ?? ()
#5  0x00007fff0000000a in ?? ()
#6  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x1803 of process 92745):
#0  0x000000000402230b in runtime.mach_semaphore_wait ()
#1  0x0000000004012d2e in runtime.mach_semacquire ()
#2  0x0000000000001403 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x1703 of process 92745):
#0  0x000000000401b7d4 in runtime.newstack ()
#1  0x000000000400b19a in runtime.notesleep ()
#2  0xffffffffffffffff in ?? ()
#3  0x0000000000000000 in ?? ()
@quarnster
Author

Comment 1:

GOTRACEBACK=2 gives me
SIGABRT: abort
PC=0x402230b
runtime.mach_semaphore_wait()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/sys_darwin_amd64.s:391 +0xb
runtime.mach_semacquire(0x1403, 0xffffffffffffffff)
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/os_darwin.c:426 +0xbe
runtime.semasleep(0xffffffffffffffff)
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/os_darwin.c:32 +0x58
runtime.notesleep(0x4117518)
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/lock_sema.c:159 +0xba
sysmon()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:2004 +0x196
runtime.mstart()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:485 +0xd2
goroutine 1 [runnable]:
runtime.park(0x4005f30, 0xc20008f000, 0x41153d6)
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1175 +0x64
selectgo(0x43d8d70)
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/chan.c:989 +0x333
runtime.selectgo()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/chan.c:841 +0x12
main.main()
        command-line-arguments/_obj/bug.cgo1.go:133 +0x935
runtime.main()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:182 +0x92
runtime.goexit()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1223
goroutine 2 [syscall]:
runtime.goexit()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1223
goroutine 3 [syscall]:
runtime.entersyscallblock()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1333 +0x16e
runtime.MHeap_Scavenger()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/mheap.c:435 +0xee
runtime.goexit()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1223
created by runtime.main
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:165
goroutine 4 [runnable]:
runtime.exitsyscall()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1388 +0x119
runtime.signal_recv(0xc20005a030)
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/zsigqueue_darwin_amd64.c:86 +0xdc
os/signal.loop()
        /Users/quarnster/code/3rdparty/go/src/pkg/os/signal/signal_unix.go:21 +0x1c
runtime.goexit()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1223
created by os/signal.init·1
        /Users/quarnster/code/3rdparty/go/src/pkg/os/signal/signal_unix.go:27 +0x2f
goroutine 5 [syscall]:
runtime.entersyscallblock()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1333 +0x16e
timerproc()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/ztime_darwin_amd64.c:195 +0xbc
runtime.goexit()
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/proc.c:1223
created by addtimer
        /Users/quarnster/code/3rdparty/go/src/pkg/runtime/ztime_darwin_amd64.c:82
rax     0xe
rbx     0x4117518
rcx     0xb00809e8
rdx     0x1
rdi     0x1403
rsi     0x0
rbp     0xffffffffffffffff
rsp     0xb00809e8
r8      0xb0080a48
r9      0x12
r10     0x0
r11     0x246
r12     0xfcd0294d0765
r13     0xfff4de99de68
r14     0x12f4b532268efc00
r15     0x40011e0
rip     0x402230b
rflags  0x246
cs      0x7
fs      0x0
gs      0x0

@quarnster
Author

Comment 2:

An even simpler reproduction is at http://play.golang.org/p/RxLZLU9HhF. Usually I can just
press and hold ctrl+c for a short time and the hang happens. If it doesn't, either hitting
ctrl+z to break out into the shell, then fg to go back, and holding ctrl+c again, or
killing the process and trying again, makes it hang.
The backtrace remains the same.
Also reproduces on OSX 10.7.4 with a brew installed python3.3.1 and:
09:10 ~ $ go version
go version devel +ce5b441d2fc6 Sun Apr 14 09:22:57 2013 +1000 darwin/amd64
09:11 ~ $ cc --version
Apple clang version 3.1 (tags/Apple/clang-318.0.61) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin11.4.0
Thread model: posix
09:11 ~ $ gcc --version
i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Based on Apple Inc. build 5658) (LLVM
build 2336.9.00)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

@quarnster
Author

Comment 3:

Two syscall traces captured with dtruss from the launch of
http://play.golang.org/p/RxLZLU9HhF to the point of the hang are attached, one with back
traces enabled. Hitting ctrl+c in the hung state inserts further "17869/0x27f1f: 
sigreturn(0xC200052F48, 0x1E, 0x0)               = 0 Err#-2" in the trace buffer without
back tracing enabled.

@quarnster
Author

Comment 4:

Apparently the attached files are lost if I fail to type in the correct captcha.

Attachments:

  1. dtruss.txt (112171 bytes)
  2. dtruss_s.txt (1165117 bytes)

@dvyukov
Member

dvyukov commented Apr 15, 2013

Comment 5:

I can reproduce it.
It seems that runtime.sighandler() tries to grow the stack.
newstack() finds m->curg==nil and crashes.
I don't know yet why this happens.

@ianlancetaylor
Contributor

Comment 6:

runtime.sighandler should only be invoked from runtime·sigtramp in sys_darwin_amd64.s. 
That should ensure that m is set and that g is set to m->gsignal.  m->gsignal should
have enough stack space that sighandler does not need to grow the stack.  So perhaps the
first step is finding which in that set of steps is not happening.

Labels changed: added go1.1.

@dvyukov
Member

dvyukov commented Apr 15, 2013

Comment 7:

>m->gsignal should have enough stack space that sighandler does not need to grow the
stack.
Yeah, but it does not check that rsp is actually inside the gsignal stack.

@dvyukov
Member

dvyukov commented Apr 15, 2013

Comment 8:

Py_Initialize messes with sigaltstack.
Please try the following patch and see whether it fixes the hang:
https://golang.org/cl/8777043/
I suspect that it can actually hang on other OSes and on Go1.0.3 as well; it just
requires some special circumstances to occur.
It is quite risky to push this into Go1.1, so I think it is at most Go1.1.1.
Ian, what do you think about the patch?
The runtime cannot sustain arbitrary messing with the system environment. For example, C code
can set up its own signal handlers and that would break the runtime as well. So this change
is somewhat questionable. But I think we need to at least assert that SP belongs to
the gsignal stack.
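
The assertion suggested at the end of this comment amounts to a simple bounds check. The following is a minimal sketch of that check in plain Go, not the runtime's code and not the contents of the linked patch. The `stackBounds` type, its field names, and the addresses are all made up for illustration, and the half-open interval is an assumption about how the bounds would be expressed.

```go
package main

import "fmt"

// stackBounds is a hypothetical stand-in for the bounds of the signal
// stack (m->gsignal in the discussion). Addresses are illustrative only.
type stackBounds struct {
	lo, hi uint64 // assumed convention: valid addresses satisfy lo <= sp < hi
}

// contains reports whether the stack pointer sp lies inside the stack.
func (s stackBounds) contains(sp uint64) bool {
	return sp >= s.lo && sp < s.hi
}

func main() {
	gsignal := stackBounds{lo: 0x1000, hi: 0x9000}

	// One pointer inside the stack, one on some other stack entirely.
	for _, sp := range []uint64{0x8ff8, 0xa000} {
		fmt.Printf("sp=%#x on gsignal stack: %v\n", sp, gsignal.contains(sp))
	}
}
```

A handler that performed such a check on entry could fail loudly when it finds itself on a foreign stack, instead of attempting to grow a stack it does not own.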

Labels changed: added priority-later, removed priority-triage, go1.1.

Owner changed to @dvyukov.

Status changed to Accepted.

@ianlancetaylor
Contributor

Comment 9:

If I understand your description and your patch correctly, the problem is that some C
code called by Go called sigaltstack but did not override the Go signal handlers.  I
don't think it's reasonable to let two programs fight over the same signal handlers. 
Rather than an approach like your patch, perhaps it would make more sense to have some
way to disable the Go signal handlers.  Ultimately the two programs need to come to some
agreement on which is going to handle signals.

@dvyukov
Member

dvyukov commented Apr 16, 2013

Comment 10:

>the problem is that some C code called by Go called sigaltstack but did not override
the Go signal handlers.
Maybe it has overridden some of the signal handlers, but not all.
The problem is that Go and C handle disjoint sets of signals, but both want their own
sigaltstack (which is not tied to a signal number).

@ianlancetaylor
Contributor

Comment 11:

> Maybe it has overridden the signal handlers but not all.
Likely enough.  I don't think this affects my argument.  We need some way for the
programs to sensibly agree on what to do, not have them trying to second guess each
other.

@quarnster
Author

Comment 12:

I have indeed been unable to reproduce the hang with the patch applied. Cheers!

@dvyukov
Member

dvyukov commented Apr 16, 2013

Comment 13:

>We need some way for the programs to sensibly agree on what to do, not have them trying
to second guess each other.
We can provide an API for that, but it won't fix issues similar to this one. If you link
Python or a similar fat C library, I guess there are not many ways you can affect
its behavior.

@quarnster
Author

Comment 14:

issue #4216 sounds like it's related to this.

@rsc
Contributor

rsc commented Jul 25, 2013

Comment 15:

Status changed to Duplicate.

@rsc
Contributor

rsc commented Jul 25, 2013

Comment 16:

Merged into issue #4216.

@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.