runtime/cgo: handle signal on non-Go thread #3250

Closed
nsf opened this issue Mar 8, 2012 · 16 comments

Comments

@nsf

nsf commented Mar 8, 2012

For some reason, a certain action in C code causes a segfault in Go. I found it in my gtk
bindings and have stripped it down to one test file that segfaults on my machine. I did an hg
bisect; the problem appears starting from revision 11922:daf22f371d51 (os/signal:
selective signal handling).

Here's the source code:
--------------------------------------------------------------------
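package main

/*
#include <gtk/gtk.h>

#cgo pkg-config: gtk+-3.0
*/
import "C"

func main() {
        // Initialize GTK without command-line arguments.
        C.gtk_init(nil, nil)
        // Creating a file chooser button is enough to trigger the crash.
        C.gtk_file_chooser_button_new(nil, 0)
}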
--------------------------------------------------------------------

It happens on Linux 3.2.8 (Arch Linux distribution) with any Go version starting from
rev 11922 (as stated above), x86 architecture, gtk 3.2.3.

Here's the backtrace from gdb:


(gdb) bt
#0  runtime.sigtramp (sig=void, info=void, context=void) at /home/nsf/go/src/pkg/runtime/sys_linux_386.s:176
#1  0x0805842b in runtime.sigtramp (sig=void, info=void, context=void) at /home/nsf/go/src/pkg/runtime/sys_linux_386.s:195
#2  0x00000011 in ?? ()
#3  0xb47fe99c in ?? ()
#4  0x00000000 in ?? ()

It also seems that the app creates a bunch of threads (this may or may not be related):
(gdb) info threads
  Id   Target Id         Frame 
* 5    Thread 0xb47ffb40 (LWP 1903) "test" runtime.sigtramp (sig=void, info=void, context=void)
    at /home/nsf/go/src/pkg/runtime/sys_linux_386.s:176
  4    Thread 0xb53e2b40 (LWP 1902) "test" 0xb7fdd424 in __kernel_vsyscall ()
  3    Thread 0xb5be3b40 (LWP 1901) "test" 0xb7fdd424 in __kernel_vsyscall ()
  2    Thread 0xb6f48b40 (LWP 1900) "test" 0xb7fdd424 in __kernel_vsyscall ()
  1    Thread 0xb7089800 (LWP 1897) "test" 0xb747d026 in _int_free () from /lib/libc.so.6


If you have no idea what this is, I can also try to dig into gtk3 and remove it from
the test case (reproducing the bug with simple libraries only, like pthreads).
But I think that will be quite hard to do.

P.S. The same code in C runs fine:
---------------------------------------------------------------------
[nsf @ go-test]$ cat test.c
#include <gtk/gtk.h>

int main(int argc, char **argv)
{
        gtk_init(0, 0);
        gtk_file_chooser_button_new(0, 0);
}

[nsf @ go-test]$ gcc -o test test.c `pkg-config --cflags --libs gtk+-3.0`
[nsf @ go-test]$ ./test
[nsf @ go-test]$ gdb --quiet ./test
Reading symbols from /home/nsf/tmp/go-test/test...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/nsf/tmp/go-test/test 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
[New Thread 0xb6745b40 (LWP 2034)]
[New Thread 0xb5dffb40 (LWP 2035)]
[New Thread 0xb53ffb40 (LWP 2036)]
[Thread 0xb53ffb40 (LWP 2036) exited]
[Thread 0xb5dffb40 (LWP 2035) exited]
[Thread 0xb6745b40 (LWP 2034) exited]
[Inferior 1 (process 2031) exited with code 0240]
(gdb) quit
---------------------------------------------------------------------
@bradfitz
Contributor

bradfitz commented Mar 8, 2012

Comment 1:

You may have to call runtime.LockOSThread to interact with GTK's event loop?

@nsf
Author

nsf commented Mar 8, 2012

Comment 2:

1. This code doesn't use GTK's event loop; that is usually started explicitly with gtk_main.
2. Other code works fine (20+ demos using different widgets and events).
3. I tried runtime.LockOSThread(). And tried running the code above in the 'init'
function. Doesn't help.
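For reference, the usual runtime.LockOSThread pattern for thread-affine C libraries such as GTK locks the main goroutine from an init function: the runtime documentation states that all init functions run on the startup thread, so calling LockOSThread there makes main run on that same thread. As noted above, this did not help in this case. A minimal sketch; initGUI is a hypothetical placeholder for the cgo calls in the report, used only so the sketch builds without GTK.

```go
package main

import (
	"fmt"
	"runtime"
)

func init() {
	// All init functions run on the startup (main) OS thread. Locking it here
	// means main will also run on that thread, which thread-affine C
	// libraries such as GTK generally expect.
	runtime.LockOSThread()
}

// initGUI is a hypothetical placeholder for the cgo calls in the report,
// e.g. C.gtk_init(nil, nil); it keeps this sketch buildable without GTK.
func initGUI() string {
	return "GUI initialized on the main OS thread"
}

func main() {
	fmt.Println(initGUI())
}
```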

@rsc
Contributor

rsc commented Mar 8, 2012

Comment 3:

I believe that gtk is creating some thread and then that thread
gets a signal, and then the Go signal handler is invoked.
I am not sure what to do about this.  We like our signal handlers,
but they can't cope with being invoked on non-Go threads.
We could ignore such signals easily enough, but perhaps
gtk is really trying to handle that signal (or maybe it's a SIGSEGV
or something).
Russ

Labels changed: added priority-go1, removed priority-triage.

Owner changed to builder@golang.org.

Status changed to Accepted.

@nsf
Author

nsf commented Mar 9, 2012

Comment 4:

Just want to mention that the issue is most likely related to gtk's DBus usage. On the
client side, it seems that the only signal it touches is SIGPIPE. It has this code:
  #if HAVE_DECL_MSG_NOSIGNAL
  static dbus_bool_t _dbus_modify_sigpipe = FALSE;
  #else
  static dbus_bool_t _dbus_modify_sigpipe = TRUE;
  #endif
And then on connection opening it does:
  if (_dbus_modify_sigpipe)
    _dbus_disable_sigpipe ();
Which in turn results in a function call (if true):
  void _dbus_disable_sigpipe (void)
  {
    signal (SIGPIPE, SIG_IGN);
  }
On my machine I know it doesn't call _dbus_disable_sigpipe; maybe that's the issue.
Honestly, I'm not an expert on how signals work in Linux.

@rsc
Contributor

rsc commented Mar 9, 2012

Comment 5:

Can you run your program under strace -f to find which signal is being
delivered?
If it is only SIGPIPE, we might be able to do a simple workaround for Go 1.

@nsf
Author

nsf commented Mar 9, 2012

Comment 6:

The worst part is that it runs fine under strace/ltrace. It died only once under strace,
and I wasn't able to reproduce that again; see the second log file, which contains the
segfault. I'm afraid it won't be very helpful.

Attachments:

  1. strace-log.txt (160911 bytes)
  2. strace-log-segfault.txt (153031 bytes)

@nsf
Author

nsf commented Mar 9, 2012

Comment 7:

Hm... it dies often if I run strace without the "-o" option (which writes output to a file).
Here are the two variants of dying:
SIGPIPE:
[pid 11326] read(3, "\1\10\v\0\22\0\0\0\37\0\0\0\0\0\0\0H\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 104
[pid 11326] read(3, 0x89d5b08, 4096)    = -1 EAGAIN (Resource temporarily unavailable)
[pid 11326] read(3, 0x89d5b08, 4096)    = -1 EAGAIN (Resource temporarily unavailable)
[pid 11326] poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
[pid 11326] writev(3, [{"\24\0\6\0\1\0@\1\212\1\0\0\6\0\0\0\0\0\0\0\4\0\0\0", 24}, {NULL, 0}, {"", 0}], 3) = 24
[pid 11326] poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
[pid 11326] read(3, "\1 \f\0\1\0\0\0\6\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 36
[pid 11326] read(3, 0x89d5b08, 4096)    = -1 EAGAIN (Resource temporarily unavailable)
[pid 11326] read(3, 0x89d5b08, 4096)    = -1 EAGAIN (Resource temporarily unavailable)
[pid 11326] write(1, "unix:abstract=/tmp/dbus-LOsdVe90"..., 73) = -1 EPIPE (Broken pipe)
[pid 11326] --- {si_signo=SIGPIPE, si_code=SI_USER, si_pid=11326, si_uid=1000, si_value={int=3076563288, ptr=0xb760a158}} (Broken pipe) ---
Process 11315 resumed
Process 11326 detached
Process 11315 detached
SIGCHLD? (here it seems almost finished; we can see the last close calls):
[pid 11678] write(1, "unix:abstract=/tmp/dbus-LOsdVe90"..., 73) = 73
[pid 11678] write(1, "\370)\0\0", 4)    = 4
[pid 11678] write(1, "\1\0@\1", 4)      = 4
[pid 11678] close(1)                    = 0
[pid 11678] close(2)                    = 0
[pid 11678] exit_group(0)               = ?
Process 11678 detached
[pid 11677] <... select resumed> )      = 1 (in [9])
[pid 11677] --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=11678, si_status=0, si_utime=0, si_stime=0} (Child exited) ---
[pid 11677] --- {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x2c} (Segmentation fault) ---
Process 11677 detached
[pid 11676] +++ killed by SIGSEGV +++
[pid 11675] +++ killed by SIGSEGV +++
[pid 11674] +++ killed by SIGSEGV +++
+++ killed by SIGSEGV +++
And normally it runs fine:
read(3, 0x9842b08, 4096)                = -1 EAGAIN (Resource temporarily unavailable)
read(3, 0x9842b08, 4096)                = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3, [{"\24\0\6\0\1\0@\1\212\1\0\0\6\0\0\0\0\0\0\0\4\0\0\0", 24}, {NULL, 0}, {"", 0}], 3) = 24
poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
read(3, "\1 \f\0\1\0\0\0\6\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 36
read(3, 0x9842b08, 4096)                = -1 EAGAIN (Resource temporarily unavailable)
read(3, 0x9842b08, 4096)                = -1 EAGAIN (Resource temporarily unavailable)
write(1, "unix:abstract=/tmp/dbus-LOsdVe90"..., 73) = 73
write(1, "\370)\0\0", 4)                = 4
write(1, "\1\0@\1", 4)                  = 4
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
Process 11746 detached

@rsc
Contributor

rsc commented Mar 12, 2012

Comment 8:

I created CL 5797068 to at least diagnose the problem better.
I do not believe we will be able to fix this for Go 1.

Labels changed: added priority-later, removed priority-go1.

@nsf
Author

nsf commented Mar 12, 2012

Comment 9:

[nsf @ go-test]$ ./test
runtime: signal received on thread not created by Go.
Segmentation fault
Clearly that's the case. Looking forward to a fix.

@gopherbot

Comment 10 by joshrickmar:

I've run into this issue as well, trying to add a GtkEntry to a container (code
attached) with GTK 3.8 on OpenBSD. I'm running Go tip (changeset 8519983c00e8), and the
process no longer crashes but produces a more useful error message:
runtime: signal received on thread not created by Go: SIGCHLD: child status has changed
Is there any way, with the current source, to catch this and ignore it, now that we
no longer crash?

Attachments:

  1. test.go (297 bytes)

@minux
Member

minux commented Jun 12, 2013

Comment 11:

As a workaround, you could just add a return statement in the function runtime.badsignal
in src/pkg/runtime/os_$GOOS.c to ignore any signals received on foreign threads.

@gopherbot

Comment 12 by joshrickmar:

With the exception of processes ignoring my SIGINTs, that's a pretty good fix.
Here are the signals I'm seeing with GTK and my test, now that it's not quitting
immediately:
runtime: signal received on thread not created by Go: SIGTERM: termination
runtime: signal received on thread not created by Go: SIGWINCH: window size change
Both of these signals, as well as a few others, have the default action of being ignored
(according to signal(3)). Should this be fixed by listening for these signals and
ignoring them completely if they are sent?

@gopherbot

Comment 13 by joshrickmar:

Oops, that should be:
runtime: signal received on thread not created by Go: SIGCHLD: child status has changed
runtime: signal received on thread not created by Go: SIGWINCH: window size change

@gopherbot
Copy link

Comment 14 by joshrickmar:

Here's a quick patch I put together that ignores the signals that have no default
action. I only modified the OpenBSD files, but the other platforms would need a
similar fix. I've had no issues making GTK3 calls with cgo with this patch.

Attachments:

  1. ignore-signals.patch (1609 bytes)

@minux
Copy link
Member

minux commented Jul 11, 2013

Comment 15:

This issue was closed by revision 2f1ead7.

Status changed to Fixed.

@nsf nsf added fixed labels Jul 11, 2013
@gopherbot

CL https://golang.org/cl/12503 mentions this issue.

ianlancetaylor added a commit that referenced this issue Jul 22, 2015
In the past badsignal would crash the program.  In
https://golang.org/cl/10757044 badsignal was changed to call sigsend,
to fix issue #3250.  The effect of this was that when a non-Go thread
received a signal, and os/signal.Notify was not being used to check
for occurrences of the signal, the signal was ignored.

This changes the code so that if os/signal.Notify is not being used,
then the signal handler is reset to what it was, and the signal is
raised again.  This lets non-Go threads handle the signal as they
wish.  In particular, it means that a segmentation violation in a
non-Go thread will ordinarily crash the process, as it should.

Fixes #10139.
Update #11794.

Change-Id: I2109444aaada9d963ad03b1d071ec667760515e5
Reviewed-on: https://go-review.googlesource.com/12503
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
@golang golang locked and limited conversation to collaborators Aug 5, 2016
5 participants