runtime: garbage collection crash in freebsd/386 runtime running on freebsd/amd64 #2675

Closed
dhobsd opened this issue Jan 10, 2012 · 35 comments

@dhobsd
Contributor

dhobsd commented Jan 10, 2012

Before filing a bug, please check whether it has been fixed since
the latest release: run "hg pull", "hg update default", rebuild, and
retry what you did to reproduce the problem.  Thanks.

What steps will reproduce the problem?
1. Attempt to build go on FreeBSD/i386

What is the expected output?
Successful build

What do you see instead?
cgo segfaults on pkg/net/cgo_unix.go

Which compiler are you using (5g, 6g, 8g, gccgo)?
8g

Which operating system are you using?
FreeBSD/386

Which revision are you using?  (hg identify)
cc0f39d02e93 (this is the earliest revision I was able to get the build to break using
hg bisect -- freebsd/386 has been broken for quite some time, it seems)

Please provide any additional information below.
(gdb) r
Starting program: /usr/home/dho/go-old/bin/cgo -- cgo_unix.go
warning: `/usr/libexec/ld-elf.so.1': Shared library architecture i386:x86-64 is not
compatible with target architecture i386.
warning: `/usr/libexec/ld-elf.so.1': Shared library architecture i386:x86-64 is not
compatible with target architecture i386.

Program received signal SIGSEGV, Segmentation fault.
umtx_unlock (l=void) at /usr/home/dho/go-old/src/pkg/runtime/freebsd/thread.c:72
72  umtx_unlock(Lock *l)
(gdb) p l
$1 = void
(gdb) l
67  
68      goto again;
69  }
70  
71  static void
72  umtx_unlock(Lock *l)
73  {
74      uint32 v;
75  
76  again:
(gdb) bt
#0  umtx_unlock (l=void) at /usr/home/dho/go-old/src/pkg/runtime/freebsd/thread.c:72
#1  0x08076177 in runtime.notesleep (n=void) at
/usr/home/dho/go-old/src/pkg/runtime/freebsd/thread.c:122
#2  0x08070fb1 in nextgandunlock () at /usr/home/dho/go-old/src/pkg/runtime/proc.c:403
#3  0x080713ba in schedule (gp=void) at /usr/home/dho/go-old/src/pkg/runtime/proc.c:572
#4  0x08066ec3 in runtime.mcall (fn=void) at
/usr/home/dho/go-old/src/pkg/runtime/386/asm.s:174
#5  0x38238480 in ?? ()
#6  0x00000000 in ?? ()
(gdb)
@dhobsd
Contributor Author

dhobsd commented Jan 10, 2012

Comment 1:

On tip:
[dho@meep /usr/home/dho/go-old/src]$ GOARCH=386 gdb73.1 --args /home/dho/go-old/bin/go
install -a -v std
GNU gdb (GDB) 7.3.1 [GDB v7.3.1 for FreeBSD]
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd8.1".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/home/dho/go-old/bin/go...done.
(gdb) r
Starting program: /usr/home/dho/go-old/bin/go install -a -v std
runtime
Program received signal SIGSEGV, Segmentation fault.
nextgandunlock () at /usr/home/dho/go-old/src/pkg/runtime/./proc.c:602
602     if(m->helpgc) {
(gdb) bt
#0  nextgandunlock () at /usr/home/dho/go-old/src/pkg/runtime/./proc.c:602
#1  0x08067a41 in schedule (gp=void) at /usr/home/dho/go-old/src/pkg/runtime/./proc.c:856
#2  0x0806ec3c in runtime.mcall (fn=void) at
/usr/home/dho/go-old/src/pkg/runtime/./asm_386.s:172
#3  0x3825d000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

@dhobsd
Contributor Author

dhobsd commented Jan 10, 2012

Comment 2:

So it turns out that the word at &m->havenextg is 1:
m 0x826849c
(gdb) x/x 0x826849c+132
0x8268520:  0x00000001
I don't remember where in the world this is set, so I'm a bit lost at the moment.

@dhobsd
Contributor Author

dhobsd commented Jan 10, 2012

Comment 3:

Seems like the kernel is killing it somewhere in sys_umtx_op (while it's in the kernel).
Not sure what the deal is here, but I don't think I'm going to be able to install a
debugging kernel tonight anyway. All the addresses going in seem OK.

@adg
Contributor

adg commented Jan 11, 2012

Comment 4:

Most frustrating.

Labels changed: added priority-go1, removed priority-triage.

Status changed to HelpWanted.

@robpike
Contributor

robpike commented Jan 13, 2012

Comment 5:

Owner changed to builder@golang.org.

@dhobsd
Contributor Author

dhobsd commented Jan 13, 2012

Comment 6:

It's also worth noting that in tip, it is the go command that is segfaulting, not cgo
(though cgo still doesn't work and that's what was originally faulting).

@mikioh
Contributor

mikioh commented Feb 11, 2012

Comment 8:

I've confirmed that the issue is fixed at 4a0c77722a5e tip.

Status changed to Fixed.

@dhobsd
Contributor Author

dhobsd commented Feb 11, 2012

Comment 9:

Are you sure?
[dho@meep ~/go/src]$ GOARCH=386 ./all.bash
...
# Building packages and commands.
runtime
./make.bash: line 67: 23446 Segmentation fault: 11  (core dumped)
../bin/tool/go_bootstrap install -a -v std
[dho@meep ~/go/src]$ hg summ
parent: 11883:4a0c77722a5e tip
 gc: diagnose field+method of same name
branch: default
commit: 10 unknown (clean)
update: (current)
Program received signal SIGSEGV, Segmentation fault.
runtime.exitsyscall () at /home/dho/go/src/pkg/runtime/proc.c:956
956 runtime·exitsyscall(void)
(gdb) bt
#0  runtime.exitsyscall () at /home/dho/go/src/pkg/runtime/proc.c:956
#1  0x08104e97 in syscall.Syscall () at /home/dho/go/src/pkg/syscall/asm_freebsd_386.s:34
#2  0x081077ae in syscall.Read (fd=4, p=..., n=3, err=...) at
/home/dho/go/src/pkg/syscall/zsyscall_freebsd_386.go:810
#3  0x080994e2 in os.(*File).read (f=0x38545540, b=..., n=134748239, err=...) at
/home/dho/go/src/pkg/os/file_unix.go:163
#4  0x08097b2d in os.(*File).Read (f=0x38545540, b=..., n=0, err=...) at
/home/dho/go/src/pkg/os/file.go:60
#5  0x080816c7 in bytes.(*Buffer).ReadFrom (b=0x382e3720, r=..., n=0, err=...) at
/home/dho/go/src/pkg/bytes/buffer.go:153
#6  0x080955be in io.Copy (dst=..., src=..., written=0, err=...) at
/home/dho/go/src/pkg/io/io.go:326
#7  0x0809ed3c in os/exec._func_003 (&w=void, &pr=void, noname=void) at
/home/dho/go/src/pkg/os/exec/exec.go:201
#8  0x383d2765 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

@dhobsd
Contributor Author

dhobsd commented Feb 11, 2012

Comment 10:

Also:
Program received signal SIGSEGV, Segmentation fault.
nextgandunlock () at /home/dho/go/src/pkg/runtime/proc.c:604
604     if(m->helpgc) {

@rsc
Contributor

rsc commented Feb 13, 2012

Comment 11:

Why does the message say 
Starting program: /usr/home/dho/go-old/bin/cgo -- cgo_unix.go
warning: `/usr/libexec/ld-elf.so.1': Shared library architecture i386:x86-64 is not
compatible with target architecture i386.
warning: `/usr/libexec/ld-elf.so.1': Shared library architecture i386:x86-64 is not
compatible with target architecture i386.
Are you on an x86-64 machine doing a 386 cross-compile?

Status changed to Accepted.

@mikioh
Contributor

mikioh commented Feb 13, 2012

Comment 12:

> Are you sure?
Sure, I'm serious.
--- cd ../test
0 known bugs; 0 unexpected bugs
ALL TESTS PASSED
---
Installed Go for freebsd/386 in /home/mikioh/go
Installed commands in /home/mikioh/go/bin
vm5% uname -a
FreeBSD vm5.localdomain 8.2-RELEASE-p3 FreeBSD 8.2-RELEASE-p3 #0: Tue Sep 27 18:07:27
UTC 2011     root@i386-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  i386

@mikioh
Contributor

mikioh commented Feb 16, 2012

Comment 13:

Hi Devon,
We assume you are using the i386 runtime on freebsd/amd64, correct?
I have no experience building the i386 runtime on freebsd/amd64, for
example like the following:
    cd /usr/src; make build32; make install32; ldconfig -v -m -R /usr/lib32.
If so, I'm not sure whether it's worth diving into.

Labels changed: added os-freebsd.

@dhobsd
Contributor Author

dhobsd commented Feb 16, 2012

Comment 14:

Ack, sorry! I missed the updates here in the floods of stuff in my inbox.
Yeah, this is indeed a 386 crossbuild (though for some reason I thought it was a 386
machine). This machine has 32-bit compat installed, but I'm in the process of upgrading
it to RELENG_9 right now, so it'll be a bit before I can test this again. (Previously
it was RELENG_8.)

@rsc
Contributor

rsc commented Feb 20, 2012

Comment 15:

Devon, if you can still produce these crashes on demand, please post a core file and
corresponding binary as an attachment in this issue.  If the Go SIGSEGV handler is
keeping the kernel from creating a core file, please edit
src/pkg/runtime/signals_freebsd.h to change
    /* 11 */    P, "SIGSEGV: segmentation violation",
to
    /* 11 */    0, "SIGSEGV: segmentation violation",
which will keep the Go runtime from trying to handle the signal.
Thanks.
Russ
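
The general point behind this request is that a process which handles SIGSEGV itself never reaches the kernel's default action, which is what writes the core file. The sketch below is not the runtime edit described above. It is the generic POSIX equivalent for an ordinary C program, and the helper name allow_core_dumps is hypothetical: it restores the default SIGSEGV action and raises the core-size soft limit to the hard limit.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper: restore the default SIGSEGV action and raise the
 * core-size soft limit to the hard limit, so that a fault can produce a
 * core file. Returns 0 on success, -1 on failure. */
static int allow_core_dumps(void)
{
    struct sigaction sa;
    struct rlimit rl;

    /* With SIG_DFL installed, the kernel handles the fault itself. */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    /* Raising the soft limit up to the hard limit needs no privileges. */
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return 0;
}
```

Whether a core file is actually written still depends on the hard limit and on system settings outside the program's control.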

@dhobsd
Contributor Author

dhobsd commented Feb 23, 2012

Comment 16:

In addition to the attached core files, I'm seeing panics with "throw: entersyscall" and
"runtime: split stack overflow". This is basically doing GOARCH=386 ./all.bash. Attached
is a core file. It's actually pretty painful for me to get these due to all the cleanup
that the go tool does, so if you need more information, let me know.
I'm not able to deduce what's going on based on the binary/core file, and for some
reason gdb 7.3.1 stopped working for me with Go programs. (In this case, I get
"/usr/home/dho/go/src/./pkg/time/time.test.core" is not a core dump: File format is
ambiguous, but when running live programs I'll frequently get other things).
I'll keep this core / file around in case there's anything extra you'd like me to do. If
you have any pointers as to what might be going on, that'd be great -- I can probably
fix this, just not sure where to start.
Some relevant info:
[dho@meep ~/go/src]$ gdb73.1 time.test 
GNU gdb (GDB) 7.3.1 [GDB v7.3.1 for FreeBSD]
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd8.1".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/home/dho/go/src/time.test...done.
(gdb) r
Starting program: /usr/home/dho/go/src/time.test 
Program received signal SIGSEGV, Segmentation fault.
runtime.exitsyscall () at /usr/home/dho/go/src/pkg/runtime/proc.c:966
966 runtime·exitsyscall(void)
(gdb) bt
#0  runtime.exitsyscall () at /usr/home/dho/go/src/pkg/runtime/proc.c:966
#1  0x0805db42 in timerproc () at /home/dho/go/src/pkg/runtime/time.goc:3488
#2  0x08055292 in schedunlock () at /usr/home/dho/go/src/pkg/runtime/proc.c:259
#3  0x00000000 in ?? ()
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/home/dho/go/src/time.test 
throw: entersyscall
goroutine 1 [syscall]:
goroutine 2 [chan receive]:
testing.RunTests(0x8048c00, 0x81f6f00, 0x2c, 0x2c, 0x81e8101, ...)
    /usr/home/dho/go/src/pkg/testing/testing.go:347 +0x6a7
testing.Main(0x8048c00, 0x81f6f00, 0x2c, 0x2c, 0x81f5cd8, ...)
    /usr/home/dho/go/src/pkg/testing/testing.go:282 +0x46
main.main()
    /tmp/go-build848228700/time/_test/_testmain.go:153 +0x4e
created by _rt0_386
    /usr/home/dho/go/src/pkg/runtime/asm_386.s:80 +0xbe
goroutine 3 [sleep]:
time.Sleep(0x5f5e100, 0x0)
    /usr/home/dho/go/src/pkg/runtime/ztime_386.c:21 +0x4a
time_test.TestSleep(0x3820fb80, 0xe)
    /usr/home/dho/go/src/pkg/time/sleep_test.go:24 +0x69
testing.tRunner(0x3820fb80, 0x81f6f00, 0x0)
    /usr/home/dho/go/src/pkg/testing/testing.go:271 +0x6e
created by testing.RunTests
    /usr/home/dho/go/src/pkg/testing/testing.go:346 +0x687
goroutine 4 [runnable]:
time.Sleep(0x2faf080, 0x0)
    /usr/home/dho/go/src/pkg/runtime/ztime_386.c:21 +0x4a
time_test._func_001()
    /usr/home/dho/go/src/pkg/time/sleep_test.go:20 +0x2b
created by time_test.TestSleep
    /usr/home/dho/go/src/pkg/time/sleep_test.go:22 +0x2d
goroutine 5 [syscall]:
created by addtimer
    /usr/home/dho/go/src/pkg/runtime/ztime_386.c:69
[Inferior 1 (process 54458) exited with code 02]
(gdb) r
Starting program: /usr/home/dho/go/src/time.test 
throw: gosched of g0
goroutine 1 [syscall]:
goroutine 2 [chan receive]:
testing.RunTests(0x8048c00, 0x81f6f00, 0x2c, 0x2c, 0x81e8101, ...)
    /usr/home/dho/go/src/pkg/testing/testing.go:347 +0x6a7
testing.Main(0x8048c00, 0x81f6f00, 0x2c, 0x2c, 0x81f5cd8, ...)
    /usr/home/dho/go/src/pkg/testing/testing.go:282 +0x46
main.main()
    /tmp/go-build848228700/time/_test/_testmain.go:153 +0x4e
created by _rt0_386
    /usr/home/dho/go/src/pkg/runtime/asm_386.s:80 +0xbe
goroutine 3 [runnable]:
time.Sleep(0x5f5e100, 0x0)
    /usr/home/dho/go/src/pkg/runtime/ztime_386.c:21 +0x4a
time_test.TestSleep(0x3820fb80, 0xe)
    /usr/home/dho/go/src/pkg/time/sleep_test.go:24 +0x69
testing.tRunner(0x3820fb80, 0x81f6f00, 0x0)
    /usr/home/dho/go/src/pkg/testing/testing.go:271 +0x6e
created by testing.RunTests
    /usr/home/dho/go/src/pkg/testing/testing.go:346 +0x687
goroutine 5 [syscall]:
created by addtimer
    /usr/home/dho/go/src/pkg/runtime/ztime_386.c:69
[Inferior 1 (process 54460) exited with code 02]

Attachments:

  1. go_test_core.tar.bz2 (847188 bytes)

@rsc
Contributor

rsc commented Feb 23, 2012

Comment 17:

The core you posted is dying in nextgandunlock after a call to notesleep returns:
nextgandunlock+0x15e 0x08055a62 MOVL    GS:fffffffc,AX
nextgandunlock+0x165 0x08055a69 ADDL    $84,AX
nextgandunlock+0x16a 0x08055a6e MOVL    AX,0(SP)
nextgandunlock+0x16d 0x08055a71 CALL    runtime.notesleep(SB)
nextgandunlock+0x172 0x08055a76 MOVL    GS:fffffffc,AX         <<<<<
nextgandunlock+0x179 0x08055a7d MOVL    74(AX),AX
nextgandunlock+0x17c 0x08055a80 CMPL    AX,$0
This strongly suggests that the thread-local storage is being reset or otherwise
mishandled.  The fault is _reading_ the thread-local storage word, not _using_ it.  So
it is like our thread-local storage disappeared completely!  Does FreeBSD have cgo?  I
wonder if it is messing things up.  TLS mishaps causing g not to point at a G structure
would explain the throw("entersyscall") and the runtime split stack overflow failures
too.
Maybe it would make sense to try to use thr_new's tls_base instead of doing it ourselves
in the new threads.  Note that for bizarre ELF reasons, tls_base points _after_ the tls
section.  So you'd want to try making m->tls be an array of void*, then set tls[0] =
g and tls[1] = m and then use &tls[2] as tls_base in the thr_new parameters.
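
A minimal sketch of the layout suggested above, in plain C rather than the runtime's own code. The struct definitions and the helper names tls_base_for, tls_g and tls_m are stand-ins invented for illustration; only the idea (tls[0] = g, tls[1] = m, base = &tls[2]) comes from the comment. Reads then use negative offsets from the base, which is consistent with the GS:fffffffc (that is, -4) load in the disassembly.

```c
#include <assert.h>
#include <stddef.h>

typedef struct G G;
typedef struct M M;

struct G { int goid; };       /* stand-in for the runtime's G */
struct M { void *tls[2]; };   /* stand-in for the runtime's M */

/* Fill the two TLS slots and return the pointer that would be handed to
 * the kernel as tls_base. It points just past the array: legal to form,
 * but only the slots below it may be read. */
static void *tls_base_for(M *m, G *g)
{
    assert(m != NULL && g != NULL);
    m->tls[0] = g;
    m->tls[1] = m;
    return &m->tls[2];
}

/* Slots are addressed with negative indices relative to the base. */
static G *tls_g(void *tls_base) { return ((void **)tls_base)[-2]; }
static M *tls_m(void *tls_base) { return ((void **)tls_base)[-1]; }
```

With 4-byte pointers on 386, the [-1] slot sits at base - 4 and the [-2] slot at base - 8.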

@minux
Member

minux commented Feb 25, 2012

Comment 18:

Yes, FreeBSD has cgo support.
For proper tls handling on amd64, feel free to use
http://golang.org/cl/5689065/ .
I haven't had time to finish the same for FreeBSD/386.

@robpike
Contributor

robpike commented Feb 26, 2012

Comment 19:

Issue #3115 has been merged into this issue.

@dhobsd
Contributor Author

dhobsd commented Feb 28, 2012

Comment 20:

On 25 February 2012 13:08, <go@googlecode.com> wrote:
The change you made in 5689065 works for me on amd64. I'm still unable
to get an i386 version of this put together -- partially because I
don't have a FreeBSD/i386 machine, and also partially because
cross-compiling and running the 386 binary on the amd64 machine just
doesn't work for me when I try to set it up "properly."
I'd definitely appreciate input / suggestions for how to go about
this, because the "straightforward" fix doesn't seem to work and my
knowledge of i386 is significantly worse than my knowledge of amd64.
--dho

@rsc
Contributor

rsc commented Feb 28, 2012

Comment 21:

Can we reproduce this bug on a machine I can ssh into?
I would be happy to debug this once things calm down a little.

@dhobsd
Contributor Author

dhobsd commented Feb 28, 2012

Comment 22:

I'm happy to make you an account on my machine. Can you email me a key?

@rsc
Contributor

rsc commented Mar 1, 2012

Comment 23:

This happens on a 64-bit system compiling with GOARCH=386.  It looks like somehow the
tls pointer is being set to &m->tls[0] instead of &m->tls[0] + 2*sizeof(uintptr), at
least if uc->uc_mcontext.gsbase is to be believed.  This would happen if setldt were being
ignored and thr_new's param.tls_base were used instead.  However, I tried fixing
param.tls_base and commenting out the settls in thr_start and that did not fix anything.
Since this only happens on a cross-compile, I think this can wait until after Go 1.
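
The effect of that base mismatch can be worked through with offset arithmetic. The sketch below assumes 4-byte words (386) and a two-slot array with tls[0] = g and tls[1] = m; the function tls_slot and its constants are hypothetical. It reports which slot a GS-relative displacement lands on for a given base.

```c
#include <assert.h>

enum { WORD = 4, TLS_WORDS = 2 };   /* 386: 4-byte words, two TLS slots */

/* base_off: byte offset of the GS base from &m->tls[0].
 * disp:     displacement used by the instruction (-4 for GS:fffffffc).
 * Returns the slot index hit (0 = g, 1 = m), or -1 if the access falls
 * outside the array. */
static int tls_slot(long base_off, long disp)
{
    long off = base_off + disp;
    assert(off % WORD == 0);
    if (off < 0 || off >= TLS_WORDS * WORD)
        return -1;
    return (int)(off / WORD);
}
```

With the intended base, tls_slot(8, -4) is 1 (the m slot) and tls_slot(8, -8) is 0 (the g slot). With the base left at &m->tls[0], tls_slot(0, -4) is -1: the load reads whatever precedes the array.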

Labels changed: added priority-later, removed priority-go1.

@dhobsd
Contributor Author

dhobsd commented Mar 1, 2012

Comment 24:

On 1 March 2012 00:55, <go@googlecode.com> wrote:
Glad to see that I'm not crazy -- this is exactly the behavior I was
seeing when I did the same :\. Oh well, I suppose it's definitely
something that can wait until later.

@minux
Member

minux commented Dec 18, 2012

Comment 27:

Issue #3452 has been merged into this issue.

@rsc
Contributor

rsc commented Mar 12, 2013

Comment 28:

[The time for maybe has passed.]

@rsc
Contributor

rsc commented Jul 30, 2013

Comment 29:

I doubt we will ever fix this.

Labels changed: added priority-someday, removed priority-later.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 30:

Labels changed: added repo-main.

@rsc
Contributor

rsc commented Mar 3, 2014

Comment 31:

Adding Release=None to all Priority=Someday bugs.

Labels changed: added release-none.

@gopherbot

Comment 32 by steve.wills:

I'm seeing this too, any chance we can make progress on it? I can provide access to
troubleshoot if needed.

@rsc
Contributor

rsc commented May 31, 2014

Comment 33:

FWIW the FreeBSD team has just committed two fixes related to running i386 LDT code on
amd64 kernels.
http://svnweb.freebsd.org/base?view=revision&revision=266846
http://svnweb.freebsd.org/base?view=revision&revision=266901
It is possible these fix the problem. There is the beginning of a discussion here:
http://lists.freebsd.org/pipermail/freebsd-amd64/2014-May/thread.html#16093
I do not know whether or when these patches will hit earlier versions of FreeBSD.
We will fix the one error-checking problem identified on that thread, but it's
minor compared to the FreeBSD fixes.

@gopherbot

Comment 34:

CL https://golang.org/cl/99680044 mentions this issue.

@rsc
Contributor

rsc commented May 31, 2014

Comment 35:

This issue was updated by revision 19c8f67.

The code here was using the error check for Linux/386,
not the one for FreeBSD/386. Most of the time it worked.
Thanks to Neel Natu (FreeBSD developer) for finding this.
The s/JCC/JAE/ a few lines later is a no-op but makes the
test match the rest of the file. Why we write JAE instead of JCC
I don't know, but the two are equivalent and the file might
as well be consistent.
LGTM=bradfitz, minux
R=golang-codereviews, bradfitz, minux
CC=golang-codereviews
https://golang.org/cl/99680044
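
A rough model of the two error-check conventions, written in C rather than assembly. It assumes the usual ABIs: Linux/386 returns a negated errno in AX (the range -4095..-1), while FreeBSD/386 sets the carry flag on failure and leaves a positive errno in AX. The SyscallResult type and both function names are hypothetical and are not taken from the CL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the state after the system call returns. */
typedef struct {
    uint32_t ax;    /* value left in AX */
    bool     carry; /* carry flag */
} SyscallResult;

/* Linux/386 style: failure if AX holds -4095..-1 as an unsigned value. */
static bool linux386_failed(SyscallResult r)
{
    return r.ax >= 0xfffff001u;
}

/* FreeBSD/386 style: failure if the carry flag is set (JAE/JCC taken
 * means carry clear, i.e. success). */
static bool freebsd386_failed(SyscallResult r)
{
    return r.carry;
}
```

Under these assumptions a successful call passes both checks, which is one plausible reading of "most of the time it worked". A FreeBSD failure such as { 22, true } is caught by the carry check but looks like success to the Linux-style check, so the error goes unnoticed.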

@rsc
Contributor

rsc commented Jun 5, 2014

Comment 36:

Neel Natu tells me that both of the FreeBSD fixes will be merged into the FreeBSD stable
branches in the next 2-3 weeks.
Can someone please post a comment once you've seen the binaries working on a FreeBSD
stable kernel? Thanks.

Owner changed to @rsc.

@mikioh
Contributor

mikioh commented Jun 3, 2015

With Go 1.5 on a FreeBSD built from the FreeBSD stable branches, GOARCH=386 CGO_ENABLED=1 make.bash runs on a freebsd/amd64 host. I just confirmed it on freebsd-amd64 10.1-RELEASE-p10 and freebsd9-amd64 9.3-RELEASE-p13.

@mikioh mikioh closed this as completed Jun 3, 2015
@mikioh mikioh modified the milestones: Go1.5, Unplanned Jun 3, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
@rsc rsc removed their assignment Jun 22, 2022