doc/progs: sporadic sig 11 on arm #4305

Closed
davecheney opened this issue Oct 29, 2012 · 16 comments
@davecheney
Contributor

What steps will reproduce the problem?

cgo programs have started to fail on arm

# ../doc/progs
go build command-line-arguments: signal 11

What is the expected output? What do you see instead?

Tests Pass

Please use labels and text to provide additional information.


pando(~/go/src) % uname -a
Linux pando 3.2.0-1420-omap4 #27-Ubuntu SMP PREEMPT Fri Sep 28 16:21:51 UTC 2012 armv7l armv7l armv7l GNU/Linux
pando(~/go/src) % cat /proc/cpuinfo 
Processor       : ARMv7 Processor rev 3 (v7l)
processor       : 0
BogoMIPS        : 596.46

processor       : 1
BogoMIPS        : 582.68

Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x1
CPU part        : 0xc09
CPU revision    : 3

Hardware        : OMAP4 Panda board
Revision        : 0020
Serial          : 0000000000000000

No sign of the OOM killer or other outside interference with this process.
@minux
Member

minux commented Oct 29, 2012

Comment 2:

I think we can modify doc/progs/run to output some progress indication,
and then try to reproduce the problem. I don't quite understand what is
triggering SIGSEGV here.
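
For reference, the "signal 11" in the build output is SIGSEGV on Linux. A minimal sketch for confirming that mapping on a given machine, assuming a POSIX shell whose `kill -l` accepts a signal number:

```shell
# Translate signal number 11 into its name (without the SIG prefix).
signal_name=$(kill -l 11)
echo "signal 11 is SIG${signal_name}"
```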

@davecheney
Contributor Author

Comment 3:

SGTM. If you prepare the CL I can run doc/progs in a loop til it breaks.

@minux
Member

minux commented Oct 29, 2012

Comment 4:

Just add a line "set -x" to doc/progs/run.
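
As a rough illustration of what that one-line change provides (a minimal sketch, not the actual doc/progs/run script; the `build_step` function is a hypothetical stand-in), `set -x` makes the shell write each command to stderr before running it, so the last traced line before a crash identifies the failing step:

```shell
#!/bin/sh
# Hypothetical stand-in for one step of a test script.
build_step() {
    echo "building $1"
}

# Run two steps with tracing enabled. The trace goes to stderr, which is
# captured here; the steps' normal output is discarded.
trace=$( ( set -x; build_step first; build_step second ) 2>&1 >/dev/null )

# Show the captured trace; each traced command starts with one or more "+".
printf '%s\n' "$trace"
```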

@davecheney
Contributor Author

Comment 5:

I believe this is not a problem with go code, but a segfault in 5g or
5l, as doc/progs never runs the code, it only compiles.
On this machine CC is set to /usr/bin/clang-3.0, so it could be a clang bug.
I have been running doc/progs/run in a loop for two hours now without
a crash, so this might be quite subtle.
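
The kind of loop described here can be sketched as follows. This is only an illustration under stated assumptions, not the script that was actually used: `run_until_failure` is a hypothetical helper name, and the iteration cap is arbitrary.

```shell
#!/bin/sh
# run_until_failure MAX CMD [ARGS...]
# Runs CMD up to MAX times and stops at the first non-zero exit status.
# Returns 0 if every run passed, otherwise the failing run's exit status.
# Note: POSIX sh has no local variables, so max, i and status are global.
run_until_failure() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        "$@" || {
            status=$?
            echo "run $i failed with status $status" >&2
            return "$status"
        }
        i=$((i + 1))
    done
    echo "all $max runs passed" >&2
    return 0
}
```

From the doc/progs directory this could be invoked as something like `run_until_failure 1000 ./run`. A status above 128 conventionally means the command was killed by a signal, so 139 would correspond to signal 11.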

@minux
Member

minux commented Oct 29, 2012

Comment 6:

Maybe we can modify cmd/go to display the exact command that is failing?
Btw, doc/progs/run does run some of the compiled binaries and checks
their output.

@davecheney
Contributor Author

Comment 7:

Since replacing the linux/arm builder in question, there have been no incidents. I will
keep this issue open for another few weeks, then close if nothing shows up.

@davecheney
Contributor Author

Comment 8:

I'm putting this one down to the Pandaboard not being awesome.

Status changed to Retracted.

@minux
Member

minux commented Dec 20, 2012

Comment 9:

http://build.golang.org/log/00600fa74b379cb566c2b63ae5736e6f06aa9a98

Status changed to Accepted.

@davecheney
Contributor Author

Comment 10:

Removing myself as an owner.
As a data point, the current linux-arm-cheney builder, an omap3 freescale iMX53 board, has
never failed a build in this manner -- however this is a single core machine.
I am considering running ./run.bash in a loop on my nexus7 (if I can figure out a way to
minimise the flash writes) to gather another data point on whether this is caused by a data
race, or if it is just pandaboards being flaky under load.

Owner changed to ---.

@davecheney
Contributor Author

Comment 11:

I have been running variations of ./run.bash --no-rebuild, ./run.bash, and ./all.bash in
a loop on my pandaboard and nexus 7 for the last 24 hours at revision
% hg id /tmp/go
019884311591+ tip
without incident.
@minux, are you able to try again with your pandaboard?

@rsc
Contributor

rsc commented Dec 30, 2012

Comment 12:

Labels changed: added priority-later, removed priority-triage.

@davecheney
Contributor Author

Comment 13:

Lowering the priority to go1.1maybe; this issue has not recurred since pandaboards were
removed from the equation.

Labels changed: added go1.1maybe, removed go1.1.

@davecheney
Contributor Author

Comment 14:

After some more debugging, it appears the common factor is, at a minimum, Ubuntu Linux
11.10 - 12.04 and a pandaboard. Replacing the operating system with Arch Linux has
produced a builder which has performed flawlessly in soak tests and proven to be very
stable.
The remaining question is what in the Ubuntu Linux + pandaboard combination causes the
strange segfaults. For the moment I will add a note to the builder page that Ubuntu
11.10/12.04 OMAP4 kernels are not recommended as builders.

@minux
Member

minux commented Feb 28, 2013

Comment 15:

I want to do the following experiment:
1. patch cmd/dist to build a static binary
2. compile on the affected Ubuntu
3. copy the whole $GOROOT to Arch
4. do extensive tests on Arch with that $GOROOT.
I hope this could isolate the problem to either
libc/toolchain, or the kernel.
In the past, I have seen a broken gcc miscompile
our compiler.

@davecheney
Contributor Author

Comment 16:

I am very confident that we have isolated the cause -- older versions of the Ubuntu OMAP4
kernel -- and demonstrated that the fix is to apt-get upgrade to the latest available
kernel (12.04.2).
I am marking this as fixed, and sincerely hope it will not be reopened.

Status changed to Fixed.

@rsc rsc added this to the Go1.1 milestone Apr 14, 2015
@rsc rsc removed the go1.1maybe label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.