runtime/pprof: multithreaded CPU profiles incorrect on NetBSD #6047

Closed
rsc opened this issue Aug 5, 2013 · 26 comments

Comments

@rsc
Contributor

rsc commented Aug 5, 2013

Multithreaded CPU profiles do not work on OS X, but worse they give little indication
that they are broken.
Single-threaded profiles kind of work, but only because we play games changing the
signal mask every time we start executing Go code, so we incur overhead on every thread
switch, and that in turn is suspected of causing deadlocks (see issue #5519). Also, many
times you can't quite be sure you hit the single-threaded case, so the profiles are not
truly believable. For years I have pulled out my Linux laptop when I want to do real
performance work with profiles.

The fundamental problem is that OS X delivers the profiling signals to the wrong thread
(golang.org/change/35b716c94225). The signal mask trick is an attempt to work around
this, but it's incomplete and misleading.

I intend to remove the workarounds from package runtime, so that CPU profiles on OS X
will typically be empty, because the wrong thread will receive the signals. That will
avoid giving out inaccurate information. It should also fix issue #5519 and simplify the
scheduler.

The correct place to fix this bug is in the OS X kernel. I filed an Apple Bug Report in
2011 (again see golang.org/change/35b716c94225) but the problem remains.

For people who insist on accurate profiling on OS X and are a bit adventurous, it may be
possible to modify the kernel to deliver the profiling signal correctly. See
http://godoc.org/code.google.com/p/rsc/cmd/pprof_mac_fix for details, and heed the
warnings in its documentation.
@dvyukov
Member

dvyukov commented Aug 5, 2013

Comment 1:

I agree that it's broken: the setprof calls are disturbing, and even if the profile looks
OK you still cannot be sure it is accurate.
Can't we do something like what Windows does for profiling -- a dedicated thread that
periodically queries the state of all other threads?

@rsc
Contributor Author

rsc commented Aug 5, 2013

Comment 2:

> Can't we do something like what Windows does for profiling -- a dedicated
> thread that periodically queries state of all other threads?
Not easily. I do hope that removing the workaround will prompt people to
think about working solutions, just like what happened with "stack
unavailable" in the goroutine dumps.
I believe a dedicated thread can be made to work, but I also believe it
requires using the Mach API to query the thread status, and we don't have
an easy way from within the runtime to do that. I started looking into that
(libmach/darwin.c has some code; look for thread_suspend and so on) but I
am not sure that it can be used easily from within the running process, and
I am not sure whether all the calls we need are available as direct system
calls or whether we'd have to wrap more of the Mach message passing layer.
We can't use task_suspend because it would suspend the thread doing the
profiling.
My kernel fix needs some cleanup and testing before others should try it,
but it is a significantly smaller change and seems to be reliable.

@rsc
Contributor Author

rsc commented Aug 5, 2013

Comment 3:

This issue was closed by revision d3066e4.

Status changed to Fixed.

@rsc
Contributor Author

rsc commented Aug 6, 2013

Comment 4:

Reopening so that it will appear in search results and to acknowledge that the bug is
not fixed.
NetBSD and OpenBSD are broken more or less the same way OS X is. This is not super
surprising given their shared code, but it is interesting nonetheless. Apparently all of
them added threads to the kernel without updating the profiling signal delivery to be
thread-aware. (FreeBSD is fine at least, but they were one of the earliest to support
threads.)
The primary focus of the bug is still OS X. NetBSD and OpenBSD can and should fix their
kernels. I have not submitted any reports to them, but if anyone wants to do so, the
test program you need is probably in the Apple Bug Report
(http://golang.org/change/35b716c94225).

Labels changed: removed go1.2.

Status changed to Accepted.

@gopherbot

Comment 5 by n.bruenggel:

Bah and I tried to get a meaningful profile on my Mac!

@rsc
Contributor Author

rsc commented Nov 4, 2013

Comment 6:

If you want to get accurate profiles on a Mac, see the first comment.
http://godoc.org/code.google.com/p/rsc/cmd/pprof_mac_fix
http://research.swtch.com/macpprof

@gopherbot

Comment 7 by n.bruenggel:

Yeah, I am not going to break the kernel on my Mac; I need it to work. Maybe I'll install
Linux on my PC sometime.

@nf

nf commented Nov 5, 2013

Comment 8:

As far as I am aware the aforementioned fix has never broken anyone's kernel, and many
people (myself included) have used it. 
With that said you might consider using vagrant on your Mac to easily set up a Linux VM.

@davecheney
Contributor

Comment 9:

Profiling inside a VM? Now you have two problems...

@rsc
Contributor Author

rsc commented Nov 27, 2013

Comment 10:

Labels changed: added go1.3maybe.

@4a6f656c
Contributor

4a6f656c commented Dec 2, 2013

Comment 11:

FTR this has been fixed in the OpenBSD kernel - the fix will be included in the OpenBSD
5.5 release (May 2014) and is already available in -current.

@rsc
Contributor Author

rsc commented Dec 4, 2013

Comment 12:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Contributor Author

rsc commented Dec 4, 2013

Comment 13:

Labels changed: added repo-main.

@gopherbot

Comment 14 by justin@specialbusservice.com:

I cannot reproduce the test case from the OS X report on NetBSD 5-6, or indeed on OpenBSD
5.4. Does anyone have a test that, e.g., fails on OpenBSD 5.4 and succeeds on 5.5, so I can
look at fixing NetBSD?

@4a6f656c
Contributor

Comment 15:

OpenBSD now has a regress test for this - this fails on OpenBSD 5.4 and probably fails
on NetBSD (although I've not tried):
http://www.openbsd.org/cgi-bin/cvsweb/src/regress/sys/kern/sigprof/sigprof.c?rev=1.1;content-type=text%2Fplain

@gopherbot

Comment 16 by justin@specialbusservice.com:

That test does fail on NetBSD 6, but it also fails on Linux (Ubuntu 13.10), although the
distribution is much better there than on NetBSD (which in turn is not as bad as OpenBSD
5.4). I need to install OpenBSD 5.5 to compare. So I'm not sure it's a great test, but it
is indicative...
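
One way to put a number on "distribution" when comparing systems, as a sketch only: given how many profiling samples each thread received, measure how far the worst thread's share is from an even split. This is not how the OpenBSD regress test scores its result; maxDeviation and the sample counts below are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// maxDeviation (hypothetical helper) returns the largest absolute difference
// between any thread's share of the samples and the ideal even share
// 1/len(counts). It returns 0 when there are no threads or no samples.
func maxDeviation(counts []int) float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	if len(counts) == 0 || total == 0 {
		return 0
	}
	even := 1.0 / float64(len(counts))
	worst := 0.0
	for _, c := range counts {
		share := float64(c) / float64(total)
		worst = math.Max(worst, math.Abs(share-even))
	}
	return worst
}

func main() {
	balanced := []int{250, 250, 250, 250} // made-up counts: signals spread evenly
	skewed := []int{1000, 0, 0, 0}        // made-up counts: one thread gets everything
	fmt.Printf("balanced: %.2f\n", maxDeviation(balanced))
	fmt.Printf("skewed:   %.2f\n", maxDeviation(skewed))
}
```

A value near 0 means the signals were spread evenly across threads; a value near 1-1/n means a single thread received nearly all of them.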

4a6f656c changed the title from "runtime/pprof: multithreaded CPU profiles incorrect on NetBSD, OpenBSD, OS X" to "runtime/pprof: multithreaded CPU profiles incorrect on NetBSD, OS X" on Dec 29, 2014
rsc added this to the Unplanned milestone on Apr 10, 2015
@adonovan
Member

Until this is fixed, can we make pprof.StartCPUProfile print a message to stderr on Mac OS X to warn users that profiling is broken?

@rsc
Contributor Author

rsc commented Aug 12, 2015

@adonovan, profiling works fine on OS X if you patch your kernel, as I think most Go programmers on Macs do, out of necessity. Certainly people running non-buggy kernels don't need to see forced output on standard error every time they profile a program.

If we had a good way to tell whether the kernel patch has been applied, we could print a warning when it has not. But I don't have a good way to do that. I have thought about changing the kernel version string, but the only thing I am confident about changing is the date, and there's not much room there to signal that the fix is applied.

See rsc.io/pprof_mac_fix for the patch.

@pires

pires commented Aug 15, 2015

@rsc thanks for the patch. Pray for me while I run it!

@rsc
Contributor Author

rsc commented Aug 28, 2015

I have a report from an OS X 10.11 El Capitan beta user that pprof works out of the box on that system, without the need for a kernel patch. I have also inspected the machine code for the relevant kernel function, and they did make changes roughly along the lines of what the patch has always done. So I believe it was intentionally fixed. I am hopeful that the fix will last into the final public release of OS X 10.11 El Capitan. And then maybe years from now we can look back and laugh at how ridiculous it was that we had to apply a binary patch to our kernels to profile our programs.

Does anyone know: is NetBSD still broken?
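
With the fix arriving in OS X 10.11 (Darwin kernel release 15), the warning suggested earlier in the thread could in principle be keyed on the kernel release. The following is a sketch under loud assumptions: mayNeedKernelPatch is a hypothetical helper, the release string is hard-coded rather than queried from the OS, and, as noted above, a patched older kernel cannot be told apart from an unpatched one, so this would warn needlessly on patched systems.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// mayNeedKernelPatch (hypothetical helper) reports whether goos/release looks
// like a Darwin kernel older than release 15, the one shipped with OS X 10.11.
// It cannot detect whether an older kernel has been binary-patched.
func mayNeedKernelPatch(goos, release string) bool {
	if goos != "darwin" {
		return false
	}
	major, err := strconv.Atoi(strings.SplitN(release, ".", 2)[0])
	if err != nil {
		return false // unknown format: stay quiet rather than guess
	}
	return major < 15
}

func main() {
	release := "14.5.0" // assumption: a real program would query the OS for this
	if mayNeedKernelPatch(runtime.GOOS, release) {
		fmt.Fprintln(os.Stderr, "warning: CPU profiles may be inaccurate on this kernel")
	}
}
```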

@davecheney
Contributor

And there was much rejoicing and dancing in the streets.


@minux
Member

minux commented Aug 28, 2015 via email

@jnjackins
Contributor

Confirming that profiling does indeed work out-of-the-box on El Capitan.

rsc changed the title from "runtime/pprof: multithreaded CPU profiles incorrect on NetBSD, OS X" to "runtime/pprof: multithreaded CPU profiles incorrect on NetBSD" on Jan 6, 2016
@rsc
Contributor Author

rsc commented Jan 6, 2016

Obsoleting in favor of #13841.

rsc closed this as completed on Jan 6, 2016
@alandonovan
Contributor

Yay!

@gopherbot

CL https://golang.org/cl/19161 mentions this issue.

golang locked and limited conversation to collaborators on Feb 3, 2017
gopherbot pushed a commit that referenced this issue Feb 19, 2021
macOS tests have been disabled since CL 12429045 (Aug 2013).
At the time, macOS required a kernel patch to get a working profiler
(https://research.swtch.com/macpprof), which we didn't want
to require, of course.

macOS has improved - it no longer requires the kernel patch - but
we never updated the list of exceptions.

As far as I can tell, the builders have no problem passing the pprof test now.
(It is possible that the iOS builders have trouble, but that is now a different GOOS.)

Remove the exception for macOS. The test should now pass.

Fixes #6047.

Change-Id: Iab49036cacc1025e56f515bd19d084390c2f5357
Reviewed-on: https://go-review.googlesource.com/c/go/+/292229
Trust: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>