runtime: CPU profiling and fork/exec go into infinite loop #5517

remyoudompheng · 2013-05-20T09:20:28Z

What steps will reproduce the problem?
1. Compile and run the following program

package main

import (
    "crypto/md5"
    "log"
    "net/http"
    _ "net/http/pprof"
    "os/exec"
    "time"
)

func main() {
    go work()
    err := http.ListenAndServe(":9000", nil)
    if err != nil {
        log.Fatal(err)
    }
}

var garbage = make([][]byte, 32)

func work() {
    for i := range garbage {
        garbage[i] = make([]byte, 20<<20)
    }
    tick := time.NewTicker(time.Second / 30)
    for _ = range tick.C {
        for i, s := range garbage {
            for j := range s {
                s[j] = byte(i+j)
            }
        }
        cmd := exec.Command("ls")
        blah, err := cmd.CombinedOutput()
        if err != nil {
            log.Fatal(err)
        }
        h := md5.New()
        h.Write(blah)
        log.Printf("%x", h.Sum(nil))
    }
}

2. Run go tool pprof http://localhost:9000/debug/pprof/profile

What is the expected output? What do you see instead?

Expected: the programs gets profiled.

Instead: curl times out, the program remains stuck in an infinite loop.

Read http://localhost:9000/debug/pprof/symbol
Gathering CPU profile from http://localhost:9000/debug/pprof/profile?seconds=30 for 30
seconds to
  /home/remy/pprof/crash.1369041399.localhost-port9000
Be patient...
Failed to get profile: curl --max-time 90 -s -o
/home/remy/pprof/.tmp.crash.1369041399.localhost-port9000
'http://localhost:9000/debug/pprof/profile?seconds=30': 

strace output:

[pid 24975] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 24975] fork( <unfinished ...>
[pid 24972] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 24972] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 24972] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 24972] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 24975] <... fork resumed> )        = ? ERESTARTNOINTR (To be restarted)
<0.048847>
[pid 24975] fork( <unfinished ...>
[pid 24972] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 24972] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 24972] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 24972] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 24975] <... fork resumed> )        = ? ERESTARTNOINTR (To be restarted)
<0.047955>
[pid 24975] fork( <unfinished ...>
[pid 24972] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 24972] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 24972] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 24972] --- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
[pid 24975] <... fork resumed> )        = ? ERESTARTNOINTR (To be restarted)
<0.048301>
[pid 24975] fork( <unfinished ...>

Which version of Go am I using?

The problem happens with Go 1.0.3 and Go 1.1. When I run strace on the Go 1.1 binary,
the program goes out of the infinite loop after a few seconds. On Go 1.0.3, the program
remains stuck.

remyoudompheng · 2013-05-20T09:21:15Z

Comment 1:

The garbage in the example code is meant to make the fork longer (20-40ms here), you may
need to increase the garbage size in order to reproduce.

remyoudompheng · 2013-05-20T09:23:22Z

Comment 2:

I forgot system information: this is on Linux amd64
At home I use Archlinux:
$ uname -a
Linux mastermind 3.8.0-1-mastermind #1 SMP PREEMPT Thu Feb 21 10:34:25 CET 2013 x86_64
GNU/Linux
I also encounter the issue on Debian old stable (squeeze) which runs Linux 2.6.32.

dvyukov · 2013-06-16T15:35:33Z

Comment 3:

Simpler reproducer:
package main
import (
    "fmt"
    "os/exec"
    "time"
)
//var ballast = make([][]byte, 2e8)
func main() {
    go func() {
        prev := time.Now()
        for now := range time.NewTicker(time.Millisecond).C {
            fmt.Println(now.Sub(prev))
            prev = now
        }   
    }()
    for {
        exec.Command("ls").CombinedOutput()
    }
}
With the ballast there are that 15ms delays:
999.337us
996.273us
1.001548ms
1.003009ms
1.000131ms
1.022096ms
15.632691ms
321.769us
995.469us
999.496us
1.000067ms
999.972us
1.000004ms
1.000169ms
999.43us
16.426056ms
583.624us
990.718us
1.012777ms
987.411us
999.783us
999.792us
1.000195ms
999.824us
15.317368ms
687.186us
995.589us
999.954us

dvyukov · 2013-06-16T15:36:27Z

Comment 4:

The problem is that fork on linux uses RawSyscall, so it completely blocks the process.
One solution may be to carefully use vfork, it should be much faster.
Another solution may be to carefully use entersyscall/exitsyscall
around fork, but the child process must not execute exitsyscall
because it can deadlock.

alberts · 2013-07-08T06:00:34Z

Comment 5:

Seems related to issue #5838. Will fork with whole heap marked MADV_DONTFORK also be
faster?

rsc · 2013-07-30T18:05:40Z

Comment 6:

Labels changed: added priority-later, go1.2, removed priority-triage.

dvyukov · 2013-07-31T10:46:27Z

Comment 7:

Owner changed to @dvyukov.

dvyukov · 2013-07-31T17:17:13Z

Comment 8:

comments #3 and #4 were intended for issue #5717

dvyukov · 2013-07-31T17:30:19Z

Comment 9:

funny, if you preempt clone() with signals frequently enough, it... never finishes:
https://code.google.com/p/gperftools/issues/detail?id=278

dvyukov · 2013-08-08T14:04:09Z

Comment 10:

Status changed to Started.

dvyukov · 2013-08-13T09:01:49Z

Comment 11:

This issue was closed by revision e33e476.

Status changed to Fixed.

mdempsky · 2015-02-27T04:45:27Z

Would simply using pthread_sigmask() to temporarily block SIGPROF while calling clone() have worked too? Or would generating SIGPROF and/or interrupting another thread still have caused clone() failures?

Could there be any profiling inaccuracies if CPU profiling timers continue to countdown during clone(), but instead the SIGPROF signals are sent to a different thread? Would it be worse than the status quo where we disable SIGPROF entirely and then re-enable it?

I ask because technically I think right now calling syscall.ForkExec and runtime.SetCPUProfileRate concurrently could cause problems, as they both try to enable/disable the process-wide CPU profiler. This can be fixed with a locking protocol, of course, but it seems like a slightly simpler solution would be to make SetCPUProfileRate exclusively responsible for modifying the process-wide CPU profiler state, and have ForkExec just modify the thread-local SIGPROF mask state.

dvyukov · 2015-02-27T10:37:48Z

Would simply using pthread_sigmask() to temporarily block SIGPROF while calling clone() have worked too?

I don't remember. I should have been added a test for that issue.

remyoudompheng added fixed labels Aug 13, 2013

rsc added this to the Go1.2 milestone Apr 14, 2015

rsc removed the go1.2 label Apr 14, 2015

golang locked and limited conversation to collaborators Jun 24, 2016

gopherbot added the FrozenDueToAge label Jun 24, 2016

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: CPU profiling and fork/exec go into infinite loop #5517

runtime: CPU profiling and fork/exec go into infinite loop #5517

remyoudompheng commented May 20, 2013

remyoudompheng commented May 20, 2013

remyoudompheng commented May 20, 2013

dvyukov commented Jun 16, 2013

dvyukov commented Jun 16, 2013

alberts commented Jul 8, 2013

rsc commented Jul 30, 2013

dvyukov commented Jul 31, 2013

dvyukov commented Jul 31, 2013

dvyukov commented Jul 31, 2013

dvyukov commented Aug 8, 2013

dvyukov commented Aug 13, 2013

mdempsky commented Feb 27, 2015

dvyukov commented Feb 27, 2015

runtime: CPU profiling and fork/exec go into infinite loop #5517

runtime: CPU profiling and fork/exec go into infinite loop #5517

Comments

remyoudompheng commented May 20, 2013

remyoudompheng commented May 20, 2013

remyoudompheng commented May 20, 2013

dvyukov commented Jun 16, 2013

dvyukov commented Jun 16, 2013

alberts commented Jul 8, 2013

rsc commented Jul 30, 2013

dvyukov commented Jul 31, 2013

dvyukov commented Jul 31, 2013

dvyukov commented Jul 31, 2013

dvyukov commented Aug 8, 2013

dvyukov commented Aug 13, 2013

mdempsky commented Feb 27, 2015

dvyukov commented Feb 27, 2015