runtime: limit number of operating system threads #4056

Closed
rsc opened this issue Sep 9, 2012 · 35 comments

@rsc
Contributor

rsc commented Sep 9, 2012

It comes up repeatedly that programs with large numbers of goroutines spawn far more
operating system threads than they can reasonably use, because the Go runtime is trying
not to find itself in a situation where every thread is blocked in the operating system.
Unfortunately this means that if something in the OS gets backed up Go just keeps making
more threads as more goroutines get stuck there. For example if DNS queries (done via
cgo) get stuck for a little while, then a program with 5000 HTTP-fetching goroutines
will end up with 5000 threads attempting DNS queries. The usual "solution" is
for callers of cgo code to limit by hand the number of goroutines entering that code.

I wonder if the runtime should expose a setting giving the maximum number of OS threads
to use. 256 or 512 could be a reasonable default; those seem quite high, and all the
trouble I have seen has been with far more threads. The API would probably look like:

// MaxOSThreads sets the maximum number of OS threads that will
// be used to run the current Go program and returns the previous setting.
// If n < 1, it does not change the current setting. The default is 256.
func MaxOSThreads(n int) int

@bradfitz
Contributor

Comment 1:

Seems like you'd want to cap thread creation depending on the use.
If I have 256 blocked cgo DNS threads, that doesn't mean that a new thread to Read from
disk wouldn't make progress.
I understand that's more complex, though.  So I guess it could always be done by hand,
as it is now.
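
Capping by use rather than globally could look like one semaphore per category of blocking work, so stuck DNS lookups cannot use up the slots a disk read needs. This sketch is illustrative only: the `limits` type, the category names, and the non-blocking `tryAcquire` (chosen so the demo cannot hang) are assumptions, not an existing API.

```go
package main

import "fmt"

// limits holds one counting semaphore per kind of blocking work, so a
// backlog in one category cannot starve another. An unknown category maps
// to a nil channel, so tryAcquire on it always fails.
type limits map[string]chan struct{}

func newLimits(caps map[string]int) limits {
	l := make(limits, len(caps))
	for name, n := range caps {
		l[name] = make(chan struct{}, n)
	}
	return l
}

// tryAcquire takes a slot in category name without blocking and reports
// whether it succeeded.
func (l limits) tryAcquire(name string) bool {
	select {
	case l[name] <- struct{}{}:
		return true
	default:
		return false
	}
}

// release gives back a slot previously taken in category name.
func (l limits) release(name string) { <-l[name] }

func main() {
	l := newLimits(map[string]int{"dns": 2, "disk": 2})
	// Saturate the DNS category as if two lookups were stuck.
	fmt.Println(l.tryAcquire("dns"), l.tryAcquire("dns"), l.tryAcquire("dns")) // true true false
	// Disk work still gets a slot.
	fmt.Println(l.tryAcquire("disk")) // true
	l.release("dns")
	fmt.Println(l.tryAcquire("dns")) // true
}
```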

@rsc
Contributor Author

rsc commented Sep 10, 2012

Comment 2:

Yes, you can always raise or remove the limit and do things by hand. I
just think the default should be limited, not unlimited.
Russ

@ianlancetaylor
Contributor

Comment 3:

What if GOMAXPROCS > MaxOSThreads?
Do we want to limit the number of threads created to run cgo/SWIG, or do we want to
limit the total number of threads including those created to run goroutines that make
blocking system calls?
An absolute limit on all threads is easy to implement and understand but I'm concerned
that 256 is too low.

@dvyukov
Member

dvyukov commented Sep 10, 2012

Comment 4:

Thread pools with a bounded number of threads are known to cause problems. The first problem is
system-induced deadlocks, and that is not something you will uncover during
unit testing. Second, a bound may cause poor performance: e.g. I may have X threads almost
permanently blocked, and I do not want them to affect the rest of the program. Third, it's
difficult for a user to set the bound correctly, so users may change the default when they do
not need to, or not change it when they need to.
The new runtime partially solves the problem by "ignoring" short syscalls. On network
benchmarks with GOMAXPROCS=16 I see only about 25 threads. It may be further tuned by
increasing the delay before the CPU is retaken when there are already a lot of threads. If we are
committing the new runtime, then I would prefer to see whether it makes things better or
not before introducing user control.
Do we have a good reproducer for the problem?

@rsc
Contributor Author

rsc commented Sep 10, 2012

Comment 5:

Yes, I am aware that bounded thread pools cause problems. However,
unbounded ones cause problems too. At the moment I would be happy to
accept the problems of a bounded pool in exchange for being rid of the unbounded ones.

@dvyukov
Member

dvyukov commented Sep 10, 2012

Comment 6:

But that's not a backwards-compatible change.
What about having an unbounded number by default, with an option to set a bound if
required?

@ianlancetaylor
Contributor

Comment 7:

I'm not worried about this sort of backward compatibility.

@rsc
Contributor Author

rsc commented Sep 10, 2012

Comment 8:

We're allowed to break compatibility for bug fixes. It is a bug that
if you have 10000 goroutines doing HTTP requests, a hiccup in your DNS
server triggers 10000 OS threads.
Russ

@dvyukov
Member

dvyukov commented Sep 10, 2012

Comment 9:

But it will break other programs that are not affected by the "bug".

@rsc
Contributor Author

rsc commented Sep 11, 2012

Comment 10:

That's why we have to provide a workaround API. If we set the default well
we will fix more programs than we break.

@rsc
Contributor Author

rsc commented Sep 12, 2012

Comment 12:

Labels changed: added go1.1.

@bradfitz
Contributor

bradfitz commented Nov 8, 2012

Comment 13:

Related: discussion on http://golang.org/cl/6815049 about the number of threads
blocked in cgo DNS lookups. Cap that too in the net package?

@rsc
Contributor Author

rsc commented Dec 10, 2012

Comment 14:

Labels changed: added size-m.

@rsc
Contributor Author

rsc commented Dec 10, 2012

Comment 15:

Labels changed: added suggested.

@gopherbot

Comment 16 by rickarnoldjr:

Attempted fix: https://golang.org/cl/7275049/

@rsc
Contributor Author

rsc commented Feb 15, 2013

Comment 17:

This may fall out of some new scheduler work going on.

Labels changed: removed suggested.

@dvyukov
Member

dvyukov commented Feb 15, 2013

Comment 18:

Do we have a reproducer?
I can test how it works with my scheduler change.

@rsc
Contributor Author

rsc commented Feb 19, 2013

Comment 19:

One way to reproduce it is to write a cgo wrapper that calls C.sleep(3600)
and kick off a thousand of them.
Russ

@dvyukov
Member

dvyukov commented Mar 7, 2013

Comment 21:

The number of threads is not limited yet, but I think the situation may be better now.
The new scheduler is more conservative with thread creation.

@rsc
Contributor Author

rsc commented Mar 12, 2013

Comment 23:

Let's put this off until after Go 1.1. It's not super important, and it's very easy to
get wrong in bad ways. There's enough going on in the scheduler already.

Labels changed: added go1.2, removed go1.1.

@gopherbot

Comment 24 by download333:

I had been under the impression that the Go runtime used something similar to libuv or
libevent to do non-blocking IO without the use of a threadpool. Do you mean that Go
spawns a separate thread for every blocking IO call? Wouldn't that defeat the
performance advantages of goroutines, or am I missing something?

@bradfitz
Contributor

bradfitz commented Apr 3, 2013

Comment 25:

When epoll/kqueue/completion ports can be used (e.g. for network I/O), Go uses them, just
as libevent and friends do.
But a pool of operating system threads is maintained for running goroutines and for
blocking system calls that can't use epoll etc.

@gopherbot

Comment 26 by download333:

In reference to this post
(https://groups.google.com/d/msg/golang-nuts/jgNKl0Jap_k/BeVBUAuNcBkJ) from the group,
here are some error logs with stack traces, in case anyone finds them useful
for later optimizations.
The code and logs are from a modified version of Webfront I was using to serve several
sites off the same server.
https://dl.dropbox.com/u/27496904/frontend/frontend.go
https://dl.dropbox.com/u/27496904/frontend/errlogbak
https://dl.dropbox.com/u/27496904/frontend/errlogbak2
The first log is about 5 MB, and the second managed to reach about 25 MB before dying.
Oddly enough, the Node processes actually serving the sites had no problem keeping up;
it was just the proxy that kept crashing. I can't rule out the possibility that
it's just a bug I couldn't find, though.
Almost forgot: the Go version running this was go1.0.3.

@gopherbot

Comment 27 by dvyukov:

Can you re-check it on tip?
There have been some significant changes in the scheduler, and it may already be fixed.

@gopherbot

Comment 28 by download333:

You mean just pull the latest code from the repository?

@dvyukov
Member

dvyukov commented Apr 4, 2013

Comment 29:

yep

@snaury
Contributor

snaury commented Apr 20, 2013

Comment 30:

Russ, back in 2011 you seemed to be against the idea
(https://golang.org/issue/1644); too bad I didn't know about this
bug before now. However, since a bounded thread pool may create unexpected deadlocks,
what about introducing an ever-increasing delay for spawning new threads after a certain
number of threads? (One of the Ms would then become a thread manager, adding more threads.)
On the other hand, since 2011 I have learned that I always have to limit concurrency
where it matters, e.g. never call net.Dial with an arbitrary hostname directly: resolve the
hostname first, with at most 16 lookups at the same time, and dial the resolved address (and,
before dials became non-blocking and less of a problem, limit net.Dial calls altogether). The issue
forces you to think about what you're doing instead of just blindly doing it, and the
result is better: if lookups are the problem, that's what you have limited, and it doesn't
harm the rest of the system.

@nightlyone
Contributor

Comment 31:

I like the idea of increasing the spawn delay in relation to the spawn rate. This could model the
cost of spawning threads pretty well.

@rsc
Contributor Author

rsc commented Jul 30, 2013

Comment 32:

I think we should do this for Go 1.2. In package runtime/debug:
// SetMaxThreads sets the maximum number of operating system
// threads that the runtime will create for the current program.
// It returns the previous setting.
// The default maximum is 1000.
func SetMaxThreads(max int) int
We should also limit the number of cgo calls that package net makes.

@dvyukov
Member

dvyukov commented Jul 30, 2013

Comment 33:

I am skeptical about this.
We do not know how many threads a program needs. So we will unnecessarily limit
large programs running on large boxes, and at the same time allow a "hello world"
program to create 1000 threads when it needs only 1.
Users must limit the number of threads in system and cgo calls. If that's implemented poorly in the net
package, then that needs to be addressed (most likely we do not want 1000 threads
resolving DNS).
We can also throttle thread creation in the runtime after some threshold.
But a hard limit is a time bomb.

@rsc
Contributor Author

rsc commented Jul 30, 2013

Comment 34:

Labels changed: added feature.

@rsc
Contributor Author

rsc commented Aug 2, 2013

Comment 35:

Running the OS out of threads is just as bad.

@rsc
Contributor Author

rsc commented Aug 9, 2013

Comment 36:

I think you are right, Dmitriy. We should not limit the number of threads. We should
crash the program if it creates too many threads. How many is too many? Something less
than what will wedge the operating system. I will see if 10000 can be done.
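
In released Go (since Go 1.2) this limit is exposed as `runtime/debug.SetMaxThreads`: the initial setting is 10,000 threads, a program that tries to use more crashes, and the call returns the previous setting. A sketch of raising it temporarily, assuming nothing else changes the limit concurrently; `maxThreads` and `raiseThreadLimit` are illustrative helpers, not part of the package.

```go
package main

import (
	"fmt"
	"math"
	"runtime/debug"
)

// maxThreads reads the current limit. SetMaxThreads only reports the old value
// when setting a new one, so this briefly sets the largest allowed value (never
// below the current thread count) and then puts the old one back.
func maxThreads() int {
	prev := debug.SetMaxThreads(math.MaxInt32)
	debug.SetMaxThreads(prev)
	return prev
}

// raiseThreadLimit raises the limit to n if it is lower and returns a function
// that restores the previous value. Restoring a lower value may crash the
// program if it has meanwhile grown past that value.
func raiseThreadLimit(n int) (restore func()) {
	prev := maxThreads()
	if prev >= n {
		return func() {}
	}
	debug.SetMaxThreads(n)
	return func() { debug.SetMaxThreads(prev) }
}

func main() {
	fmt.Println("initial limit:", maxThreads())
	restore := raiseThreadLimit(20000)
	fmt.Println("raised limit:", maxThreads())
	restore()
	fmt.Println("restored limit:", maxThreads())
}
```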

@dvyukov
Member

dvyukov commented Aug 11, 2013

Comment 37:

Ideally it should still be complemented by throttling thread creation, because the thread count is not a fully
user-visible characteristic of the program, and it may be further complicated by
interactions between libraries.
On my desktop I can easily create 20000+ threads and have all of them runnable at the same
time.

@rsc
Contributor Author

rsc commented Aug 17, 2013

Comment 38:

This issue was closed by revision 665feee.

Status changed to Fixed.
