Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lost wakeups in spinlock degrade tcmalloc performance substantially #494

Closed
alk opened this issue Aug 23, 2015 · 2 comments
Closed

Lost wakeups in spinlock degrade tcmalloc performance substantially #494

alk opened this issue Aug 23, 2015 · 2 comments

Comments

@alk
Copy link
Contributor

alk commented Aug 23, 2015

Originally reported on Google Code with ID 491

In uncontended cases, SpinLock::{Lock,Unlock} have an overhead of 13ns. In cases where
the lock is contended, SpinLock::Unlock has a bug where it can fail to wakeup waiters,
causing the overhead of SpinLock::{Lock,Unlock} to blow up from 13ns to as high as
256ms (a blow up by a factor of over a million!). In cases of heavy lock contention,
this can cause applications to sleep for minutes at a time without doing anything,
appearing to hang. We observe regular hangs in practice in Chrome in http://crbug.com/168139

The following fix slows down SpinLock::Unlock by 2ns in the uncontended case, but speeds
up SpinLock::Lock by up to almost 256ms in the contended case where a wakeup ix mixed.
If tcmalloc is used many times in an application in a multithreaded context (e.g. allocate
in one thread, deallocate in another), this 256ms delay can add up quickly to several
minutes of delay, causing web applications to time out and causing user-facing applications
like Chrome to appear to hang.

At Google, SpinLock::{Lock,Unlock} are used for many things besides tcmalloc, so a
2ns slowdown may be unacceptable for some of those applications; however, for tcmalloc
itself, I think it would be more than worth it to allow a 2ns slowdown in order to
avoid an unpredictable 256ms penalty for multithreaded applications. Therefore I'm
proposing that we roll the attached patch into open source tcmalloc itself in order
to improve its performance.

This performance improvement will also be rolled into Chrome so as to improve Chrome's
performance and avoid unnecessary delays.

This CL depends on http://code.google.com/p/gperftools/issues/detail?id=490 which adds
support for {Acquire,Release}_AtomicExchange.

Passes make and make check.

Reported by davidjames@google.com on 2013-01-08 18:56:39


- _Attachment: [spinlock-race-fix.txt](https://storage.googleapis.com/google-code-attachments/gperftools/issue-491/comment-0/spinlock-race-fix.txt)_
@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

As Aliaksey Kandratsenka pointed out, any reason to cast Release_AtomicExchange to uint64?

Reported by zatrazz on 2013-03-10 12:57:23

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

r195 | chappedm@gmail.com | 2013-03-10 16:02:46 -0400 (Sun, 10 Mar 2013) | 7 lines

issue-491: Significant performance improvement for spin lock contention

This patch fixes issues where spinlocks under contention were failing to
wakeup waiters, sometimes resulting in blow ups from 13ns to as high as 256ms.
Under heavy contention, applications were observed sleeping for minutes at a
time giving the appearance of a hang.

Reported by chappedm on 2013-03-10 20:03:21

  • Status changed: Fixed

@alk alk closed this as completed Aug 23, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant