| Issue 10611: | StatsTableTest.MultipleThreads is flaky | |
| 3 people starred this issue and may be notified of changes. | Back to list |
Sign in to add a comment
|
Here's an example on the mac: [----------] 7 tests from StatsTableTest [ RUN ] StatsTableTest.VerifySlots [ OK ] StatsTableTest.VerifySlots (1 ms) [ RUN ] StatsTableTest.MultipleThreads /Users/maruel/code/b/slave/mac/build/src/base/stats_table_unittest.cc:151: Failure Value of: table.GetCounterValue(name) Actual: 24947 Expected: 1313 * kMaxThreads Which is: 26260 [ FAILED ] StatsTableTest.MultipleThreads (4583 ms) [ RUN ] StatsTableTest.MultipleProcesses [ OK ] StatsTableTest.MultipleProcesses (4854 ms) [ RUN ] StatsTableTest.StatsCounter [ OK ] StatsTableTest.StatsCounter (13 ms) [ RUN ] StatsTableTest.StatsCounterTimer [ OK ] StatsTableTest.StatsCounterTimer (1005 ms) [ RUN ] StatsTableTest.StatsRate [ OK ] StatsTableTest.StatsRate (1005 ms) [ RUN ] StatsTableTest.StatsScope [ OK ] StatsTableTest.StatsScope (1014 ms) [----------] 7 tests from StatsTableTest (12477 ms total)
Apr 23, 2009
This is currently failing on Linux builders too: http://build.chromium.org/buildbot/waterfall/builders/Modules%20Linux%20(dbg)/builds/72 81/steps/base_unittests/logs/stdio will turn this test off altogether shortly.
Jun 20, 2009
To reproduce: ./base_unittests --gtest_filter=StatsTableTest.MultipleThreads --gtest_break_on_failure --gtest_repeat=1000 It fails for me after 5 to 10 iterations.
Jun 20, 2009
If I set a breakpoint in StatsTable::RegisterThread, I see that multiple threads are in the critical section simultaneously. Could it be because RegisterThread uses a SharedMemoryAutoLock, which ends up using lockf? According to the man page on my (Mac OS X) machine: "Calls to lockf() from other processes which attempt to lock the locked file section will either return an error value or block until the section becomes unlocked". Is it possible lockf only works across process and not across threads?
Jun 20, 2009
Oh - that is definitely the bug. Looking at the shared_memory.h, we find this nice nugget: // Lock the shared memory. // This is a cross-process lock which may be recursively // locked by the same thread. // TODO(port): // WARNING: on POSIX the lock only works across processes, not // across threads. 2 threads in the same process can both grab the // lock at the same time. There are several solutions for this // (futex, lockf+anon_semaphore) but none are both clean and common // across Mac and Linux. void Lock(); So yeah, shared memory locking is busted across the board for Mac & Linux.
Jun 20, 2009
What about adding a Lock to SharedMemory that is used for cross-thread locking.
SharedMemory::Lock and Unlock in shared_memory_posix.cc would then look like:
void SharedMemory::Lock() {
lock.lock(); // in case another thread in this process already has the shared-memory lock
LockOrUnlockCommon(F_LOCK);
}
void SharedMemory::Unlock() {
LockOrUnlockCommon(F_ULOCK);
lock.unlock(); // let other threads in this process acquire the shared-memory lock
}
Jun 20, 2009
Re Comment @6:
Make that:
void SharedMemory::Lock() {
lock_.Acquire(); // in case another thread in this process already has the shared-memory lock
LockOrUnlockCommon(F_LOCK);
}
void SharedMemory::Unlock() {
LockOrUnlockCommon(F_ULOCK);
lock_.Release(); // let other threads in this process acquire the shared-memory lock
}
I've tested it, and it seems to fix the bug.
If that sounds OK, I'll write a unit-test for shared_memory that demonstrates the bug.
Jul 16, 2009
(No comment was entered for this change.)
Labels:
FlakyTest
Sep 23, 2009
(No comment was entered for this change.)
Labels:
Mstone-X
Dec 18, 2009
(No comment was entered for this change.)
Labels:
-Area-Infrastructure Area-BuildTools
May 7, 2010
The following revision refers to this bug:
http://src.chromium.org/viewvc/chrome?view=rev&revision=46707
------------------------------------------------------------------------
r46707 | phajdan.jr@chromium.org | 2010-05-07 10:36:55 -0700 (Fri, 07 May 2010) | 9 lines
Changed paths:
M http://src.chromium.org/viewvc/chrome/trunk/src/base/stats_table_unittest.cc?r1=46707&r2=46706
Mark StatsTableTest.MultipleThreads as flaky
instead of using #if 0.
TBR=maruel
TEST=none
BUG=10611
Review URL: http://codereview.chromium.org/1990007
------------------------------------------------------------------------
May 7, 2010
The following revision refers to this bug:
http://src.chromium.org/viewvc/chrome?view=rev&revision=46743
------------------------------------------------------------------------
r46743 | shess@chromium.org | 2010-05-07 15:02:59 -0700 (Fri, 07 May 2010) | 13 lines
Changed paths:
M http://src.chromium.org/viewvc/chrome/trunk/src/base/stats_table_unittest.cc?r1=46743&r2=46742
Revert StatsTableTest.MultipleThreads change which is failing tsan.
Revert r46707:
Mark StatsTableTest.MultipleThreads as flaky
instead of using #if 0.
TBR=maruel,phajdan.jr
TEST=none
BUG=10611
Review URL: http://codereview.chromium.org/2054003
------------------------------------------------------------------------
May 10, 2010
The following revision refers to this bug:
http://src.chromium.org/viewvc/chrome?view=rev&revision=46784
------------------------------------------------------------------------
r46784 | phajdan.jr@chromium.org | 2010-05-08 02:30:28 -0700 (Sat, 08 May 2010) | 13 lines
Changed paths:
M http://src.chromium.org/viewvc/chrome/trunk/src/base/stats_table_unittest.cc?r1=46784&r2=46783
Disable StatsTableTest.MultipleThreads
instead of using #if 0 (so that the code
doesn't rot).
This is http://codereview.chromium.org/1990007 done right,
this time should not break tsan.
TBR=maruel
TEST=none
BUG=10611
Review URL: http://codereview.chromium.org/2022003
------------------------------------------------------------------------
May 26, 2010
The following revision refers to this bug:
http://src.chromium.org/viewvc/chrome?view=rev&revision=48180
------------------------------------------------------------------------
r48180 | jhawkins@chromium.org | 2010-05-25 12:56:20 -0700 (Tue, 25 May 2010) | 5 lines
Changed paths:
M http://src.chromium.org/viewvc/chrome/trunk/src/base/stats_table_unittest.cc?r1=48180&r2=48179
TTF: Re-enable StatsTableTest.MultipleThreads by marking it flaky.
TBR=stuartmorgan
BUG=10611
TEST=StatsTableTest.MultipleThreads
------------------------------------------------------------------------
Jul 30, 2010
(No comment was entered for this change.)
Labels:
-FlakyTest Tests-Flaky
Aug 20, 2010
It's still flaky and it's quite slow, 40s for a run in the shlib slave. Something should be done to make this test < 10s. (cc'ing the last ones who committed to this file 2 years ago)
Cc:
est...@chromium.org e...@chromium.org
Aug 20, 2010
The following revision refers to this bug:
http://src.chromium.org/viewvc/chrome?view=rev&revision=56859
------------------------------------------------------------------------
r56859 | maruel@chromium.org | 2010-08-20 08:44:16 -0700 (Fri, 20 Aug 2010) | 6 lines
Changed paths:
M http://src.chromium.org/viewvc/chrome/trunk/src/base/stats_table_unittest.cc?r1=56859&r2=56858
Mark StatsTableTest.MultipleProcesses as flaky (again)
BUG=10611
TEST=flaky test
Review URL: http://codereview.chromium.org/3107028
------------------------------------------------------------------------
Sep 23, 2010
The following revision refers to this bug:
http://src.chromium.org/viewvc/chrome?view=rev&revision=60385
------------------------------------------------------------------------
r60385 | evan@chromium.org | Thu Sep 23 16:53:40 PDT 2010
Changed paths:
M http://src.chromium.org/viewvc/chrome/trunk/src/base/stats_table_unittest.cc?r1=60385&r2=60384&pathrev=60385
base_unittests: Turn down the sleeps on the stats table tests.
These tests are over 20% of the time taken by base_unittests.
I don't think making these sleeps shorter should affect their
reliability too much.
BUG=10611
Review URL: http://codereview.chromium.org/3474009
------------------------------------------------------------------------
Feb 14, 2011
(No comment was entered for this change.)
Labels:
-HelpWanted GoodFirstBug
Mar 18, 2011
Here's an example on the mac: [----------] 7 tests from StatsTableTest [ RUN ] StatsTableTest.VerifySlots [ OK ] StatsTableTest.VerifySlots (1 ms) [ RUN ] StatsTableTest.MultipleThreads /Users/maruel/code/b/slave/mac/build/src/base/stats_table_unittest.cc:151: Failure Value of: table.GetCounterValue(name) Actual: 24947 Expected: 1313 * kMaxThreads Which is: 26260 [ FAILED ] StatsTableTest.MultipleThreads (4583 ms) [ RUN ] StatsTableTest.MultipleProcesses [ OK ] StatsTableTest.MultipleProcesses (4854 ms) [ RUN ] StatsTableTest.StatsCounter [ OK ] StatsTableTest.StatsCounter (13 ms) [ RUN ] StatsTableTest.StatsCounterTimer [ OK ] StatsTableTest.StatsCounterTimer (1005 ms) [ RUN ] StatsTableTest.StatsRate [ OK ] StatsTableTest.StatsRate (1005 ms) [ RUN ] StatsTableTest.StatsScope [ OK ] StatsTableTest.StatsScope (1014 ms) [----------] 7 tests from StatsTableTest (12477 ms total)
Labels:
-GoodFirstBug bulkmove Hotlist-GoodFirstBug
Mar 18, 2011
Here's an example on the mac: [----------] 7 tests from StatsTableTest [ RUN ] StatsTableTest.VerifySlots [ OK ] StatsTableTest.VerifySlots (1 ms) [ RUN ] StatsTableTest.MultipleThreads /Users/maruel/code/b/slave/mac/build/src/base/stats_table_unittest.cc:151: Failure Value of: table.GetCounterValue(name) Actual: 24947 Expected: 1313 * kMaxThreads Which is: 26260 [ FAILED ] StatsTableTest.MultipleThreads (4583 ms) [ RUN ] StatsTableTest.MultipleProcesses [ OK ] StatsTableTest.MultipleProcesses (4854 ms) [ RUN ] StatsTableTest.StatsCounter [ OK ] StatsTableTest.StatsCounter (13 ms) [ RUN ] StatsTableTest.StatsCounterTimer [ OK ] StatsTableTest.StatsCounterTimer (1005 ms) [ RUN ] StatsTableTest.StatsRate [ OK ] StatsTableTest.StatsRate (1005 ms) [ RUN ] StatsTableTest.StatsScope [ OK ] StatsTableTest.StatsScope (1014 ms) [----------] 7 tests from StatsTableTest (12477 ms total)
Labels:
-Area-BuildTools Area-Build
Nov 8, 2011
(No comment was entered for this change.)
Labels:
Hotlist-Fixit
Feb 13, 2012
The following revision refers to this bug:
http://src.chromium.org/viewvc/chrome?view=rev&revision=121782
------------------------------------------------------------------------
r121782 | evan@chromium.org | Mon Feb 13 16:29:14 PST 2012
Changed paths:
M http://src.chromium.org/viewvc/chrome/trunk/src/base/metrics/stats_table_unittest.cc?r1=121782&r2=121781&pathrev=121782
M http://src.chromium.org/viewvc/chrome/trunk/src/base/process_util_unittest.cc?r1=121782&r2=121781&pathrev=121782
M http://src.chromium.org/viewvc/chrome/trunk/src/base/file_util_unittest.cc?r1=121782&r2=121781&pathrev=121782
M http://src.chromium.org/viewvc/chrome/trunk/src/base/timer_unittest.cc?r1=121782&r2=121781&pathrev=121782
M http://src.chromium.org/viewvc/chrome/trunk/src/base/platform_file_unittest.cc?r1=121782&r2=121781&pathrev=121782
M http://src.chromium.org/viewvc/chrome/trunk/src/base/shared_memory_unittest.cc?r1=121782&r2=121781&pathrev=121782
M http://src.chromium.org/viewvc/chrome/trunk/src/base/files/file_path_watcher_browsertest.cc?r1=121782&r2=121781&pathrev=121782
Flakiness cleanup: disable flaky tests under base/
See https://groups.google.com/a/chromium.org/group/chromium-dev/browse_thread/thread/fcec09fc659f39a6
BUG=46246,85930,10611,86494,95058,61589,25038
Review URL: https://chromiumcodereview.appspot.com/9386014
------------------------------------------------------------------------
Feb 16, 2012
jar@: could you help routing this bug to the right owner? svn blame shows mostly initial.commit. Thanks.
Cc:
j...@chromium.org
Feb 16, 2012
jar: I've been looking at this test. It seems that it has uncovered a bug with the stated interface of SharedMemory::Lock. i.e. on posix, it doesn't work across threads in the same process. I'm wondering if that interface is actually needed though? The suggestion in comment 6 will paper over the problem the test is exposing without actually fixing the core issues since each there could be multiple SharedMemory objects used by different threads. If per thread locking isn't actually needed, then I propose changing the comments for SharedMem and StatsTable to reflect that and changing the test to use processes. If per thread locking is needed, then I think we'll need to do something like make the shared memory region a bit bigger to store the id of a shared semaphore or similar approach. Any thoughts?
Owner:
vandebo@chromium.org
Labels: -Pri-3 -Hotlist-GoodFirstBug Pri-2
Feb 16, 2012
The suggestion in comment 7 LGTM, and seems like a central fix for a rarely used(?) facility. Perhaps you'd like to land it? One hassle is that the "global lock, internal to the process" needs to have a long lifetime, and hence should probably be a leaky singleton.
Feb 16, 2012
Do we actually use SharedMemory::Lock anywhere? It might be easier to just delete all this code.
Feb 16, 2012
@jar I had comment 6/7 coded up with a class local lock, but I think it's possible for multiple SharedMemory classes to be pointing at the shared memory and thus a class local lock wouldn't work in those cases. Are you suggesting a global lock? @evan - I changed the signature and got several compile failures, so it is used. I didn't look closely at the call sites.
Feb 16, 2012
@jar I was hoping that you knew more about the usage pattern of these classes. Sounds like maybe you don't. I'll spend more time looking into it.
Feb 16, 2012
I think (as per title) it was used in Stats Tables. Unless we've trashed them that would be the consumer of this lock. I know we've disabled them (to preclude accidental security issue)... but think they are available via command line switches for deveoper use.
Feb 16, 2012
The usage pattern (via Stats Tables) is that these locks are used to allocate counters across multiple processes and threads. The lock is only used during allocation, and not during increments of the counters. As a result, a global lock (yeah, that was what I was suggesting) should not IMO be a performance bottleneck. I would expect that IF there was any contention, it would be a bid deal across processes, and the additional process-internal contention would be far less significant (but I'm just guessing).
Feb 24, 2012
The following revision refers to this bug:
http://src.chromium.org/viewvc/chrome?view=rev&revision=123647
------------------------------------------------------------------------
r123647 | vandebo@chromium.org | Fri Feb 24 22:28:56 PST 2012
Changed paths:
M http://src.chromium.org/viewvc/chrome/trunk/src/base/metrics/stats_table_unittest.cc?r1=123647&r2=123646&pathrev=123647
M http://src.chromium.org/viewvc/chrome/trunk/src/base/shared_memory_posix.cc?r1=123647&r2=123646&pathrev=123647
M http://src.chromium.org/viewvc/chrome/trunk/src/base/shared_memory.h?r1=123647&r2=123646&pathrev=123647
Improve SharedMemory::Lock on Posix and reenable StatsTableTest.MultipleThreads
BUG=10611
TEST=NONE
Review URL: http://codereview.chromium.org/9463018
------------------------------------------------------------------------
Feb 25, 2012
The following revision refers to this bug:
http://src.chromium.org/viewvc/chrome?view=rev&revision=123672
------------------------------------------------------------------------
r123672 | vandebo@chromium.org | Sat Feb 25 14:28:22 PST 2012
Changed paths:
M http://src.chromium.org/viewvc/chrome/trunk/src/base/metrics/stats_table_unittest.cc?r1=123672&r2=123671&pathrev=123672
Disable StatsTableTest.MultipleThreads on Mac.
It's failing on the Mac Valgrind bot
TBR=thestig@chromium.org
BUG=10611
TEST=NONE
Review URL: http://codereview.chromium.org/9466039
------------------------------------------------------------------------
Mar 6, 2012
Available + Owner == Default Assgined
Status:
Assigned
Jun 11, 2012
(Un-ccing myself from bugs.)
Cc:
-evan@chromium.org
Mar 9, 2013
(No comment was entered for this change.)
Labels:
-Tests -Tests-Flaky -Area-Build Cr-Tests-Flaky Build Cr-Tests
Mar 29, 2013
Hi there. I remember hitting this issue every time we deploy a new memory tool, and now here it is again with ThreadSanitizer v2. According to http://build.chromium.org/p/chromium.fyi/builders/Chromium%20Linux%20TSan%20v2/builds/4935/steps/base_unittests/logs/stdio, this test is consistently failing under TSan v2: ===================================== ../../base/metrics/stats_table_unittest.cc:157: Failure Value of: table.GetCounterValue(name) Actual: 21008 Expected: 1313 * kMaxThreads Which is: 26260 ===================================== My guess is that every tool that slows down the program execution causes the problems with GetCounterValue being off by 1313 * several threads. There certainly is a data race, since the counter values are incorrect. But it's unclear to me whether the problem is in the test itself or it's about incorrect synchronization in the StatsTable.
Owner:
j...@chromium.org
Cc: -j...@chromium.org vandebo@chromium.org erg@chromium.org dvyu...@chromium.org Labels: Performance-ThreadSanitizer
Apr 1, 2013
(No comment was entered for this change.)
Labels:
-Performance-ThreadSanitizer Stability-ThreadSanitizer
Apr 1, 2013
------------------------------------------------------------------------ r191613 | glider@chromium.org | 2013-04-01T15:48:17.274480Z Changed paths: M http://src.chromium.org/viewvc/chrome/trunk/src/base/metrics/stats_table_unittest.cc?r1=191613&r2=191612&pathrev=191613 Disable StatsTableTest.MultipleThreads under ThreadSanitizer v2 BUG=10611 Review URL: https://codereview.chromium.org/13311002 ------------------------------------------------------------------------ |
||||||||
| ► Sign in to add a comment | |||||||||
The following revision refers to this bug: http://src.chromium.org/viewvc/chrome?view=rev&revision=14029 ------------------------------------------------------------------------ r14029 | maruel@chromium.org | 2009-04-20 08:28:38 -0700 (Mon, 20 Apr 2009) | 4 lines Changed paths: M http://src.chromium.org/viewvc/chrome/trunk/src/base/stats_table_unittest.cc?r1=14029&r2=14028 Disable StatsTableTest.MultipleThreads since it is flaky under heavy load. BUG=10611 Review URL: http://codereview.chromium.org/79010 ------------------------------------------------------------------------