My favorites | Sign in
Project Home Downloads Wiki Issues Code Search
New issue   Search
for
  Advanced search   Search tips   Subscriptions
Issue 10611: StatsTableTest.MultipleThreads is flaky
3 people starred this issue and may be notified of changes. Back to list
 
Project Member Reported by maruel@chromium.org, Apr 16, 2009
Here's an example on the mac:

[----------] 7 tests from StatsTableTest
[ RUN      ] StatsTableTest.VerifySlots
[       OK ] StatsTableTest.VerifySlots (1 ms)
[ RUN      ] StatsTableTest.MultipleThreads
/Users/maruel/code/b/slave/mac/build/src/base/stats_table_unittest.cc:151: 
Failure
Value of: table.GetCounterValue(name)
  Actual: 24947
Expected: 1313 * kMaxThreads
Which is: 26260
[  FAILED  ] StatsTableTest.MultipleThreads (4583 ms)
[ RUN      ] StatsTableTest.MultipleProcesses
[       OK ] StatsTableTest.MultipleProcesses (4854 ms)
[ RUN      ] StatsTableTest.StatsCounter
[       OK ] StatsTableTest.StatsCounter (13 ms)
[ RUN      ] StatsTableTest.StatsCounterTimer
[       OK ] StatsTableTest.StatsCounterTimer (1005 ms)
[ RUN      ] StatsTableTest.StatsRate
[       OK ] StatsTableTest.StatsRate (1005 ms)
[ RUN      ] StatsTableTest.StatsScope
[       OK ] StatsTableTest.StatsScope (1014 ms)
[----------] 7 tests from StatsTableTest (12477 ms total)

Apr 20, 2009
#1 bugdroid1@gmail.com
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=14029 

------------------------------------------------------------------------
r14029 | maruel@chromium.org | 2009-04-20 08:28:38 -0700 (Mon, 20 Apr 2009) | 4 lines
Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/base/stats_table_unittest.cc?r1=14029&r2=14028

Disable StatsTableTest.MultipleThreads since it is flaky under heavy load.

BUG=10611
Review URL: http://codereview.chromium.org/79010
------------------------------------------------------------------------

Apr 23, 2009
#2 robertsh...@chromium.org
This is currently failing on Linux builders too:

http://build.chromium.org/buildbot/waterfall/builders/Modules%20Linux%20(dbg)/builds/72
81/steps/base_unittests/logs/stdio

will turn this test off altogether shortly.
Jun 20, 2009
#3 rhodes.n...@gmail.com
To reproduce:
./base_unittests --gtest_filter=StatsTableTest.MultipleThreads --gtest_break_on_failure --gtest_repeat=1000

It fails for me after 5 to 10 iterations.
Jun 20, 2009
#4 rhodes.n...@gmail.com
If I set a breakpoint in StatsTable::RegisterThread, I see that multiple threads are in the critical section 
simultaneously.

Could it be because RegisterThread uses a SharedMemoryAutoLock, which ends up using lockf?  According to 
the man page on my (Mac OS X) machine: "Calls to lockf() from other processes which attempt to lock the locked 
file section will either return an error value or block until the section becomes unlocked". Is it possible lockf only 
works across process and not across threads?
Jun 20, 2009
#5 mbel...@chromium.org
Oh - that is definitely the bug.

Looking at the shared_memory.h, we find this nice nugget:

  // Lock the shared memory.
  // This is a cross-process lock which may be recursively
  // locked by the same thread.
  // TODO(port):
  // WARNING: on POSIX the lock only works across processes, not
  // across threads.  2 threads in the same process can both grab the
  // lock at the same time.  There are several solutions for this
  // (futex, lockf+anon_semaphore) but none are both clean and common
  // across Mac and Linux.
  void Lock();


So yeah, shared memory locking is busted across the board for Mac & Linux.
Jun 20, 2009
#6 rhodes.n...@gmail.com
What about adding a Lock to SharedMemory that is used for cross-thread locking.

SharedMemory::Lock and Unlock in shared_memory_posix.cc would then look like:
void SharedMemory::Lock() {
  lock.lock(); // in case another thread in this process already  has the shared-memory lock
  LockOrUnlockCommon(F_LOCK);
}

void SharedMemory::Unlock() {
  LockOrUnlockCommon(F_ULOCK);
  lock.unlock();  // let other threads in this process acquire the shared-memory lock
}

Jun 20, 2009
#7 rhodes.n...@gmail.com
Re Comment @6:

Make that:
void SharedMemory::Lock() {
  lock_.Acquire(); // in case another thread in this process already  has the shared-memory lock
  LockOrUnlockCommon(F_LOCK);
}

void SharedMemory::Unlock() {
  LockOrUnlockCommon(F_ULOCK);
  lock_.Release();  // let other threads in this process acquire the shared-memory lock
}

I've tested it, and it seems to fix the bug.

If that sounds OK, I'll write a unit-test for shared_memory that demonstrates the bug.

Jul 16, 2009
#8 phajdan.jr@chromium.org
(No comment was entered for this change.)
Labels: FlakyTest
Sep 23, 2009
#9 mal.chromium@gmail.com
(No comment was entered for this change.)
Labels: Mstone-X
Dec 18, 2009
#10 mal.chromium@gmail.com
(No comment was entered for this change.)
Labels: -Area-Infrastructure Area-BuildTools
May 7, 2010
#11 bugdroid1@gmail.com
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=46707 

------------------------------------------------------------------------
r46707 | phajdan.jr@chromium.org | 2010-05-07 10:36:55 -0700 (Fri, 07 May 2010) | 9 lines
Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/base/stats_table_unittest.cc?r1=46707&r2=46706

Mark StatsTableTest.MultipleThreads as flaky
instead of using #if 0.

TBR=maruel

TEST=none
BUG=10611

Review URL: http://codereview.chromium.org/1990007
------------------------------------------------------------------------

May 7, 2010
#12 bugdroid1@gmail.com
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=46743 

------------------------------------------------------------------------
r46743 | shess@chromium.org | 2010-05-07 15:02:59 -0700 (Fri, 07 May 2010) | 13 lines
Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/base/stats_table_unittest.cc?r1=46743&r2=46742

Revert StatsTableTest.MultipleThreads change which is failing tsan.

Revert r46707:

Mark StatsTableTest.MultipleThreads as flaky
instead of using #if 0.

TBR=maruel,phajdan.jr

TEST=none
BUG=10611

Review URL: http://codereview.chromium.org/2054003
------------------------------------------------------------------------

May 10, 2010
#13 bugdroid1@gmail.com
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=46784 

------------------------------------------------------------------------
r46784 | phajdan.jr@chromium.org | 2010-05-08 02:30:28 -0700 (Sat, 08 May 2010) | 13 lines
Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/base/stats_table_unittest.cc?r1=46784&r2=46783

Disable StatsTableTest.MultipleThreads
instead of using #if 0 (so that the code
doesn't rot).

This is http://codereview.chromium.org/1990007 done right,
this time should not break tsan.

TBR=maruel

TEST=none
BUG=10611

Review URL: http://codereview.chromium.org/2022003
------------------------------------------------------------------------

May 26, 2010
#14 bugdroid1@gmail.com
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=48180 

------------------------------------------------------------------------
r48180 | jhawkins@chromium.org | 2010-05-25 12:56:20 -0700 (Tue, 25 May 2010) | 5 lines
Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/base/stats_table_unittest.cc?r1=48180&r2=48179

TTF: Re-enable StatsTableTest.MultipleThreads by marking it flaky.

TBR=stuartmorgan
BUG=10611
TEST=StatsTableTest.MultipleThreads
------------------------------------------------------------------------

Jul 30, 2010
#15 mal@google.com
(No comment was entered for this change.)
Labels: -FlakyTest Tests-Flaky
Aug 20, 2010
#16 maruel@chromium.org
It's still flaky and it's quite slow, 40s for a run in the shlib slave. Something should be done to make this test < 10s.

(cc'ing the last ones who committed to this file 2 years ago)
Cc: est...@chromium.org e...@chromium.org
Aug 20, 2010
#17 bugdroid1@gmail.com
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=56859 

------------------------------------------------------------------------
r56859 | maruel@chromium.org | 2010-08-20 08:44:16 -0700 (Fri, 20 Aug 2010) | 6 lines
Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/base/stats_table_unittest.cc?r1=56859&r2=56858

Mark StatsTableTest.MultipleProcesses as flaky (again)

BUG=10611
TEST=flaky test

Review URL: http://codereview.chromium.org/3107028
------------------------------------------------------------------------

Sep 23, 2010
#18 bugdroid1@gmail.com
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=60385

------------------------------------------------------------------------
r60385 | evan@chromium.org | Thu Sep 23 16:53:40 PDT 2010

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/base/stats_table_unittest.cc?r1=60385&r2=60384&pathrev=60385

base_unittests: Turn down the sleeps on the stats table tests.

These tests are over 20% of the time taken by base_unittests.
I don't think making these sleeps shorter should affect their
reliability too much.

BUG=10611

Review URL: http://codereview.chromium.org/3474009
------------------------------------------------------------------------
Feb 14, 2011
#19 mal@google.com
(No comment was entered for this change.)
Labels: -HelpWanted GoodFirstBug
Mar 18, 2011
#20 lafo...@chromium.org
Here's an example on the mac:

[----------] 7 tests from StatsTableTest
[ RUN      ] StatsTableTest.VerifySlots
[       OK ] StatsTableTest.VerifySlots (1 ms)
[ RUN      ] StatsTableTest.MultipleThreads
/Users/maruel/code/b/slave/mac/build/src/base/stats_table_unittest.cc:151: 
Failure
Value of: table.GetCounterValue(name)
  Actual: 24947
Expected: 1313 * kMaxThreads
Which is: 26260
[  FAILED  ] StatsTableTest.MultipleThreads (4583 ms)
[ RUN      ] StatsTableTest.MultipleProcesses
[       OK ] StatsTableTest.MultipleProcesses (4854 ms)
[ RUN      ] StatsTableTest.StatsCounter
[       OK ] StatsTableTest.StatsCounter (13 ms)
[ RUN      ] StatsTableTest.StatsCounterTimer
[       OK ] StatsTableTest.StatsCounterTimer (1005 ms)
[ RUN      ] StatsTableTest.StatsRate
[       OK ] StatsTableTest.StatsRate (1005 ms)
[ RUN      ] StatsTableTest.StatsScope
[       OK ] StatsTableTest.StatsScope (1014 ms)
[----------] 7 tests from StatsTableTest (12477 ms total)
Labels: -GoodFirstBug bulkmove Hotlist-GoodFirstBug
Mar 18, 2011
#21 lafo...@chromium.org
Here's an example on the mac:

[----------] 7 tests from StatsTableTest
[ RUN      ] StatsTableTest.VerifySlots
[       OK ] StatsTableTest.VerifySlots (1 ms)
[ RUN      ] StatsTableTest.MultipleThreads
/Users/maruel/code/b/slave/mac/build/src/base/stats_table_unittest.cc:151: 
Failure
Value of: table.GetCounterValue(name)
  Actual: 24947
Expected: 1313 * kMaxThreads
Which is: 26260
[  FAILED  ] StatsTableTest.MultipleThreads (4583 ms)
[ RUN      ] StatsTableTest.MultipleProcesses
[       OK ] StatsTableTest.MultipleProcesses (4854 ms)
[ RUN      ] StatsTableTest.StatsCounter
[       OK ] StatsTableTest.StatsCounter (13 ms)
[ RUN      ] StatsTableTest.StatsCounterTimer
[       OK ] StatsTableTest.StatsCounterTimer (1005 ms)
[ RUN      ] StatsTableTest.StatsRate
[       OK ] StatsTableTest.StatsRate (1005 ms)
[ RUN      ] StatsTableTest.StatsScope
[       OK ] StatsTableTest.StatsScope (1014 ms)
[----------] 7 tests from StatsTableTest (12477 ms total)
Labels: -Area-BuildTools Area-Build
Nov 8, 2011
#22 dhar...@google.com
(No comment was entered for this change.)
Labels: Hotlist-Fixit
Feb 16, 2012
#24 kmadh...@chromium.org
jar@: could you help routing this bug to the right owner? svn blame shows mostly initial.commit.

Thanks.
Cc: j...@chromium.org
Feb 16, 2012
#25 vandebo@chromium.org
jar:  I've been looking at this test.  It seems that it has uncovered a bug with the stated interface of SharedMemory::Lock.  i.e. on posix, it doesn't work across threads in the same process.  I'm wondering if that interface is actually needed though? The suggestion in comment 6 will paper over the problem the test is exposing without actually fixing the core issues since each there could be multiple SharedMemory objects used by different threads.

If per thread locking isn't actually needed, then I propose changing the comments for SharedMem and StatsTable to reflect that and changing the test to use processes. If per thread locking is needed, then I think we'll need to do something like make the shared memory region a bit bigger to store the id of a shared semaphore or similar approach. 

Any thoughts?
Owner: vandebo@chromium.org
Labels: -Pri-3 -Hotlist-GoodFirstBug Pri-2
Feb 16, 2012
#26 j...@chromium.org
The suggestion in comment 7 LGTM, and seems like a central fix for a rarely used(?) facility.  Perhaps you'd like to land it?  One hassle is that the "global lock, internal to the process" needs to have a long lifetime, and hence should probably be a leaky singleton.
Feb 16, 2012
#27 evan@chromium.org
Do we actually use SharedMemory::Lock anywhere?  It might be easier to just delete all this code.
Feb 16, 2012
#28 vandebo@chromium.org
@jar I had comment 6/7 coded up with a class local lock, but I think it's possible for multiple SharedMemory classes to be pointing at the shared memory and thus a class local lock wouldn't work in those cases.  Are you suggesting a global lock?

@evan - I changed the signature and got several compile failures, so it is used.  I didn't look closely at the call sites.
Feb 16, 2012
#29 vandebo@chromium.org
@jar  I was hoping that you knew more about the usage pattern of these classes.  Sounds like maybe you don't.  I'll spend more time looking into it.
Feb 16, 2012
#30 j...@chromium.org
I think (as per title) it was used in Stats Tables.  Unless we've trashed them that would be the consumer of this lock. I know we've disabled them (to preclude accidental security issue)... but think they are available via command line switches for deveoper use.
Feb 16, 2012
#31 j...@chromium.org
The usage pattern (via Stats Tables) is that these locks are used to allocate counters across multiple processes and threads.  The lock is only used during allocation, and not during increments of the counters.  

As a result, a global lock (yeah, that was what I was suggesting) should not IMO be a performance bottleneck.  I would expect that IF there was any contention, it would be a bid deal across processes, and the additional process-internal contention would be far less significant (but I'm just guessing).
Feb 24, 2012
#32 bugdroid1@chromium.org
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=123647

------------------------------------------------------------------------
r123647 | vandebo@chromium.org | Fri Feb 24 22:28:56 PST 2012

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/base/metrics/stats_table_unittest.cc?r1=123647&r2=123646&pathrev=123647
 M http://src.chromium.org/viewvc/chrome/trunk/src/base/shared_memory_posix.cc?r1=123647&r2=123646&pathrev=123647
 M http://src.chromium.org/viewvc/chrome/trunk/src/base/shared_memory.h?r1=123647&r2=123646&pathrev=123647

Improve SharedMemory::Lock on Posix and reenable StatsTableTest.MultipleThreads

BUG=10611
TEST=NONE


Review URL: http://codereview.chromium.org/9463018
------------------------------------------------------------------------
Feb 25, 2012
#33 bugdroid1@chromium.org
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=123672

------------------------------------------------------------------------
r123672 | vandebo@chromium.org | Sat Feb 25 14:28:22 PST 2012

Changed paths:
 M http://src.chromium.org/viewvc/chrome/trunk/src/base/metrics/stats_table_unittest.cc?r1=123672&r2=123671&pathrev=123672

Disable StatsTableTest.MultipleThreads on Mac.

It's failing on the Mac Valgrind bot

TBR=thestig@chromium.org
BUG=10611
TEST=NONE


Review URL: http://codereview.chromium.org/9466039
------------------------------------------------------------------------
Mar 6, 2012
#34 lafo...@google.com
Available + Owner == Default Assgined
Status: Assigned
Jun 11, 2012
#35 evan@chromium.org
(Un-ccing myself from bugs.)
Cc: -evan@chromium.org
Mar 9, 2013
#36 bugdroid1@chromium.org
(No comment was entered for this change.)
Labels: -Tests -Tests-Flaky -Area-Build Cr-Tests-Flaky Build Cr-Tests
Mar 29, 2013
#37 gli...@chromium.org
Hi there.
I remember hitting this issue every time we deploy a new memory tool, and now here it is again with ThreadSanitizer v2.
According to http://build.chromium.org/p/chromium.fyi/builders/Chromium%20Linux%20TSan%20v2/builds/4935/steps/base_unittests/logs/stdio, this test is consistently failing under TSan v2:
=====================================
../../base/metrics/stats_table_unittest.cc:157: Failure
Value of: table.GetCounterValue(name)
  Actual: 21008
Expected: 1313 * kMaxThreads
Which is: 26260
=====================================

My guess is that every tool that slows down the program execution causes the problems with GetCounterValue being off by 1313 * several threads.
There certainly is a data race, since the counter values are incorrect. But it's unclear to me whether the problem is in the test itself or it's about incorrect synchronization in the StatsTable.
Owner: j...@chromium.org
Cc: -j...@chromium.org vandebo@chromium.org erg@chromium.org dvyu...@chromium.org
Labels: Performance-ThreadSanitizer
Apr 1, 2013
#38 bugdroid1@chromium.org
(No comment was entered for this change.)
Labels: -Performance-ThreadSanitizer Stability-ThreadSanitizer
Apr 1, 2013
#39 bugdroid1@chromium.org
------------------------------------------------------------------------
r191613 | glider@chromium.org | 2013-04-01T15:48:17.274480Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/base/metrics/stats_table_unittest.cc?r1=191613&r2=191612&pathrev=191613

Disable  StatsTableTest.MultipleThreads under ThreadSanitizer v2

BUG=10611
Review URL: https://codereview.chromium.org/13311002
------------------------------------------------------------------------
Sign in to add a comment

Powered by Google Project Hosting