My favorites | Sign in
Project Home Downloads Wiki Issues Code Search
New issue   Search
for
  Advanced search   Search tips   Subscriptions
Issue 25915: Data races in BookmarkModelWorker
1 person starred this issue and may be notified of changes. Back to list
Status:  Fixed
Owner:  tim@chromium.org
Closed:  Dec 2009
Cc:  tim@chromium.org, k...@google.com

Restricted
  • Only users with Commit permission may comment.


Sign in to add a comment
 
Project Member Reported by timurrrr@chromium.org, Oct 27, 2009
There are at least two data races in this class: 
on "syncapi_has_shutdown_" and on "state_".
Both data races were found by ThreadSanitizer, the report is attached.

(all the source code below is taken from 
chrome/browser/sync/glue/bookmark_model_worker.cc)

Regarding the date race on "syncapi_has_shutdown_":
 49 void BookmarkModelWorker::OnSyncerShutdownComplete() {
 ...
 57   syncapi_has_shutdown_ = true;
 58   syncapi_event_.Signal();
 59 }

races with
 61 void BookmarkModelWorker::Stop() {
 ... 
 69   // Drain any final task manually until the SyncerThread tells us it 
has
 70   // totally finished. Note we use a 'while' loop and not 'if'. The 
main subtle
 71   // reason for this is that syncapi_event could be signaled the 
first time we
 72   // come through due to an old CallDoWork call, and we need to keep 
looping
 73   // until the SyncerThread either calls it again or tells us it is 
done. There
 74   // should only ever be 0 or 1 tasks Run() here, however.
 75   while (!syncapi_has_shutdown_) {
 76     {
 77       AutoLock lock(pending_work_lock_);
 78       if (pending_work_)
 79         pending_work_->Run();
 80     }
 81     syncapi_event_.Wait();  // Signaled either by new task, or 
SyncerThread
 82                             // termination.
 83   }
 ...
 86 }

both accesses are not guarded by locks and no compiler barriers are used.
Similar data race is described at 
https://code.google.com/p/data-race-test/wiki/PopularDataRaces

As for the data race on "state_", it is exactly the one described at 
https://code.google.com/p/data-race-test/wiki/PopularDataRaces :

 14 void BookmarkModelWorker::CallDoWorkFromModelSafeThreadAndWait(
 15     ModelSafeWorkerInterface::Visitor* visitor) {
 16   // It is possible this gets called when we are in the STOPPING 
state, because
 17   // the UI loop has initiated shutdown but the syncer hasn't got the 
memo yet.
 18   // This is fine, the work will get scheduled and run normally or 
run by our
 19   // code handling this case in Stop().
 20   DCHECK_NE(state_, STOPPED);
 21   if (state_ == STOPPED)
 22     return;
 ...

races with

 61 void BookmarkModelWorker::Stop() {
 62   DCHECK_EQ(MessageLoop::current(), bookmark_model_loop_);
 63   DCHECK_EQ(state_, WORKING);
 64 
 65   // We're on our own now, the beloved UI MessageLoop is no longer 
running.
 66   // Any tasks scheduled or to be scheduled on the UI MessageLoop 
will not run.
 67   state_ = RUNNING_MANUAL_SHUTDOWN_PUMP;
 ...
 75   while (!syncapi_has_shutdown_) {
 ...
 83   }
 84 
 85   state_ = STOPPED;
 86 }

BookmarkModelWorker reports.txt
9.8 KB   View   Download
Oct 27, 2009
#1 bugdroid1@gmail.com
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=30180 

------------------------------------------------------------------------
r30180 | timurrrr@chromium.org | 2009-10-27 03:48:48 -0700 (Tue, 27 Oct 2009) | 4 lines
Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/tools/valgrind/tsan/suppressions.txt?r1=30180&r2=30179

Suppressed a few data races in BookmarkModelWorker
BUG=25915
TBR=dank
Review URL: http://codereview.chromium.org/340001
------------------------------------------------------------------------

Oct 27, 2009
#2 tim@chromium.org
There is no actual race with these variables because of the way the control flow is constructed, though it is 
true there is no lock.

For syncapi_has_shutdown_, this changes once and only once and from only one thread, and is read by only 
one thread *until* it changes to false, which must happen eventually.

For state_, there is no race, and the reason is maybe even more subtle.  We only set state_ to STOPPED once 
the syncer thread has stopped completely, which requires all pending tasks to be completed, and so there is 
no way we can end up at the read in CallDoWorkFromModelSafeThreadAndWait; hence the DCHECK.  There are 
lots of comments in this class already but I will try to add some to make this more clear.

Do we just leave the suppressions in this case?
Oct 27, 2009
#3 timurrrr@chromium.org
Consider the following scenario:
1) OnSyncerShutdownComplete() and Stop() are called simultaneously by threads T1 and T2
-> 1.1 T1 executes "syncapi_has_shutdown_ = true;" and yields the CPU
   1.2 T2 executes "while (!syncapi_has_shutdown_) {" and skips the while loop
This way the "synchronization" was done using a boolean without any real synchronization (locks, acquire/release barriers, 
etc). This is the very data race described at https://code.google.com/p/data-race-test/wiki/PopularDataRaces.
God knows what assembly will the compiler generate at high optimization levels for this code!

By the way, provided "this changes once and only once and from only one thread, and is read by only 
one thread *until* it changes to false" you may use base/atomic_flag.h instead of bool.
It won't require signalling a waitable event.

2) I'm not very familiar with the code of BookmarkModelWorker, but such code makes me scary:

  // It is possible this gets called when we are in the STOPPING state, because
  // the UI loop has initiated shutdown but the syncer hasn't got the memo yet.
  // This is fine, the work will get scheduled and run normally or run by our
  // code handling this case in Stop().
  DCHECK_NE(state_, STOPPED);
  if (state_ == STOPPED)
    return;

This code means this could only be called before "state_ == STOPPED" in DEBUG build but you also accept the other case...
I don't see any DEBUG-specific code other that DCHECK's in this class. For me it means either the code is broken in DEBUG 
mode (crashes when it shouldn't) or it is broken in NDEBUG build (not crashing when we are in a wrong state) or both.

If the reason is subtle, I'd rather fix the subtleness (which is very likely to be a real bug, which is hard to find!) 
instead of writing comments explaining it. Even if the code is correct now, it can be easily broken by someone else who 
won't understanding this subtleness.
Cc: k...@google.com
Oct 27, 2009
#4 tim@chromium.org
For 2), I understand your frustration with how it is written. What this was intended 
to mean is if we are in STOPPED it is likely programmer error, and if we know that we 
are, just early-out.  Proceeding won't cause the sky to fall, though; the task will 
just get posted and never run.  We could just remove the 3 lines you mention and 
things should be fine.

Your point about the subtleness was considered during the code review for this, but 
so was introducing unnecessary synchronization primitives for purposes of 
readability.

As for 1), you have instilled some doubt in my mind :) The specific case you draw out 
shouldn't cause any legitimate badness. I think the problem is that syncapi_event_ is 
not a manual reset event. The example you referenced online mentions the moving of 
do_thread2_cleanup() above done = true as a compiler optimization.  If it is possible 
(given the use of syncapi_event_) that syncapi_event_.Signal() be moved above 
syncapi_has_shutdown_ = true, and we were already waiting in Stop(), then we could 
get signaled and read a 'false' value and wait again forever.  This would be 
addressed by making the event a manual reset, because the Wait() would return 
immediately.

Thank you for bringing this up.  I think the case where OnSyncerShutdownComplete and 
Stop occur concurrently is quite rare because it is racing with code on Thread2 doing 
several bouts of synchronous IO during Shutdown(), which most of the time will be 
slower than a few consecutive instructions. But we need to make sure it is correct.  
I will look into it today or tomorrow.

Also, this class exists because we don't have a good solution yet for  issue 19757 ; 
it's probably best to focus most efforts on fixing that bug over improving the 
readability of this class for future authors, because hopefully the future work will 
be removing BookmarkModelWorker.

Status: Assigned
Owner: t...@chromium.org
Labels: -Area-Misc Feature-Sync
Oct 27, 2009
#5 timurrrr@chromium.org
syncapi_event_.Signal() can't be moved above "syncapi_has_shutdown_ = true" since 
Signal() implies a release-barrier (it does system call).

However, an opposite thing can happen: some code before "syncapi_has_shutdown_ = 
true" can be moved after it by the optimizer. If OnSyncerShutdownComplete() happens 
to be inlined, this may harmful.
I think manual reset doesn't make much difference in the scenario I've mentioned in 
my previous comment if the above optimization happens.

Regarding "state_", I'll take a closer look tomorrow.

> Also, this class exists...
If BookmarkModelWorker is not used in the Chromium itself, that's fine. Otherwise, 
I'd rather be sure the synchronization is correct.
For example, we've known about http://crbug.com/18488 in early August. But nobody 
cared much until we proved it was the cause of a top crasher...
Oct 27, 2009
#6 tim@chromium.org
My intuition (but I didn't know for sure until you confirmed it) was that the 
Signal() after setting the var to true would enforce the order to go as it appears in 
code, because it wouldn't make much sense as a waitable event otherwise.  In this 
case I remain confident that an additional lock is not needed for protecting 
syncapi_has_shutdown_ as is, because there isn't any other code in 
OnSyncerShutdownComplete to consider from a functional perspective (admittedly, this 
is only guaranteed true today and maybe not tomorrow).  

But, if the DCHECK in OnSyncerShutdownComplete were moved below the '= true', it 
could result in a false assert in debug, because of case '1)' you mention in 
Comment#3. state_ is really only used from within DCHECKs, and this was intended to 
try and make the code more clear but maybe it's doing the opposite. Perhaps it should 
be removed. I don't think the benefit from having it (if any) warrants adding in a 
lock, if the thing is only used for readability in the first place.
Dec 17, 2009
#8 bugdroid1@gmail.com
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=34558 

------------------------------------------------------------------------
r34558 | tim@chromium.org | 2009-12-15 08:46:17 -0800 (Tue, 15 Dec 2009) | 12 lines
Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/sync/glue/bookmark_model_worker.cc?r1=34558&r2=34557
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/sync/glue/bookmark_model_worker.h?r1=34558&r2=34557

Use a ConditionVariable in place of WaitableEvent in BookmarkModelWorker.

This is to ensure future changes to the code don't introduce harmful races
that could arise in presence of compiler optimizations due to the lack of
a barrier permitting instruction re-ordering.  The theoretical race that
exists now isn't causing any real malfunction today, but is causing
ThreadSanitizer warnings.

BUG=25915
TEST=BookmarkModelWorkerTest

Review URL: http://codereview.chromium.org/488012
------------------------------------------------------------------------

Aug 12, 2010
#9 bugdroid1@gmail.com
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=55876 

------------------------------------------------------------------------
r55876 | timurrrr@chromium.org | 2010-08-12 04:52:06 -0700 (Thu, 12 Aug 2010) | 4 lines
Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/tools/valgrind/tsan/suppressions.txt?r1=55876&r2=55875

Suppress data race in UI ( bug 51890 ) and remove a suppression for a race fixed long ago (25915)
BUG=51890,25915
TBR=glider
Review URL: http://codereview.chromium.org/3130012
------------------------------------------------------------------------

Oct 5, 2010
#10 timurrrr@chromium.org
Please disregard this comment - the issue is still Fixed :-)

Just setting Owner: fields to TSan crbugs with no owner.
Owner: t...@chromium.org
Oct 12, 2012
#11 bugdro...@chromium.org
This issue has been closed for some time. No one will pay attention to new comments.
If you are seeing this bug or have new data, please click New Issue to start a new bug.
Labels: Restrict-AddIssueComment-Commit
Mar 11, 2013
#12 bugdro...@chromium.org
(No comment was entered for this change.)
Labels: -Feature-Sync Cr-Services-Sync
Sign in to add a comment

Powered by Google Project Hosting