My favorites | Sign in
Project Home Downloads Wiki Issues Code Search
New issue   Search
for
  Advanced search   Search tips   Subscriptions
Issue 150568: chrome crash: ExtensionAppProvider::RefreshAppList
5 people starred this issue and may be notified of changes. Back to list
 
Project Member Reported by ellyjo...@chromium.org, Sep 18, 2012
Operating system: Linux
                  0.0.0 Linux 3.4.0 #1 SMP Mon Sep 17 22:43:08 PDT 2012 i686
CPU: x86
     GenuineIntel family 6 model 2 stepping 3
     4 CPUs

Crash reason:  SIGSEGV
Crash address: 0x0

Thread 0 (crashed)
 0  chrome!ExtensionAppProvider::RefreshAppList [atomicity.h : 48 + 0x5]
    eip = 0x764b8819   esp = 0x7fe07f00   ebp = 0x7fe07fc8   ebx = 0x77922ff4
    esi = 0x796e6318   edi = 0x77955b4c   eax = 0x00000000   ecx = 0xffffffff
    edx = 0xfffffff4   efl = 0x00210202
    Found by: given as instruction pointer in context
 1  chrome!NotificationServiceImpl::Notify [notification_service_impl.cc : 127 + 0x24]
    eip = 0x75a3f4fc   esp = 0x7fe07fd0   ebp = 0x7fe08328   ebx = 0x77922ff4
    esi = 0x00000013   edi = 0x00000086
    Found by: call frame info
 2  chrome!ExtensionService::NotifyExtensionLoaded [extension_service.cc : 989 + 0x1d]
    eip = 0x75bcc2fa   esp = 0x7fe08330   ebp = 0x7fe08568   ebx = 0x77922ff4
    esi = 0x7993ac00   edi = 0x7796946c
    Found by: call frame info
 3  chrome!ExtensionService::AddExtension [extension_service.cc : 1974 + 0xb]
    eip = 0x75bc8a52   esp = 0x7fe08570   ebp = 0x7fe085c8   ebx = 0x77922ff4
    esi = 0x7993ac00   edi = 0x7b1cd200
    Found by: call frame info
 4  chrome!ExtensionService::OnExtensionInstalled [extension_service.cc : 2224 + 0xb]
    eip = 0x747e890b   esp = 0x7fe085d0   ebp = 0x7fe08738   ebx = 0x77922ff4
    esi = 0x7b1cd200   edi = 0x7993ac00
    Found by: call frame info
 5  chrome!extensions::CrxInstaller::ReportSuccessFromUIThread [crx_installer.cc : 634 + 0x1d]
    eip = 0x747b803f   esp = 0x7fe08740   ebp = 0x7fe08798   ebx = 0x77922ff4
    esi = 0x7b096d80   edi = 0x7993ac00
    Found by: call frame info
 6  chrome!base::internal::Invoker<1, base::internal::BindState<base::internal::RunnableAdapter<void (extensions::CrxInstaller::*)()>, void(extensions::CrxInstaller*), void(extensions::CrxInstaller*)>, void(extensions::CrxInstaller*)>::Run [bind_internal.h : 134 + 0x4]
    eip = 0x747b559f   esp = 0x7fe087a0   ebp = 0x7fe087b8   ebx = 0x77922ff4
    esi = 0x77955864   edi = 0x7fe0882c
    Found by: call frame info
 7  chrome!MessageLoop::RunTask [callback.h : 389 + 0xe]
    eip = 0x75a0a47c   esp = 0x7fe087c0   ebp = 0x7fe08898   ebx = 0x77922ff4
    esi = 0x77955864   edi = 0x7fe0882c
    Found by: call frame info
 8  chrome!MessageLoop::DeferOrRunPendingTask [message_loop.cc : 472 + 0xb]
    eip = 0x75a0a36c   esp = 0x7fe088a0   ebp = 0x7fe088d8   ebx = 0x77922ff4
    esi = 0x7fe08908   edi = 0x7fe08908
    Found by: call frame info
 9  chrome!MessageLoop::DoWork [message_loop.cc : 648 + 0x4]
    eip = 0x759dff08   esp = 0x7fe088e0   ebp = 0x7fe08958   ebx = 0x77922ff4
    esi = 0x796b3500   edi = 0x7fe08908
    Found by: call frame info
10  chrome!base::MessagePumpGlib::HandleDispatch [message_pump_glib.cc : 268 + 0xc]
    eip = 0x75d34dbf   esp = 0x7fe08960   ebp = 0x7fe08978   ebx = 0x72b24ff4
    esi = 0x796b2900   edi = 0x7fe089f4
    Found by: call frame info
11  chrome!WorkSourceDispatch [message_pump_glib.cc : 105 + 0xd]
    eip = 0x75d34d95   esp = 0x7fe08980   ebp = 0x7fe08998   ebx = 0x72b24ff4
    esi = 0x796cffc0   edi = 0x7fe089f4
    Found by: call frame info
12  libglib-2.0.so.0.3000.2!g_main_context_dispatch [gmain.c : 2441 + 0x9]
    eip = 0x72a5cfcd   esp = 0x7fe089a0   ebp = 0x7fe08a18   ebx = 0x72b24ff4
    esi = 0x796cffc0   edi = 0x7fe089f4
    Found by: call frame info
13  libglib-2.0.so.0.3000.2!g_main_context_iterate [gmain.c : 3089 + 0xc]
    eip = 0x72a5d565   esp = 0x7fe08a20   ebp = 0x7fe08a88   ebx = 0x72b24ff4
    esi = 0x796ad7e0   edi = 0x00000001
    Found by: call frame info
14  libglib-2.0.so.0.3000.2!g_main_context_iteration [gmain.c : 3152 + 0x15]
    eip = 0x72a5d928   esp = 0x7fe08a90   ebp = 0x7fe08ac8   ebx = 0x72b24ff4
    esi = 0x796ad7e0   edi = 0x72b25878
    Found by: call frame info
15  chrome!base::MessagePumpGlib::RunWithDispatcher [message_pump_glib.cc : 199 + 0x11]
    eip = 0x75d32515   esp = 0x7fe08ad0   ebp = 0x7fe08b28   ebx = 0x77922ff4
    esi = 0x796b2900   edi = 0x00000000
    Found by: call frame info
16  chrome!base::MessagePumpGlib::Run [message_pump_glib.cc : 291 + 0x16]
    eip = 0x75d32491   esp = 0x7fe08b30   ebp = 0x7fe08b48   ebx = 0x77922ff4
    esi = 0x796b3500   edi = 0x00000000
    Found by: call frame info
17  chrome!MessageLoop::RunInternal [message_loop.cc : 419 + 0xa]
    eip = 0x759dfb14   esp = 0x7fe08b50   ebp = 0x7fe08b68   ebx = 0x77922ff4
    esi = 0x796b3500   edi = 0x00000000
    Found by: call frame info
18  chrome!base::RunLoop::Run [run_loop.cc : 45 + 0x9]
    eip = 0x759dfa26   esp = 0x7fe08b70   ebp = 0x7fe08b88   ebx = 0x77922ff4
    esi = 0x7fe08bc0   edi = 0x00000000
    Found by: call frame info
19  chrome!ChromeBrowserMainParts::MainMessageLoopRun [chrome_browser_main.cc : 1461 + 0x7]
    eip = 0x75d32405   esp = 0x7fe08b90   ebp = 0x7fe08bf8   ebx = 0x77922ff4
    esi = 0x7fe08bc0   edi = 0x00000000
    Found by: call frame info
20  chrome!content::BrowserMainLoop::RunMainMessageLoopParts [browser_main_loop.cc : 479 + 0xe]
    eip = 0x75d322ff   esp = 0x7fe08c00   ebp = 0x7fe08c38   ebx = 0x77922ff4
    esi = 0x796afea0   edi = 0x76909d14
    Found by: call frame info
21  chrome!BrowserMainRunnerImpl::Run [browser_main_runner.cc : 111 + 0xa]
    eip = 0x75d32286   esp = 0x7fe08c40   ebp = 0x7fe08c58   ebx = 0x77922ff4
    esi = 0x7969ed40   edi = 0x76909bef
    Found by: call frame info
22  chrome!BrowserMain [browser_main.cc : 21 + 0x7]
    eip = 0x759cfee0   esp = 0x7fe08c60   ebp = 0x7fe08c98   ebx = 0x77922ff4
    esi = 0x7969ed40   edi = 0x76909bef
    Found by: call frame info
23  chrome!content::RunNamedProcessTypeMain [content_main_runner.cc : 437 + 0xc]
    eip = 0x759cfdb1   esp = 0x7fe08ca0   ebp = 0x7fe08cd8   ebx = 0x77922ff4
    esi = 0x00000000   edi = 0x7fe08d0c
    Found by: call frame info
24  chrome!content::ContentMainRunnerImpl::Run [content_main_runner.cc : 718 + 0x15]
    eip = 0x759cfc47   esp = 0x7fe08ce0   ebp = 0x7fe08d38   ebx = 0x77922ff4
    esi = 0x796a9f60   edi = 0x7fe08d0c
    Found by: call frame info
25  chrome!content::ContentMain [content_main.cc : 35 + 0x7]
    eip = 0x759c35c8   esp = 0x7fe08d40   ebp = 0x7fe08d68   ebx = 0x77922ff4
    esi = 0x796a9f60   edi = 0xffffffff
    Found by: call frame info
26  chrome!ChromeMain [chrome_main.cc : 32 + 0x12]
    eip = 0x759c349e   esp = 0x7fe08d70   ebp = 0x7fe08dc8   ebx = 0x77922ff4
    esi = 0x7fe08d90   edi = 0x00000035
    Found by: call frame info
27  chrome!main [chrome_exe_main_aura.cc : 17 + 0x11]
    eip = 0x72f807d8   esp = 0x7fe08dd0   ebp = 0x7fe08de8   ebx = 0x77922ff4
    esi = 0x00000000   edi = 0x00000000
    Found by: call frame info
28  libc-2.15.so!__libc_start_main [libc-start.c : 234 + 0x1b]
    eip = 0x7216f447   esp = 0x7fe08df0   ebp = 0x7fe08e58   ebx = 0x722d7e4c
    esi = 0x00000000   edi = 0x00000000
    Found by: call frame info
29  chrome + 0x2a56a0
    eip = 0x72f806a1   esp = 0x7fe08e60   ebp = 0x00000000   ebx = 0x77922ff4
    esi = 0x00000035   edi = 0x72f80670
    Found by: call frame info
30  chrome + 0x2a57af
    eip = 0x72f807b0   esp = 0x7fe08e64   ebp = 0x00000000
    Found by: stack scanning
31  ld-2.15.so + 0xee6f
    eip = 0x72cc9e70   esp = 0x7fe08e78   ebp = 0x00000000
    Found by: stack scanning

Observed on alex-canary: <https://uberchromegw.corp.google.com/i/chromeos/builders/x86-alex%20canary/builds/2586>; manifested as "FAIL: Unhandled JSONInterfaceError: Chrome automation failed prior to timing out, did chrome crash?"
Sep 18, 2012
#2 zelidrag@chromium.org
(No comment was entered for this change.)
Owner: hshi@chromium.org
Labels: Mstone-23 Feature-Extensions
Sep 18, 2012
#3 hshi@chromium.org
(No comment was entered for this change.)
Status: Started
Sep 21, 2012
#4 hshi@chromium.org
The crash is at a thread-safe exchange-and-increment instruction

 37fb344:	b9 ff ff ff ff       	mov    $0xffffffff,%ecx
 37fb349:	f0 0f c1 48 fc       	lock xadd %ecx,-0x4(%eax)    <--- CRASH
 37fb34e:	89 8d 48 ff ff ff    	mov    %ecx,-0xb8(%ebp)

It corresponds to the ExtensionSet::const_iterator::operator ++() in ExtensionAppProvider::RefreshAppList().
Sep 21, 2012
#5 hshi@chromium.org
This is really puzzling. The ExtensionSet* is coming from ExtensionService class. The atomic iterator operator ++ should not fail (if the map itself were invalid, we would have crashed at the first iteration of the for loop where the extensions.begin() was dereferenced.

The only way for this to crash is if there is another thread manipulating the same ExtensionSet object in parallel. At first glance, most of the functions in the ExtensionsService class are guarded with run-time CHECKs to ensure they're only invoked on the UI thread... I'll dig further to see what I've missed.

CC+ a few extensions folks in case it rings a bell.
Cc: a...@chromium.org mihaip@chromium.org mpcomplete@chromium.org
Sep 21, 2012
#6 mpcomplete@chromium.org
This sort of this can happen if the ExtensionSet is modified (as in, an entry is added/removed) within the for-loop. Though, inspecting the for-loop, it doesn't seem possible.
Sep 21, 2012
#7 hshi@chromium.org
There are a number of places in extension_service.cc where UnloadExtension() is invoked; any of these could have removed entries and caused this problem. For example ExtensionUpdater::OnBlacklistDownloadFinished() -> ExtensionService::UpdateExtensionBlacklist() -> CheckManagementPolicy() has calls to UnloadExtension(), and I don't see any UI-thread checks along the path.
Sep 26, 2012
#8 b...@chromium.org
 Issue 151855  has been merged into this issue.
Oct 1, 2012
#9 zelidrag@chromium.org
(No comment was entered for this change.)
Labels: Iteration-66
Oct 2, 2012
#13 hshi@chromium.org
I finally found a good instance on crash server that has complete callstacks for both the crashing thread and another thread stuck in a file I/O operation that also involves extensions. The link is at https://crash.corp.google.com/reportdetail?reportid=c9b226e46687cdf6

Oct 2, 2012
#14 hshi@chromium.org
Another instance (on mario): https://crash.corp.google.com/reportdetail?reportid=52d5b1f8a436c1a4

The instances have the following in common:
(1) Crash in UI thread called from CrxInstaller::ReportSuccessFromUIThread();
(2) FILE thread has a pending file I/O operation called from CrxInstaller::InstallExtension().

Note that the (1) is a posted task called from the FILE thread CrxInstaller::ReportSuccessFromFileThread(). It seems we have multiple CRX installed in quick succession, and somehow the ReportSuccessFromUIThread() posted for the previous CRX got invoked in the UI thread while the FILE thread is working on the next CRX.
Oct 2, 2012
#15 hshi@chromium.org
Still no clue how we end up crashing in the atomic increment. I initially suspected that the PostTask to ReportSuccessFromUIThread somehow got executed after the CrxInstaller object got deleted, but after reading the code back and forth it seems all the refcount bookkeeping are correct. The fact that multiple CRX are installed in quick succession (or even concurrently) still should not have caused this, as all access to the extension service are serialized in the UI thread.
Oct 3, 2012
#16 hshi@chromium.org
Interestingly, all the crash reports so far are reported for 32-bit x86 (i686) arch. This includes zgb, mario, alex, parrot and butterfly.
Oct 3, 2012
#17 hshi@chromium.org
After re-reading the assembly surrounding the LOCK XADD instruction, I want to correct my comment #4. Now it no longer looks like the increment of the ExtensionSet::const_iterator, but rather an inlined call to extension_apps_.clear().

Oct 3, 2012
#18 hshi@chromium.org
The crash instances (see comments #13, #14) are the only two I can find on https://crash.corp.google.com. The minidumps of these show both are run on QEMU VM, so they are probably from the same failing VMTest instances reported by ellyjones@ and mazda@.

I've so far been unable to reproduce this on a real device using the images in these crash reports.
Oct 3, 2012
#19 hshi@chromium.org
@xiyuan suggested that I add some debug logs and try bot cros-x86. So far I already tried twice but neither crashed (http://codereview.chromium.org/11013039/). If we do see a crash, hopefully the log will tell if we have a use-after-free problem with the ExtensionAppProvider class.
Cc: xiyuan@chromium.org
Oct 4, 2012
#20 zelidrag@chromium.org
i don't see this crash anywhere in M23 branch
Labels: -Mstone-23 Mstone-24
Oct 4, 2012
#21 kev...@chromium.org
Seeing this crash in VMTests on Alex-x86 PFQ.
Oct 4, 2012
#22 hshi@chromium.org
Yes this is still happening: http://chromegw/i/chromeos/waterfall?builder=alex%20nightly%20chrome%20PFQ

Looks like there may be a bug with try-bots; all my try jobs seem to pass and I don't get any logs even though I added them for both the dtor and the RefreshAppList() calls. 
Oct 4, 2012
#23 hshi@chromium.org
(As suggested by kevers@): here's a more direct link to the PFQ status: http://chromegw/i/chromeos/builders/alex%20nightly%20chrome%20PFQ
Oct 4, 2012
#24 xiyuan@chromium.org
Or here:

http://chromegw.corp.google.com/i/chromiumos.release/waterfall

This is probably the top crash there.
Oct 4, 2012
#25 hshi@chromium.org
Among all alex nightly PFQ failures, the first one seems to be http://chromegw/i/chromeos/builders/alex%20nightly%20chrome%20PFQ/builds/808 (9/15 4:23, assuming this is PDT). Before that a consecutive 20+ builds all passed as early as 9/10. Unfortunately #808 did not have a stacktrace, and the VMTest message says it is a timeout. There is another similar one (#811 on 9/16) which is also a timeout. 

The first VMTest fail with the correct stacktrace is #817 (9/17 16:01) which is roughly the same time as Elly's report.
Oct 4, 2012
#26 hshi@chromium.org
Ok, the reason why all my virtual try-jobs are passing is because they're all run with board=x86-generic, which doesn't have this problem (see http://chromegw/i/chromeos/builders/x86-generic%20nightly%20chromium%20PFQ 

How do I submit try jobs specifically for x86-alex? It would be best if debugging this does not involve landing a CL to the ToT with logging messages.
Cc: kev...@chromium.org
Oct 4, 2012
#27 hshi@chromium.org
The first VMTest failure in the x86-mario canary PFQ is at #2576 (9/19 10:30 at 157560). This is two days later than the first failure in alex PFQ (#816: 9/17 16:01 at 157235).

I still couldn't figure out how the ExtensionAppProvider class could get used after free. Not only is this one ref-counted and thread-safe, but as far as I can see from debugging, all its methods are invoked on the UI thread, and the dtor automatically un-registers itself as the listener. So it would seem that RefreshAppList could not have been called after the object is freed, if the assumption is true that all task on the UI thread are serially executed.

Regarding the change log: between 9/10 and 9/17-19 there is really not much change in the chrome/browser/extensions and chrome/browser/autocomplete. The only thing that stands out is a re-org on 9/11 (https://chromiumcodereview.appspot.com/10933030); it re-orders the providers_ vector and moves the position of ExtensionAppProvider entry closer to the beginning of the list -- which means the object will be released earlier when AutocompleteController is released. But none of this should matter because the entire AutocompleteController dtor is on UI thread and it will block the CrxInstaller::ReportSuccessFromUIThread.
Oct 5, 2012
#28 hshi@chromium.org
I want to correct my comments #4 and #17 regarding the nature of the atomic decrement. Instead of guessing what it is, I picked a failing PFQ build (http://chromegw/i/chromeos/builders/alex%20nightly%20chrome%20PFQ/builds/925 put it on an Alex and dumped the disassembly around the %eip (0x76555B79 = chrome+0x03892B79):

 3892b61:	e9 3b fd ff ff       	jmp    38928a1
 3892b66:	8b 8b 04 83 ff ff    	mov    -0x7cfc(%ebx),%ecx
 3892b6c:	85 c9                	test   %ecx,%ecx
 3892b6e:	0f 84 9e 00 00 00    	je     3892c12
 3892b74:	b9 ff ff ff ff       	mov    $0xffffffff,%ecx
 3892b79:	f0 0f c1 48 fc       	lock xadd %ecx,-0x4(%eax)   <------ atomic decrement that crashed
 3892b7e:	89 8d 48 ff ff ff    	mov    %ecx,-0xb8(%ebp)
 3892b84:	8b 85 48 ff ff ff    	mov    -0xb8(%ebp),%eax
 3892b8a:	85 c0                	test   %eax,%eax
 3892b8c:	0f 8f 02 fd ff ff    	jg     3892894
 3892b92:	8d 85 6c ff ff ff    	lea    -0x94(%ebp),%eax
 3892b98:	89 14 24             	mov    %edx,(%esp)
 3892b9b:	89 44 24 04          	mov    %eax,0x4(%esp)
 3892b9f:	e8 ec a2 55 ff       	call   2dece90
 3892ba4:	e9 eb fc ff ff       	jmp    3892894

Comparing the above snippet with the g++ output of std::vector<struct ExtensionApp>::clear() (using -O2 -S -m32 option), it is clear that the above LOCK XADD is related to the dtor of the 1st string16 object |name| in struct ExtensionApp; the %eax here points to the wchar array holding the string16 |name|.
Oct 5, 2012
#29 dd...@chromium.org
Per Kevers, he believes this to be contributing to intermittent failures that prevent Chrome on CrOS from revving to the latest version. While the crash itself has been intermittently happening, the rev problem has been consistent for over a week now (it does rev versions, but takes 2-3 build attempts to get that to happen). Thus marking as a blocker.
Labels: ReleaseBlock-Dev
Oct 5, 2012
#30 hshi@chromium.org
 Issue 151566  has been merged into this issue.
Cc: derat@chromium.org stevenjb@chromium.org benhenry@chromium.org zelidrag@chromium.org hshi@chromium.org
Oct 5, 2012
#31 hshi@chromium.org
Seen with R23 builders too.
Labels: -Mstone-24 Mstone-23
Oct 5, 2012
#32 hshi@chromium.org
I found one of the reasons why trying to repro this locally fails: the VMTest always starts with the pristine image, so upon first log-in, all the default CRX files are unpacked and installed (as shown on the call stack). Only at this time do we get a chance to invoke a bunch of CrxInstaller instances in quick succession.

But from the 2nd log-in onwards, the expanded files are already unpacked under $HOME/Extensions/ and this path will no longer be exercised.

In order to repro this on a device, I'll need to start from a clean image (uninitialized stateful partition) with debug loggings added, and repeated try first-time log-in.
Oct 5, 2012
#33 hshi@chromium.org
Even with starting from a clean image, I could never reproduce the crash in ExtensionAppProvider::RefreshAppList exactly like those reported on VMTest. However, I do see crashes upon 1st log-in if I do not let all CRX installation to complete and quickly log out. The crash is a SIGABRT and is totally different, unfortunately (see https://crash.corp.google.com/reportdetail?reportid=883f4d16ca73db24)
Oct 5, 2012
#34 hshi@chromium.org
ddraw@, kevers@ -- Is it possible to try a CL (with debug logging) on one of the failing PFQ slaves? Since this cannot be reproduced locally (even with the same image) and the cros_x86 try-bot and the generic-x86 VMTests do not have this issue, the only way I can think of to make progress is to get debug logs on the slaves with this problem.

xiyuan@ actually suggested me to just add debug logs and land it to the ToT, but I'm hesitant because: (1) this appears to be very timing-sensitive and I need to add many lines of debug logs (for example upon every ctor/dtor and RefreshAppList calls of the ExtensionAppProvider class); it shouldn't land in production. (2) having debug logs reviewed / landed then wait for the next nightly seems too slow and we need a faster feedback cycle.
Oct 8, 2012
#35 benhe...@google.com
We have no more devs for 23.
Labels: -ReleaseBlock-Dev ReleaseBlock-Beta
Oct 8, 2012
#36 kar...@google.com
(No comment was entered for this change.)
Labels: OS-Chrome
Oct 8, 2012
#37 hshi@chromium.org
One more instance with the x86-mario R23 build: this time the CrxInstaller is not involved on the FILE thread callstack (https://storage.cloud.google.com/chromeos-image-archive/x86-mario-release/R23-2913.68.0/chrome.20121008.061635.20901.dmp.txt)
Oct 8, 2012
#38 hshi@chromium.org
It is worth noting that, while this particular crash signature is unique to the QEMU KVM test with 32-bit x86 architecture, we're also seeing VMTest crashes on amd64 architectures (for example http://chromegw.corp.google.com/i/chromiumos.release/builders/lumpy%20full%20release-R23-2913.B/builds/41). For these the crash is also a SIGSEGV at address = 0x0 but the callstack is completely different.
Oct 9, 2012
#40 hshi@chromium.org
Thanks to davidjames@ I'm finally able to see a repro of this on tryserver. See http://chromegw.corp.google.com/i/chromiumos.tryserver/builders/x86-mario-factory/builds/1

In the debug logs I do not see any instances of use-after-free. However I do see that the VMTest logs out almost immediately after logging in, such that the extensions service has not yet finished installing all the discovered CRX file when chrome receives a SIGTERM.
Oct 9, 2012
#41 hshi@chromium.org
Got another VMTest crash on tryserver (http://chromegw.corp.google.com/i/chromiumos.tryserver/builders/x86-alex-release/builds/529)

Both crash are identical. The RefreshAppList() call corresponding to the "this" pointer (ExtensionAppProvider) on the stack trace is seen in the debug log; no destructor has been called. So this confirms that it is not a free-after-call case. For example in the alex crash:

...
[6452:6452:1009/133351:ERROR:extension_app_provider.cc(190)] !@!@ ExtensionAppProvider observe 0x78f9f100
[6452:6452:1009/133351:ERROR:extension_app_provider.cc(190)] !@!@ ExtensionAppProvider observe 0x784c06c0
[6452:6452:1009/133351:ERROR:extension_app_provider.cc(190)] !@!@ ExtensionAppProvider observe 0x78f9f100
[6452:6452:1009/133351:ERROR:extension_app_provider.cc(190)] !@!@ ExtensionAppProvider observe 0x784c06c0
(crashed - stacktrace shows this == 0x784c06c0)

In the crash above, there has been no logs showing the ExtensionAppProvider object has been released.

Oct 9, 2012
#42 hshi@chromium.org
This now looks like a memory corruption problem. Issue 151666 is the same cause, only difference is that it is a lot more likely to hit this while the CRX installation is still not finished (in RefreshAppList) than at shutdown (in ~ExtensionAppProvider). 

Unfortunately, the asan tryservers appear to have problem unrelated to this bug, such that builds are always failing (see http://chromegw.corp.google.com/i/chromiumos.tryserver/builders/x86-generic-asan)
Oct 9, 2012
#43 hshi@chromium.org
Added more debug logs and got the failure again: http://chromegw.corp.google.com/i/chromiumos.tryserver/builders/x86-mario-factory/builds/2

The log confirms that the last RefreshAppList() call prior to the crash did not complete, and the "this" pointer matches the stack trace. What's surprising in the log printout is that this->extension_apps_ is still accessible (the extension_apps_.size() value is correct), which means the memory for std::vector<ExtensionApp> is not overwritten. What's overwritten is the actual buffer holding the individual struct ExtensionApp entries.

This makes it virtually impossible to guess how the memory corruption happened because the memory for the buffer could be anywhere; it isn't even fixed for the lifetime of the ExtensionAppProvider object due to the stl vector implementation, as more entries are push_back'ed to the vector the buffer is moved multiple times to increase capacity.
Oct 9, 2012
#44 hshi@chromium.org
Got a VMTest failure with asan tryserver. The log shows a heap-use-after-free error.

==18749== ERROR: AddressSanitizer heap-use-after-free on address 0x558c889c at pc 0x72f4abcc bp 0x7fe18a58 sp 0x7fe18a50
WRITE of size 4 at 0x558c889c thread T0
    #0 0x72f4abcc (/opt/google/chrome/chrome+0xa5d1bcc)
    #1 0x72f4b025 (/opt/google/chrome/chrome+0xa5d2025)
    #2 0x72f4b168 (/opt/google/chrome/chrome+0xa5d2168)
    #3 0x7060bf1d (/opt/google/chrome/chrome+0x7c92f1d)

Oct 9, 2012
#45 hshi@chromium.org
Regarding error in #44: I extracted the debug symbols from the tryserver artifacts and got the stacktrace:

Heap-use-after-free at
#0 LoginEventObserver::_NotifyLoginEvent()
#0 LoginEventObserver::Observe()
#2 NotificationServiceImpl::Notify()

Object freed at
BaseLoginDisplayHost::~BaseLoginDisplayHost()

Object originally allocated at
BaseLoginDisplayHost::StartSigninScreen()

Based on the above, the object is member chromeos::ExistingUserController controller_ of class LoginEventObserver. The observe call was added in https://chromiumcodereview.appspot.com/10912142/. 
Oct 9, 2012
#46 hshi@chromium.org
+ nirnimesh@ who landed the original CL (http://www.crrev.com/155926)

Could you review my fix at http://codereview.chromium.org/11088032/? I think it is not safe to use the member controller_ from inside LoginEventObserver because the controller could be released shortly after log-in (as seen with the VMTests)
Cc: nirnimesh@chromium.org
Oct 9, 2012
#47 hshi@chromium.org
It looks like this is finally root-caused. With the fix I'm seeing VMTests repeatedly passing on the ASAN tryserver (http://chromegw.corp.google.com/i/chromiumos.tryserver/builders/x86-generic-asan/builds/89/steps/VMTest/logs/stdio)
Oct 9, 2012
#48 bugdro...@chromium.org
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=161049

------------------------------------------------------------------------
r161049 | hshi@google.com | 2012-10-10T05:21:16.728275Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/automation/testing_automation_provider_chromeos.cc?r1=161049&r2=161048&pathrev=161049
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/automation/automation_event_observers_chromeos.cc?r1=161049&r2=161048&pathrev=161049
   M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/automation/automation_event_observers.h?r1=161049&r2=161048&pathrev=161049

[chromeos] Fix a use-after-free problem with LoginEventObserver

Always use the static function ExistingUserController::current_controller() to obtain the pointer to
the current user controller object. This is to prevent heap corruption when LoginEventObserver::Observe()
is called after the controller is already freed.

BUG=150568
TEST=tryserver

Review URL: https://codereview.chromium.org/11088032
------------------------------------------------------------------------
Oct 9, 2012
#49 hshi@chromium.org
Request merge 161049 to R23.1271.
Cc: kar...@google.com
Labels: Merge-Requested
Oct 9, 2012
#50 mihaip@chromium.org
(No comment was entered for this change.)
Cc: -mihaip@chromium.org
Oct 10, 2012
#51 hshi@chromium.org
Removed Extensions label. This is an automation test issue. The crash is introduced by http://crrev.com/155926.
Cc: -a...@chromium.org -mpcomplete@chromium.org -hshi@chromium.org
Labels: -Type-Bug -Area-UI -Feature-Extensions Type-Regression Tests Crash
Oct 10, 2012
#52 xiyuan@chromium.org
Good work!
Oct 10, 2012
#53 kar...@google.com
(No comment was entered for this change.)
Labels: -Merge-Requested Merge-Rejected
Oct 10, 2012
#54 kar...@google.com
(No comment was entered for this change.)
Labels: -ReleaseBlock-Beta
Oct 10, 2012
#55 kar...@google.com
(No comment was entered for this change.)
Labels: -Merge-Rejected Merge-Approved
Oct 10, 2012
#56 bugdro...@chromium.org
The following revision refers to this bug:
    http://src.chromium.org/viewvc/chrome?view=rev&revision=161188

------------------------------------------------------------------------
r161188 | hshi@google.com | 2012-10-10T21:12:13.880617Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/branches/1271/src/chrome/browser/automation/testing_automation_provider_chromeos.cc?r1=161188&r2=161187&pathrev=161188
   M http://src.chromium.org/viewvc/chrome/branches/1271/src/chrome/browser/automation/automation_event_observers_chromeos.cc?r1=161188&r2=161187&pathrev=161188
   M http://src.chromium.org/viewvc/chrome/branches/1271/src/chrome/browser/automation/automation_event_observers.h?r1=161188&r2=161187&pathrev=161188

Merge 161049 - [chromeos] Fix a use-after-free problem with LoginEventObserver

Always use the static function ExistingUserController::current_controller() to obtain the pointer to
the current user controller object. This is to prevent heap corruption when LoginEventObserver::Observe()
is called after the controller is already freed.

BUG=150568
TEST=tryserver

Review URL: https://codereview.chromium.org/11088032

TBR=hshi@google.com
Review URL: https://codereview.chromium.org/11092058
------------------------------------------------------------------------
Labels: -Merge-Approved merge-merged-1271
Oct 10, 2012
#57 hshi@chromium.org
(No comment was entered for this change.)
Status: Fixed
Oct 18, 2012
#58 abod...@chromium.org
(No comment was entered for this change.)
Status: Verified
Mar 9, 2013
#59 bugdro...@chromium.org
(No comment was entered for this change.)
Labels: -Type-Regression -Tests -TreeCloser -Mstone-23 -Crash Type-Bug-Regression Cr-Stability M-23 Hotlist-TreeCloser Cr-Tests
Sign in to add a comment

Powered by Google Project Hosting