| Issue 48709: | ZombieObjectCrash() relating to -mouseMoved:. | |
| 4 people starred this issue and may be notified of changes. | Back to list |
Restricted
Sign in to add a comment
|
Analyzing crash data from the zombie objects on dev channel, the most frequent items are related to tracking area messages. A couple examples: -[ToolbarController mouseMoved:] http://crash/reportdetail?reportid=0a01e983ac6ebf9a -[TabStripController mouseMoved:] http://crash/reportdetail?reportid=e48cbe7407368d90 -[BookmarkBarFolderController -mouseExitedButton:event:] http://crash/reportdetail?reportid=48d988ea50912c4d -[BookmarkBarFolderController -mouseEnteredButton:event:] http://crash/reportdetail?reportid=be4fc680f00ebff7 I've spent an hour or so looking into these, and AFAICT the right things are being done to remove the relevant tracking areas. But maybe another set of eyes would see more. For triage purposes, messages to zombies account for about 1/4 of our current browser crashes. The above four crashes account for perhaps 80% of those, plus there are a scattering of other zombie calls which could have the same derivation.
Jul 26, 2010
I looked at this and concur with Scott that it seems like we're doing everything right. My M6 list is pretty long still, so punting to Avi for another look. The real question is: why are we getting messages for tracking areas that are removed?
Owner:
a...@chromium.org
Cc: rse...@chromium.org
Aug 2, 2010
Been staring at the code and it looks right. Punting. :(
Labels:
-Mstone-6 Mstone-7
Aug 18, 2010
No change. :(
Labels:
-Mstone-7 Mstone-8
Aug 24, 2010
Issue 53204 has been merged into this issue.
Cc:
viettrun...@chromium.org tha...@chromium.org
Aug 24, 2010
Clearly something is going wrong, these are still showing up as the 2nd and 4th top browser crashes in 6.0 beta channel. M8 seems like the wrong milestone, though I'm not sure what to do about it either.
Cc:
pinker...@chromium.org
Labels: TopCrash
Aug 24, 2010
NSTrackingArea does not appear to retain it's |owner|, based on NSTrackingArea.h:
__weak id _owner;
Possible solution is to subclass NSTrackingArea and add |-[CrTrackingArea clearOwner]|.
Sep 28, 2010
I have an implementation of ScopedTrackingArea, but I wasn't happy with my results given the time I had allotted myself. Basically had a reset(view, area) method, so you (presumably) could not screw things up. Perhaps I should polish it up for review.
Oct 11, 2010
What is the status of this? It has gotten really bad in recent releases. Crashes in ZombieObjectCrash() account for a whopping 80% of crashes on the mac dev channel 7.0.544.0. In fact, mac Chrome now has more crashes per page load than windows Chrome. Some extra data for 7.0.544.0 in case it helps: * 86% of the crashes have StackSignature of "ZombieObjectCrash-7EE72E", which has: -[BubbleView drawRect:] * 19% of the crashes occur in under 1 second of browser launch, and 52% in under 10 seconds. For example in this crash the uptime is just 41ms: http://crash/reportdetail?reportid=0193b59eb33a6312 * The OS versions is almost entirely "10.6.4 10F569". Maybe this isn't relevant since this is the top OS versions in crash reports at large, however the clustering is even tighter for this case. Mac OS X 10.6.4 10F569 88.71% Mac OS X 10.5.8 9L31a 5.07% Mac OS X 10.5.8 9L30 1.41% Mac OS X 10.6.3 10D573 1.15% Mac OS X 10.6.5 10H542 0.49% Mac OS X 10.6.4 10F2056 0.44% * Every single crash happens on thread 0 (main thread), with the exception of this one: http://crash/reportdetail?reportid=f5e69b41f012753e
Cc:
ero...@chromium.org
Oct 11, 2010
ZombieObjectCrash() is a catch-all for use-after-free; the object that is zombie is what differentiates reports. The BubbleView crash is issue 57600 . This bug is about a very specific set of crashes that involve mouse tracking areas.
Oct 11, 2010
Thanks rsesek, bug 57600 is indeed the one I should have posted to!
Oct 12, 2010
No clue. Punting but also unassigning so that others know it's available.
Status:
Available
Owner: --- Cc: a...@chromium.org Labels: -Mstone-8 Mstone-9
Oct 12, 2010
(No comment was entered for this change.)
Summary:
ZombieObjectCrash() relating to -mouseMoved:.
Oct 20, 2010
I have been able to pretty consistently get this crash to occur by following these steps: 1. Make a bookmark folder with 50 or more bookmarks. 2. Open all bookmarks in that folder using Open All Bookmarks. 3. Close the window before all tabs can complete loading. (I usually wait only 1 to 3 seconds before closing the window.) I'm believe http://crbug.com/47314 is a duplicate of this bug.
Oct 21, 2010
I'll go look at my ScopedTrackingArea CL this week and see if I can't put up something with potential on this. It's more of a hygiene change than a specific fix, though.
Status:
Started
Owner: sh...@chromium.org
Oct 26, 2010
(No comment was entered for this change.)
Labels:
-TopCrash Crash-TopCrash
Oct 26, 2010
(No comment was entered for this change.)
Labels:
-Crash-TopCrash Crash-TopCrasher
Nov 8, 2010
I spent some time with a ScopedTrackingArea implementation, then ... FAILURE! The entire idea would leave the tracking area to be autodestructed in the Objective-C runtime, by which point the view is torn down and you can't do fun things like remove tracking areas anymore. There might still be something salvageable in there, but it would probably involve hooking something else, like a notification of some sort. After perhaps far too long staring into the abyss, I have a really odd notion. I wonder if these could be getting forwarded down the responder chain somehow, and aren't actually a sign of a stale tracking area?
Nov 24, 2010
Could zombie notifications processes that I've reported in bug 64351 be contributing to your problem here?
Nov 24, 2010
Stephen: No, because a) those are zombie processes and these are zombie objects (different bloodsuckers) and b) this is Mac-only and that's a report on Windows. Thanks for your report though.
Nov 30, 2010
Moving all mstone:9 bugs that are not ReleaseBlockers to mstone:10
Labels:
-Mstone-9 Mstone-10 MovedFrom-9
Jan 26, 2011
Removing Crash-TopCrasher from bugs that haven't been updated since Nov or before.
Labels:
-Crash-TopCrasher
Jan 27, 2011
Move to M11 from M10, as we've now branched. If you believe this bug was moved in error, please come talk to me.
Labels:
-Mstone-10 Mstone-11 MovedFrom-10
Feb 9, 2011
At this point, this is basically the highest persistent browser crasher. Usually it's down a few notches in the dev channel, but works back to the top at things progress to beta and stable. I periodically take another run at it, but it really doesn't look like things are broken. AFAICT, we really are removing the relevant tracking areas before the owner is freed. I've not been able to think up an edge case where things are going awry. So ... let's pretend that there is a case where the tracking area is being pinned somehow, even though the owner removed the tracking area. How can that case be exposed? One idea I had was to subclass NSTrackingArea, and add some new command to invalidate the owner. This need not use undocumented magic - it could just snag the real owner in -init* and wrap it with a proxy in the super call, and have the proxy stop forwarding things. [Or go undocumented and add a category method to null out _owner, or use object_setInstanceVariable().] Something nice about doing it above-board is that instead of dropping the message entirely, the case could be detected and additional debugging info could be sent to breakpad before -[NSTrackingArea mouseMoved:] is called. F/i, -clearOwner could capture a backtrace() and encode that somehow when -mouseMoved: is sent. Or debugging information for purposes of reproducing could be logged, but it's hard to constrain what might be useful. Any thoughts?
Cc:
m...@chromium.org
Feb 9, 2011
A while ago I wrote up a patch that created CrTrackingArea and |-clearOwner|, as mentioned in comment 7. I could dig that up, do some rebasing (from Aug!), and send it out for review tomorrow. Scott: In comment 18 you mentioned the responder chain, did you investigate around that at all?
Feb 9, 2011
Robert, we can go either way on the actual implementation - rebasing will probably suck because all the files have moved, and someone will have to audit for new uses anyhow, so I can as easily go forward with my code (well, I'll have to convert MyTrackingArea to CrTrackingArea, and write a unit test or something :-). AFAICT, no responder chain involved in the normal case. -[NSTrackingArea NSTrackingArea mouseMoved:] seems to call -owner twice, then -[owner mouseMoved:]. FramedBrowserWindow, ReloadButton, and HoverCloseButton are all NSResponder subclasses which register as owner of an NSTrackingArea, but do not override -mouseMoved:, and do not include NSTrackingMouseMoved. Also, AFAICT from reading the disassembly, -[NSResponder mouseMoved:] would remain on the stack if it were happening.
Feb 10, 2011
It also occurred to me that while we remove the tracking areas correctly, we do so in |-dealloc|. Since the BWC is autoreleased and cleaned up off the run loop, by the time the tracking areas are removed, we're already well into the teardown process. I put together CrTrackingArea that can call |-clearOwner| when it receives an NSWindowWillCloseNotification for a specific window. That may help with this problem, too.
Feb 14, 2011
There will be some additional code for debugging purposes for M-11, but I'm going to dissociate this from a specific milestone because priority isn't really what's blocking this bug, it's blocked by having any idea how to flush it out better.
Labels:
-Mstone-11 Mstone-X
Feb 15, 2011
The following revision refers to this bug:
http://src.chromium.org/viewvc/chrome?view=rev&revision=74940
------------------------------------------------------------------------
r74940 | rsesek@chromium.org | Tue Feb 15 04:50:35 PST 2011
Changed paths:
A http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/tracking_area.mm?r1=74940&r2=74939&pathrev=74940
A http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/tracking_area_unittest.mm?r1=74940&r2=74939&pathrev=74940
A http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/tracking_area.h?r1=74940&r2=74939&pathrev=74940
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/chrome_browser.gypi?r1=74940&r2=74939&pathrev=74940
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/tabs/tab_strip_controller.mm?r1=74940&r2=74939&pathrev=74940
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/chrome_tests.gypi?r1=74940&r2=74939&pathrev=74940
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/tabs/tab_strip_controller.h?r1=74940&r2=74939&pathrev=74940
[Mac] Create CrTrackingArea and use it in TabStripController.
CrTrackingArea can prevent messages from being sent to the tracking area's
owner, which is the source of a significant number of zombie crashes. This hopes
to prevent that.
BUG=48709
TEST=Crash reports go down.
Review URL: http://codereview.chromium.org/6486002
------------------------------------------------------------------------
Feb 15, 2011
The following revision refers to this bug:
http://src.chromium.org/viewvc/chrome?view=rev&revision=74945
------------------------------------------------------------------------
r74945 | rsesek@chromium.org | Tue Feb 15 06:22:22 PST 2011
Changed paths:
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/tracking_area.mm?r1=74945&r2=74944&pathrev=74945
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/tracking_area_unittest.mm?r1=74945&r2=74944&pathrev=74945
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/tracking_area.h?r1=74945&r2=74944&pathrev=74945
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/tabs/tab_strip_controller.mm?r1=74945&r2=74944&pathrev=74945
[Mac] Placate Clang's (pedantic) warning about CrTrackingArea's initializer.
BUG=48709
TEST=Mac Clang compiles
Review URL: http://codereview.chromium.org/6480067
------------------------------------------------------------------------
Feb 15, 2011
CrTrackingArea has now landed on trunk, with a potential fix just for TabStripController. This week's dev channel should have it baked in, which means the next step is to watch the crash numbers. There should still be zombies for ToolbarController and BookmarkBarFolderController, but ZERO for TabStripController. If that's the case, then I'll apply this fix to destroy other hordes of these lifeless, brain-eating, browser-crashing objects.
Here's the numbers for 648.45:
zcount zname
43 TabStripController received -mouseMoved:
21 ToolbarController received -mouseMoved:
17 BookmarkBarFolderController received -mouseMoved:
As I discussed with Scott in the code review [1], I'm slightly suspicious of our teardown process in relation to the removal of tracking areas. The general flow is:
-[BrowserWindowController windowWillClose:] which does this:
// We can't actually use |-autorelease| here because there's an embedded
// run loop in the |-performClose:| which contains its own autorelease pool.
// Instead call it after a zero-length delay, which gets us back to the main
// event loop.
[self performSelector:@selector(autorelease)
withObject:nil
afterDelay:0];
When the main loop hits and the pool drains, we then do:
-[BrowserWindowController dealloc]
-[TabStripController dealloc]
-[[TabStripController view] removeTrackingArea:...]
-[NSTrackingArea dealloc] (via scoped_ptr<>)
I think what's happening is that between |-windowWillClose:| and the removal and destruction of the tracking areas, tracking messages can still be sent when the window is supposedly off-screen. These get queued up on the event queue and are dispatched after the autorelease happens. I have no proof of that actually happening, but from what I was seeing in some of our unit tests (more details in the code review), it makes me very curious. If that indeed is the problem, perhaps the right answer is to remove the tracking areas in |-windowWillClose:|, rather than in |-dealloc|.
[1] http://codereview.chromium.org/6486002
Owner:
rse...@chromium.org
Cc: -rse...@chromium.org sh...@chromium.org
Feb 15, 2011
Another thought: if in BWC we're already scheduling the cleanup to happen at some-point-not-now on the main run loop, is there any reason to use |-autorelease| instead of just |-release|? By the time the main loop hits to fire that delayed selector, BWC is off the stack and out of the |-performClose:| loop. I'm not sure switching to |-release| will actually buy us anything, though, because there's still that gap between window close and cleanup that I find worrisome.
Feb 15, 2011
Another option might be to short-circuit things at -[ChromeEventProcessingWindow sendEvent:] or -[FramedBrowserWindow sendEvent:] or -[CrApplication sendEvent:]. There are a small amount of these which crash in a bookmark menu, so the last would be the only one giving complete coverage. That approach may be too blunt, though. If it is the case that there just are deferred messages like this, though, then I think doing it by indirecting the owner messages is a stronger solution, really. I have sometimes wondered about building a generic disconnecting-proxy class for this kind of thing, with an adapter for disconnect-on-notification. It could also be useful for delegates and for targets. I've never moved on it because I find the notion kind of disgusting, like adding sleeps to fix race conditions. BTW, when I did AutocompletePopupViewMac I just let the view itself hold the owning reference to the tracking area, rather than the owner holding it. *cough*, well the owner is the view the tracking area was added to, so so that's not strictly true, but you see what I mean. With the window-close notification, this approach could possibly apply to many of our other uses.
Feb 16, 2011
We didn't make this week's dev channel *sigh*, but the hope is another one will go out on Tuesday next week. Scott: I agree with you that this approach is probably the best, rather than hooking in at the |-sendEvent:| level. I do like the idea of having the views own the actual tracking areas, though. However, I think depending on the instance, it may make the code slightly harder to follow (if the controller is receiving tracking messages for an object that its view owns).
Mar 1, 2011
New dev channel just went out. The numbers to beat from 11.0.672.2 are: zcount zname 76 TabStripController received -mouseMoved: 67 ToolbarController received -mouseMoved: 30 FramedBrowserWindow received -themeProvider 25 BookmarkBarFolderController received -mouseMoved: 12 NSPopUpButtonCell received -dismissPopUp 10 FramedBrowserWindow received -windowController
Mar 2, 2011
Version: 11.0.686
COUNT MESSAGE
5 Zombie <ToolbarController: *> received -mouseMoved:
3 Zombie <FramedBrowserWindow: *> received -themeProvider
and no TabStripController. Generally that case has been greater than ToolbarController crashes. More data would be a benefit, but I think this looks good.
Mar 2, 2011
Great! I'll put together a CL for ToolbarController and BookmarkBarFolderController.
Mar 3, 2011
Robert, let me know if you're underwater. I'd rather get this in in time for 11.0 than any other bug on my plate.
Mar 3, 2011
I put together a CL today (WIP; not ready for review) http://codereview.chromium.org/6612025/. It crashes some unit tests, so I need to figure out if it's an issue with the test or if I need to change my DCHECK-enforced invariant. I also hate the approach needed for ToolbarController. If you can think of a better way, please let me know :-).
Mar 3, 2011
I'm not even sure the window-close notification is necessary, really. It seems likely that the tracking area itself is fine, it's just the owner which is going away out-of-order, so as long as we break that link when the owner is deallocated, things should be fine.
Mar 3, 2011
Like maybe replacing scoped_nsobject<NSTrackingArea> with instances of a class like:
class ScopedCrTrackingArea {
public:
explicit ScopedCrTrackingArea(CrTrackingArea* tracking_area)
: tracking_area_(tracking_area) {
}
~ScopedCrTrackingArea() {
[tracking_area_ clearOwner];
}
id get() const {
return tracking_area_;
}
private:
scoped_nsobject<CrTrackingArea> tracking_area_;
DISALLOW_COPY_AND_ASSIGN(ScopedCrTrackingArea);
}
Mar 4, 2011
I'm not sure the window-close notification is necessary, either, but it is the safest way to unhook the owner. I'll integrate the scoper today because it's much better than my ugly notification hack.
Mar 4, 2011
I may not understand something about this, then. My understanding is that sometimes when the owner's -dealloc removes the tracking rect from the view, a later message can be sent from the tracking rect to the owner. So clearing the tracking-rect's owner in the owner's -dealloc does resolve the specific problem which is causing crashes. Clearing the tracking-rect's owner on window close may resolve additional problems, such as might occur when the owner gets a message from the tracking rect and finds things partially destroyed, but that's really a general problem which such owner's should resolve by watching for window close directly. Did that make sense?
Mar 4, 2011
Yes, that makes sense and I think you're right; the scoper should be sufficient. I went back through all my notes and I can't think of a situation that the scoper wouldn't cover (though boy would it be nice to have AppKit code to back that up). The only other thing I'm left wondering is: if the delayed message hypothesis (comment 31) is correct, couldn't the messages still be delivered to a zombie, even with this change? Because the CrTrackingArea is scoped to the life of the owner, if the messages are delivered after destruction and are carrying a weak reference to the tracking area or owner proxy, it could still be zombie. We haven't seen any cases of this, so it's probably not worth worrying about. I guess the answer to that depends on how the zombies were even happening in the first place, which we haven't been able to dissect.
Mar 4, 2011
Ah, that explains your window-watching code. AFAICT from experimentation and reading disassembly, the tracking area instances themselves must still be live at the point the zombie owner is being messaged. If the tracking area had been released, the zombie-message report would relate to that instance, and we don't see that at all. So I think that breaking the owner reference while the tracking area is known to be alive should suffice to fix these crashes.
Mar 4, 2011
The following revision refers to this bug:
http://src.chromium.org/viewvc/chrome?view=rev&revision=76976
------------------------------------------------------------------------
r76976 | rsesek@chromium.org | Fri Mar 04 13:45:27 PST 2011
Changed paths:
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/toolbar/toolbar_controller.mm?r1=76976&r2=76975&pathrev=76976
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/tracking_area.mm?r1=76976&r2=76975&pathrev=76976
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/tracking_area_unittest.mm?r1=76976&r2=76975&pathrev=76976
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/tracking_area.h?r1=76976&r2=76975&pathrev=76976
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/toolbar/toolbar_controller.h?r1=76976&r2=76975&pathrev=76976
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/bookmarks/bookmark_bar_folder_controller.mm?r1=76976&r2=76975&pathrev=76976
M http://src.chromium.org/viewvc/chrome/trunk/src/chrome/browser/ui/cocoa/bookmarks/bookmark_bar_folder_controller.h?r1=76976&r2=76975&pathrev=76976
[Mac] Apply CrTrackingArea to the two other places where we get |-mouseMoved:| zombie messages.
BUG=48709
TEST=Crash report numbers.
Review URL: http://codereview.chromium.org/6612025
------------------------------------------------------------------------
Mar 17, 2011
I think we can call this fixed! I see no instances of this crash on 696.12.
Status:
Fixed
Mar 17, 2011
Agreed, zombie occurance is way down. If it comes back, I'll just open a new bug, I think. Moving to Verified because I don't think there's anything to be done by QA.
Status:
Verified
Apr 7, 2011
Issue 78756 has this continuing to happen for BookmarkBarFolderController. I had pity and didn't pull you all over, so this is your notice. I've left it Mstone-X Pri-2 for now, until I get a better handle on the magnitude.
Oct 12, 2012
This issue has been closed for some time. No one will pay attention to new comments. If you are seeing this bug or have new data, please click New Issue to start a new bug.
Labels:
Restrict-AddIssueComment-Commit
Blocking: -chromium:47314 chromium:47314
Mar 10, 2013
(No comment was entered for this change.)
Labels:
-Area-UI Cr-UI
|
||||||||||
| ► Sign in to add a comment | |||||||||||
Owner: rse...@chromium.org
Labels: -Pri-2 Pri-1 Mstone-6