Although I don't know if this is a contributor to issue 193, it sure seems like a prerequisite.
Comment #1
Posted on Apr 13, 2013 by Massive RhinoComparing http://www.opensource.apple.com/source/Libc/Libc-391.2.10/include/malloc/malloc.h and http://www.opensource.apple.com/source/Libc/Libc-763.12/include/malloc/malloc.h , we should be able to use the _leopard_malloc_zone for 10.4 as well.
Comment #2
Posted on Apr 13, 2013 by Massive Rhino(No comment was entered for this change.)
Comment #3
Posted on Apr 15, 2013 by Massive RhinoTobias, how are you enabling mozalloc? The changesets show your change to jemalloc.c, but that doesn't even get compiled here, and I can't see where you're twiddling MOZ_MEMORY (unless you're changing it through some other means).
Comment #4
Posted on Apr 15, 2013 by Massive Rhino(and what about https://bugzilla.mozilla.org/show_bug.cgi?id=702250 ?)
Comment #5
Posted on Apr 15, 2013 by Massive RhinoFixed MOZ_MEMORY and short-circuited memalign which doesn't exist on 10.4 (instead it uses the built-in ipalloc). Confirmed osx_use_jemalloc is true by writing in if (osx_use_jemalloc) asm("trap\n") and verifying JS traps there (it does). V8 passes with jemalloc. Doesn't make a lot of difference there, but we'll build the full browser and then see if I can trigger M702250.
Comment #6
Posted on Apr 16, 2013 by Massive RhinoAlso had to make a tweak in storage/ because 10.4's zones don't have malloc_good_zone in them (changed it to simply use 16-byte-aligned stores).
Incredibly the browser not only works, but is (especially for DEBUG) very quick. Many of the sites in issue 193 are better, implying that resource starvation may be part of the reason for all those threads sitting around in wait.
However, drag and drop (thus can't test M702250), cut and paste, and printing are all messed up. Printing may be revenge of issue 82. I'm trying to figure out what's wrong with the other two. Running a non-jemalloc build tonight to try to exonerate it as the cause. I suspected M647216 but trying to work around it just made things worse.
Comment #7
Posted on Apr 16, 2013 by Massive RhinoPrinting appears to be issue 82, but cut and paste and drag and drop all work in a non-jemalloc build. Even if osx_use_jemalloc is set to false (using the environment variable NO_MAC_JEMALLOC), it does not work. I'm suspicious alignment is to blame, but IonMonkey has higher priority right now. Dropping milestone, though we do want this.
Comment #8
Posted on Apr 16, 2013 by Massive Rhinohttps://bugzilla.mozilla.org/show_bug.cgi?id=691003 makes me think our minimum size should be 16 bytes, always, even for tiny allocations.
Comment #9
Posted on Apr 16, 2013 by Massive Rhino... i.e. change #define TINY_MIN_2POW (sizeof(void*) == 8 ? 3 : 2) to 4 (forcing 2^4 minimum allocations).
Comment #10
Posted on Apr 18, 2013 by Massive RhinoThat didn't fix it.
Comment #11
Posted on Apr 19, 2013 by Happy RabbitI was able to proove that AuroraFox uses jemalloc by setting breakpoints in some of the functions of jemalloc. The patches in the Changesets are exactly what is needed to get it working (actually it wasn't correctly enabled for some versions but that was fixed) - BUT at least at one moment I could reproducibly crash the browser using drag and drop, however it occurs very rarely.
Comment #12
Posted on Apr 19, 2013 by Massive RhinoBut how did you set MOZ_MEMORY to 1? It's not in your Aurora changesets.
Comment #13
Posted on Apr 21, 2013 by Happy RabbitAdding "--enable-jemalloc" to the configure command line should override the default.
Comment #14
Posted on Apr 21, 2013 by Happy RabbitSee http://hg.mozilla.org/mozilla-central/file/4420f27742c7/configure.in#l7112
Comment #15
Posted on Apr 22, 2013 by Massive RhinoOK.
Looking through jemalloc.c, it appears that the original szone is still able to be called for certain things. Maybe that's what we need to add.
Comment #16
Posted on Jul 15, 2013 by Massive RhinoI'm thinking we have to fake up a memalign for the original zone, which 10.4 does not have. Something like this might work:
void *aligned_malloc( size_t size, int align ) { void mem = (szone->malloc)( size + (align-1) + sizeof(void) );
char *amem = ((char*)mem) + sizeof(void*);
amem += align - ((uintptr)amem & (align - 1));
/* ((void**)amem)[-1] = mem; // for an aligned free */
return amem;
}
The trick is making sure we don't leak for alignments != 16. I'm throwing a trap into jemalloc and we'll see if it bombs at memalign. If so, I bet that's our problem.
Comment #17
Posted on Jul 18, 2013 by Massive RhinoStatus update:
- Typing this in a jemalloc debug build of "22.1." Browser speed did increase in the way expected, as it did before, but note I am doing this with issue 231 also applied. I need to see if I can get rid of that.
- I tinkered with the jemalloc patch for 10.4 and simply made memalign a no-op with a trap if osx_use_jemalloc id false, but the trap isn't getting triggered, so it may never have been called in the first place.
- I'm not sure what changed in widget between my last cut and 22, but cut and paste and drag to the browser work now. However, dragging things out of the browser causes M702250. That's bad. OTOH, it appears that it may be possible to get around it with something along the lines of M703087 (https://bug703087.bugzilla.mozilla.org/attachment.cgi?id=575304) which may not be good, but may make a hopefully infrequent operation possible. Or, we just disable drag and drop of images from the browser (copying or right-click-and-save do work).
Comment #18
Posted on Jul 18, 2013 by Massive RhinoLarge allocations are also very slow. On www.butterflylabs.com,
(gdb) bt 15
0 0xffff9080 in ___memset_pattern () at /System/Library/Frameworks/System.framework/PrivateHeaders/ppc/cpu_capabilities.h:193
1 0x901296d0 in memset ()
2 0x0002da34 in ozone_free (zone=0x3e100, ptr=0x609e00) at /Volumes/BruceDeuce/src/mozilla-22.1/memory/mozjemalloc/jemalloc.c:6612
3 0x08da88bc in _cairo_image_surface_finish (abstract_surface=0x31a1fbc0) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-image-surface.c:734
4 0x08dca7c0 in _moz_cairo_surface_finish (surface=0x31a1fbc0) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-surface.c:728
5 0x08dca910 in _moz_cairo_surface_destroy (surface=0x31a1fbc0) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-surface.c:649
6 0x08ddb73c in _cairo_quartz_surface_finish (abstract_surface=0x338e83a0) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-quartz-surface.c:2053
7 0x08dca7c0 in _moz_cairo_surface_finish (surface=0x338e83a0) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-surface.c:728
8 0x08dca910 in _moz_cairo_surface_destroy (surface=0x338e83a0) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-surface.c:649
9 0x08dbc2bc in _moz_cairo_pattern_destroy (pattern=0x321b5f40) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-pattern.c:346
10 0x08da3a68 in _cairo_gstate_fini (gstate=0x1ccf996c) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-gstate.c:229
11 0x08da3cec in _cairo_gstate_restore (gstate=, freelist=0x1ccf9ab8) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo-gstate.c:290
12 0x08d8fe94 in _moz_cairo_restore (cr=0x1ccf9800) at /Volumes/BruceDeuce/src/mozilla-22.1/gfx/cairo/cairo/src/cairo.c:599
13 0x06994e60 in mozilla::FrameLayerBuilder::DrawThebesLayer (aLayer=0x31160400, aContext=0x321b9700, aRegionToDraw=, aRegionToInvalidate=@0xefffae6c, aCallbackData=0xefffc928) at /Volumes/BruceDeuce/src/mozilla-22.1/layout/base/FrameLayerBuilder.cpp:3334
14 0x08c6ff30 in mozilla::layers::BasicShadowableThebesLayer::PaintBuffer (this=0x31160400, aContext=, aRegionToDraw=@0xefffae3c, aExtendedRegionToDraw=@0xefffadd8, aRegionToInvalidate=@0xefffae6c, aDidSelfCopy=false, aCallback=, aCallbackData=0xefffc928) at basic/BasicThebesLayer.h:95
(More stack frames follow...)
I suspect we should turn MALLOC_FILL off, or only enable it for debugging.
Comment #19
Posted on Jul 18, 2013 by Massive RhinoDisabling MALLOC_FILL fixes that problem.
Comment #20
Posted on Jul 18, 2013 by Massive RhinoDragging out of the browser does work for text, just not images.
Comment #21
Posted on Jul 18, 2013 by Massive RhinoAfter playing around in jemalloc for awhile it appears that there is no good way to work around the image drag bug, so it will have to be wallpapered.
In content/base/src/nsContentAreaDragDrop.cpp, disabling creating the native image draggable still crashed.
Disabling creating the native image draggable and disabling creating the file promise also still crashed.
So, I stopped those from even being made by just returning NS_OK when a drag gesture is detected over an image. Now it no longer crashes, but other drags like URLs and so on still work.
Nicely still, on all the images in M702250 copying them to the clipboard still worked, as well as saving them to disk, just not dragging them. I consider this a shippable compromise.
Comment #22
Posted on Jul 20, 2013 by Massive RhinoNew crash in jemalloc: the Applications tab in Preferences crashes the browser with a null pointer. I need to build a DEBUG for a better stack trace, but here's the first part of it in an opt build. malloc* in the backtrace is scary.
Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_PROTECTION_FAILURE at address: 0x00000000 0x000254e4 in receive_samples () (gdb) bt
0 0x000254e4 in receive_samples ()
1 0x90050bbc in malloc_zone_from_ptr ()
2 0x90050bbc in malloc_zone_from_ptr ()
3 0x937f4310 in -[NSBitmapImageRep _freeImage] ()
4 0x937f3f50 in -[NSBitmapImageRep dealloc] ()
5 0x937f3c40 in -[NSImage _freeRepresentation:] ()
6 0x937f3a68 in -[NSImage dealloc] ()
7 0x92bd2c68 in NSPopAutoreleasePool ()
Comment #23
Posted on Jul 20, 2013 by Massive RhinoSeeing that it involves discarding images, maybe it has to do with the icons. We can dispense with those if we have to, but we can't disable the Applications tab functionality entirely.
Comment #24
Posted on Jul 20, 2013 by Massive RhinoIt's actually part of a general problem with icons -- if you pull up Downloads and start scrolling, it will crash as icons get freed. It's the same underlying bug as M702250 again, but it's just really bad and consistent on 10.4.
The current wallpaper is to make nsIconChannelCocoa.mm return NS_ERROR_FAILURE when queried for an icon; icons appear too frequently to force them to leak, or memory usage gets bad. This is okay for a beta, it's not okay for release. However, the way around that is to make a new icon service that hands prefab icons over from chrome (by just redirecting through ioService). That's pretty easy to write, just a little tedious, and we'd have to add a decent library of icons at 16x16 and 32x32 sizes.
Comment #25
Posted on Jul 20, 2013 by Massive Rhino(by icons I mean icons from OS X -- icons internal to TenFourFox, such as favicons and chrome, work fine)
Comment #26
Posted on Jul 21, 2013 by Massive RhinoNew showstopper: uploads work inconsistently. I'm able to upload to ADN, but not to Google Code. Google Code works fine in 22.0.
I'm getting concerned there are all kinds of bugs in here like that.
Comment #27
Posted on Jul 21, 2013 by Massive RhinoI can't reproduce this reliably anymore. I give up. Let's see what the punters think.
Comment #28
Posted on Jul 21, 2013 by Massive RhinoViewing isolated images is slow to close. I bet the background texture is the issue. Never liked it anyway, so we could jettison that.
Comment #29
Posted on Jul 26, 2013 by Massive RhinoWe've got a serious leak somewhere. It manifests in the base browser too after a couple days of uptime, but is nowhere near as severe, and the browser grinds to a halt in GC. The base non-jemalloc build does this slightly, but it still runs fine. This unfortunately is a show stopper.
Comment #30
Posted on Oct 29, 2013 by Massive RhinoTesting 24 shows this no longer pays off, and the browser's memory use still skyrockets. Dropping priority.
Status: Started
Labels:
Type-Enhancement
Priority-Low