Posted on Jun 8, 2011 by Grumpy Hippo

When compiling Serf and APR with pool debugging enabled, errors raised by the response handler seem to cause Serf to SEGFAULT during pool cleanup. Debugging shows the location of the fault to be here:

{{{ (gdb) run http://localhost/abc Starting program: /home/cmpilato/tmp/subversion/serf/test/serf_get http://localhost/abc [Thread debugging using libthread_db enabled] Error running context: (1) Operation not permitted

Program received signal SIGSEGV, Segmentation fault. 0x08066874 in allocator_free (allocator=0x41414141, node=0x8088e90) at memory/unix/apr_pools.c:364 364 if (allocator->mutex) (gdb) where

0 0x08066874 in allocator_free (allocator=0x41414141, node=0x8088e90)

at memory/unix/apr_pools.c:364

1 0x080669f3 in apr_allocator_free (allocator=0x41414141, node=0x8088e90)

at memory/unix/apr_pools.c:439

2 0x0804fb32 in serf_bucket_mem_free (allocator=0x8086b18, block=0x8088eb0)

at buckets/allocator.c:265

3 0x0804d675 in serf_default_destroy_and_data (bucket=0x8087590)

at buckets/buckets.c:120

4 0x0804e8c2 in serf_response_destroy_and_data (bucket=0x8087590)

at buckets/response_buckets.c:94

5 0x080538dc in destroy_request (request=0x80806c0) at outgoing.c:306

6 0x080539fa in cancel_request (request=0x80806c0, list=0x807e568,

notify_request=0) at outgoing.c:350

7 0x08054ebf in serf_request_cancel (request=0x80806c0) at outgoing.c:1268

8 0x08054b3a in serf_connection_close (conn=0x807e508) at outgoing.c:1113

9 0x080533fd in clean_conn (data=0x807e508) at outgoing.c:56

10 0x08067be3 in run_cleanups (cref=0x807d4e0)

at memory/unix/apr_pools.c:2346

11 0x08066d76 in pool_clear_debug (pool=0x807d4d0,

file_line=0x8077630 &quot;test/serf_get.c:436&quot;)
at memory/unix/apr_pools.c:1547

12 0x08066f40 in pool_destroy_debug (pool=0x807d4d0,

file_line=0x8077630 &quot;test/serf_get.c:436&quot;)
at memory/unix/apr_pools.c:1632

13 0x08066ff8 in apr_pool_destroy_debug (pool=0x807d4d0,

file_line=0x8077630 &quot;test/serf_get.c:436&quot;)
at memory/unix/apr_pools.c:1674

14 0x0804cb0f in main (argc=2, argv=0xbffff854) at test/serf_get.c:436

}}}

Note that in lowest frames, "allocator" is set to APR's pool debugging junk variable (0x41 repeated for each byte).

This was originally noticed in the Subversion project, tracked in http://subversion.tigris.org/issues/show_bug.cgi?id=3875.

In attempting to generalize the problem to just a Serf scenario, I found what I hope is a helpful reproduction recipe. Of course, begin by compiling APR with --enable-pool-debug. Then:

Change test/serf_get.c:handle_response() to "return 1;" immediate inside the while (1) loop.
'make' (to recompile the serf_get utility).
Run 'test/serf_get http://google.com'

{{{ $ test/serf_get http://google.com Error running context: (1) Operation not permitted Segmentation fault $ }}}

Comment #1

Posted on Jun 8, 2011 by Swift Wombat

FYI, This might be fixed by one of the patches we have for serf in mod_pagespeed (I plan to cleanup and submit them at some point...):

--- src/third_party/serf/src/buckets/allocator.c 2011-05-06 16:22:56.220366814 -0400 +++ src/third_party/serf/src/buckets/allocator.c 2011-05-06 11:08:31.684199000 -0400 @@ -83,6 +83,7 @@ struct serf_bucket_alloc_t { apr_pool_t *pool; apr_allocator_t *allocator; + int own_allocator;

 serf_unfreed_func_t unfreed;
 void *unfreed_baton;

@@ -106,6 +107,9 @@ if (allocator->blocks) { apr_allocator_free(allocator->allocator, allocator->blocks); } + if (allocator->own_allocator == 1) { + apr_allocator_destroy(allocator->allocator); + }

 return APR_SUCCESS;

} @@ -119,10 +123,12 @@

 allocator->pool = pool;
 allocator->allocator = apr_pool_allocator_get(pool);

+ allocator->own_allocator = 0; if (allocator->allocator == NULL) { /* This most likely means pools are running in debug mode, create our * own allocator to deal with memory ourselves */ apr_allocator_create(&allocator->allocator); + allocator->own_allocator = 1; } allocator->unfreed = unfreed; allocator->unfreed_baton = unfreed_baton;

Comment #2

Posted on Jun 9, 2011 by Grumpy Hippo

My debugging was leading me down the same road -- to the custom handling of the allocator when pool debugging was enabled. I'll give your patch a try.

Comment #3

Posted on Jun 9, 2011 by Grumpy Hippo

No effect at all. And actually, now that I read the patch, I'm not surprised. I think the SEGFAULT is due to a doubled attempt to cleanup the allocator, not a missing one. This patch seems to add yet another destruction attempt on the allocator.

Comment #4

Posted on Jun 9, 2011 by Swift Wombat

You are right; I misremembered the patch rationale: it was meant to be a leak fix. Please pardon my screw up.

I think the issue may be that the allocator gets cleaned up before the bucket/requests...

Comment #5

Posted on Jun 16, 2011 by Swift Lion

r1488 fixes a segfault on pool cleanup, but with a slightly different stacktrace. The actual problem is that the response bucket is first destroyed when the respool allocator is cleaned up, and then again on cancel_request when the connection pool is cleaned up (respool is a child of conn->pool so gets cleaned up first).

This root cause can very well cause the stacktrace in this issue, but on Windows, in my simulations I always got the one from http://subversion.tigris.org/issues/show_bug.cgi?id=3917

Closing as "Fixed", let us know of the problems persist.

Thanks for the detailed report!

Comment #6

Posted on Jun 16, 2011 by Grumpy Hippo

I can confirm that this seems to fix the problem.

Comment #7

Posted on Jun 21, 2011 by Grumpy Hippo

By the way, I've filed issue 79 to track the patch previously posted to this issue (or, my modified version thereof) so it doesn't get lost here.

Code

Archive

serf - issue #76