incr/decr operations are not thread safe.
What steps will reproduce the problem?
Please use this code to reproduce the problem: http://gist.github.com/348889

$ memcached -t 16
$ ./test_ascii
expected: 100000 result: 99939

Change this line to use the binary protocol and test again: bool g_binary = true;

$ ./test_binary
expected: 100000 result: 99927
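The actual test program lives in the gist linked above and is not reproduced in this report. Purely as an illustration, a libmemcached test along the following lines shows the same effect; the thread/iteration counts, key name, and build line here are assumptions, not the gist's contents.

/*
 * Hypothetical reproduction sketch (not the gist's actual code): several
 * threads hammer incr on one key via libmemcached.  If incr were atomic,
 * the final value would equal THREADS * ITERATIONS.
 *
 * Assumed build line: gcc -o test_incr test_incr.c -lmemcached -lpthread
 */
#include <libmemcached/memcached.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define THREADS    10
#define ITERATIONS 10000          /* 10 * 10000 = 100000 expected */

static bool g_binary = false;     /* flip to true to test the binary protocol */

static memcached_st *connect_local(void) {
    memcached_st *memc = memcached_create(NULL);
    memcached_server_add(memc, "127.0.0.1", 11211);
    memcached_behavior_set(memc, MEMCACHED_BEHAVIOR_BINARY_PROTOCOL, g_binary);
    return memc;
}

static void *worker(void *arg) {
    (void)arg;
    memcached_st *memc = connect_local();    /* one connection per thread */
    uint64_t value;
    for (int i = 0; i < ITERATIONS; i++)
        memcached_increment(memc, "counter", 7, 1, &value);
    memcached_free(memc);
    return NULL;
}

int main(void) {
    memcached_st *memc = connect_local();
    memcached_set(memc, "counter", 7, "0", 1, 0, 0);   /* seed the counter */

    pthread_t tids[THREADS];
    for (int i = 0; i < THREADS; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(tids[i], NULL);

    /* With the race, the final value comes up short of 100000. */
    size_t len; uint32_t flags; memcached_return_t rc;
    char *result = memcached_get(memc, "counter", 7, &len, &flags, &rc);
    printf("expected: %d result: %s\n", THREADS * ITERATIONS,
           result ? result : "(null)");
    free(result);
    memcached_free(memc);
    return 0;
}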
What version of the product are you using? On what operating system?
memcached 1.4.4, Red Hat Enterprise Linux 5.4 x86_64

Please provide any additional information below.
Here is a sample patch to fix the problem.
*** memcached.c_org	2009-11-27 14:45:13.000000000 +0900
--- memcached.c	2010-03-31 01:29:22.000000000 +0900
*** 54,59 ****
--- 54,60 ----
  #endif
  #endif

+ pthread_mutex_t incr_lock = PTHREAD_MUTEX_INITIALIZER;

  /*
   * forward declarations
   */
*** 1017,1022 ****
--- 1018,1024 ----
              req->message.body.expiration);
      }

+     pthread_mutex_lock(&incr_lock);
      it = item_get(key, nkey);
      if (it && (c->binary_header.request.cas == 0 ||
          c->binary_header.request.cas == ITEM_get_cas(it))) {
*** 1082,1087 ****
--- 1084,1090 ----
          write_bin_error(c, PROTOCOL_BINARY_RESPONSE_KEY_ENOENT, 0);
      }
+     pthread_mutex_unlock(&incr_lock);
  }

  static void complete_update_bin(conn *c) {
*** 2780,2785 ****
--- 2783,2789 ----
          return;
      }

+     pthread_mutex_lock(&incr_lock);
      it = item_get(key, nkey);
      if (!it) {
          pthread_mutex_lock(&c->thread->stats.mutex);
*** 2791,2796 ****
--- 2795,2801 ----
          pthread_mutex_unlock(&c->thread->stats.mutex);
          out_string(c, "NOT_FOUND");
+         pthread_mutex_unlock(&incr_lock);
          return;
      }
*** 2806,2811 ****
--- 2811,2817 ----
          break;
      }
      item_remove(it); /* release our reference */
+     pthread_mutex_unlock(&incr_lock);
  }

  /*
Thanks.
Comment #1
Posted on Apr 1, 2010 by Massive Horse
It seems to me the correct fix is to eliminate in-place incr/decr rather than add new locks (new locks could hurt scalability).
Also see this report: http://bit.ly/9Dj9gI
In the engine branch, this has already been fixed:
http://github.com/trondn/memcached/commit/8e59e16a94fcf5f1e5fe1271a71a7958357853f8
I think it should be backported.
Comment #2
Posted on Apr 2, 2010 by Helpful Bird
Dustin and I talked about this briefly last night. That looks like the fix, but it's likely to be a performance regression (if slight).
I'll verify the report with the fix a bit later and will see about qualifying the performance change in the future. Not sure what else can be realistically done here easily.
We'd have to end up handling counters more specifically in the send code or something... like the way we send back suffixes. So again, it's worth testing to see whether it actually does regress before writing more code.
Comment #3
Posted on Apr 3, 2010 by Helpful Bird
I just tried backporting the patch, but I failed at it, and it caused a 7% performance regression when running that test alone.
So we're going to punt on this for 1.4.5 and try harder for 1.4.6. Sorry :/ Thanks for the test, it's very helpful.
Comment #4
Posted on Apr 3, 2010 by Helpful Bird
Err, I figured it out, but it's still slower and not entirely correct, so we'll have to figure it out more later.

do_add_delta holds the global cache_lock, so it should never have had a race condition? But the item that's passed into it is fetched from outside the big lock, so an incremented item could end up being junked by another thread. I added a little bit to Trond's patch to re-fetch the item once inside do_add_delta, and the test came back 100% every time. It's still 7% slower, and I'm not sure if Trond had that fixed elsewhere in the engine branch; I couldn't find it on a cursory look.

To fix it properly, do_add_delta needs to be passed a key/keylen tuple instead of an item, and return NOT_FOUND if the internal fetch doesn't find the item. But again, we'll revisit it later :/
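For illustration, a key-based do_add_delta might look roughly like the sketch below. This is not the actual patch: the DELTA_* result names are invented and the rewrite-or-replace step is elided. The point is only that the lookup now happens under the same cache_lock that protects the update.

/* Hypothetical sketch, assumed to live in memcached.c -- not the real change.
 * The caller (the add_delta wrapper in thread.c) holds cache_lock, so the
 * fetch and the update can no longer be interleaved with another thread. */
enum delta_result { DELTA_OK, DELTA_NOT_FOUND, DELTA_NON_NUMERIC };

enum delta_result do_add_delta(conn *c, const char *key, const size_t nkey,
                               const bool incr, const int64_t delta,
                               char *buf) {
    uint64_t value;
    item *it = do_item_get(key, nkey);       /* fetched under cache_lock */

    (void)c;                                 /* used for stats/DTrace in real code */
    if (it == NULL)
        return DELTA_NOT_FOUND;              /* caller answers NOT_FOUND / ENOENT */

    if (!safe_strtoull(ITEM_data(it), &value)) {
        do_item_remove(it);
        return DELTA_NON_NUMERIC;
    }

    if (incr)
        value += delta;
    else
        value = ((uint64_t)delta > value) ? 0 : value - delta;

    snprintf(buf, INCR_MAX_STORAGE_LEN, "%llu", (unsigned long long)value);
    /* ... rewrite the item's data in place or replace it, as before ... */
    do_item_remove(it);                      /* release our reference */
    return DELTA_OK;
}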
Comment #5
Posted on Apr 4, 2010 by Helpful Bird
I'll have to follow this up with another patch as mine's still incomplete, but I had a thought...

When items are being sent back to a client, they have their reference counters incr/decr'ed. If we move the item_get down into do_add_delta (and use do_item_get), we can then also test whether the refcount > 0 on the item. If it is, another thread is copying the data to the client or modifying it in some way, so we allocate a new item and do the replace.
But if refcount == 0, we can modify the item in place. This should be safe since item_get and add_delta operate under the same lock.
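Building on the hypothetical key-based do_add_delta sketched above, the refcount test might look roughly like this. Again an illustration, not real memcached code: the helper name is invented, and "refcount == 0" is read here as "refcount == 1", since do_add_delta would already hold its own reference after do_item_get -- that reading is an assumption.

/* Hypothetical helper filling in the "rewrite or replace" step above.
 * Caller holds cache_lock and a reference to `it`; buf holds the new
 * printable counter value. */
static void write_counter_value(item *it, const char *buf) {
    const int res = strlen(buf);

    if (it->refcount == 1 && res + 2 <= it->nbytes) {
        /* No other thread is copying this item out: patch it in place. */
        memcpy(ITEM_data(it), buf, res);
        memset(ITEM_data(it) + res, ' ', it->nbytes - res - 2);
    } else {
        /* Another thread may be sending the old data: allocate and replace. */
        item *new_it = do_item_alloc(ITEM_key(it), it->nkey,
                                     atoi(ITEM_suffix(it) + 1),
                                     it->exptime, res + 2);
        if (new_it == NULL)
            return;                          /* out of memory; real code reports EOM */
        memcpy(ITEM_data(new_it), buf, res);
        memcpy(ITEM_data(new_it) + res, "\r\n", 2);
        do_item_replace(it, new_it);
        do_item_remove(new_it);              /* drop our reference to the new item */
    }
}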
Bonus points would be optimizing counters to use thread-local write buffers on return. I can think of mildly complicated things to shorten the amount of time a counter is reflocked and decrease the number of expensive re-allocations:
- Use thread-local write buffers for returning counters, flatly allocated to INCR_MAX_STORAGE_LEN.
- In a counter, store a uint64_t in the first 8 bytes of the value, and the string form of the value in the free bytes after that (see the sketch after this list).
- In do_add_delta, remove the safe_strtoull call and simply do the math.
- Do the refcount check described above.
- Then flatten the value back out via snprintf into the end of the value.
- When returning a counter, we'd have to use some bit somewhere to determine that it's a counter (haven't looked at the item struct for ideas yet).
- Then memcpy the non-uint64_t part of the counter into the send buffer for the client.
- Only hold the refcount lock on the counter for the time it takes to fetch the item, determine it's a counter, toss a buffer into the return chain, and memcpy into it.
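Purely as an illustration of the layout idea in the list above (none of this exists in memcached), the counter value could carry the binary number up front and its printable form behind it, so incr/decr becomes plain math plus one snprintf, and only the text part ever gets memcpy'd into the client's send buffer:

#include <stdint.h>
#include <stdio.h>

#define COUNTER_TEXT_LEN 24        /* roughly INCR_MAX_STORAGE_LEN */

/* Hypothetical counter layout: binary value first, printable form after it. */
typedef struct {
    uint64_t raw;                       /* incr/decr operates on this directly */
    char     text[COUNTER_TEXT_LEN];    /* what gets memcpy'd back to the client */
} counter_value;

/* No safe_strtoull parse: just do the math, then flatten once via snprintf.
 * Taking a signed delta also covers the "positive or negative value" idea below. */
static void counter_add(counter_value *v, int64_t delta) {
    if (delta >= 0)
        v->raw += (uint64_t)delta;
    else
        v->raw = ((uint64_t)(-delta) > v->raw) ? 0 : v->raw - (uint64_t)(-delta);
    snprintf(v->text, sizeof(v->text), "%llu", (unsigned long long)v->raw);
}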
That should greatly reduce the amount of time counter items stay locked. Also the amount of time we hold the global lock while processing incr/decr.
Extra bonus points for doing the incr-or-decr detection outside of do_add_delta and passing in a positive or negative value to be added to the counter.
That's on the memory-for-cpu tradeoff edge. We could also test just storing 8 bytes in counter values, doing most of the above, and calling snprintf from the value into the client buffer on every return.
Low priority compared to getting the engine work finished, but worth testing, I think :) In the short term I have a good feeling that the refcount trick will restore it to roughly the same speed it was before.
Comment #6
Posted on Sep 9, 2010 by Happy Panda
Hey everyone, any update on this issue after 5 months? Is the increment/decrement family (with_initial, by_key...) thread safe now?
Comment #7
Posted on Jul 12, 2011 by Helpful Bird
This has finally been fixed and merged into 1.4.6-rc1; please reopen if it's still broken.
1.6 has it fixed via another approach, but I'm not sure if that work is completed as of now.
Comment #8
Posted on Jul 13, 2011 by Helpful Bird
Trying to close this again.
Status: Fixed
Labels:
Type-Defect
Priority-Medium