What steps will reproduce the problem?
1. Issue a get request that is extremely large and is delivered in multiple TCP packets.
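A minimal sketch of one way to attempt this step, not the reporter's actual client. It assumes a memcached instance on 127.0.0.1:11211 and made-up key names; the helpers `build_multiget`, `split_chunks`, and `send_in_pieces` are hypothetical names introduced here. It builds one ASCII `get` line and writes it in several pieces so that it is likely, though not guaranteed, to arrive in multiple TCP segments.

```python
import socket
import time


def build_multiget(keys):
    """Build a single ASCII-protocol multiget line: 'get k1 k2 ...' plus CRLF."""
    return ("get " + " ".join(keys) + "\r\n").encode("ascii")


def split_chunks(payload, chunk_size):
    """Split the payload into pieces of at most chunk_size bytes."""
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


def send_in_pieces(host, port, payload, chunk_size=1400, delay=0.01):
    """Write the request piece by piece and return the first bytes of the reply.

    An empty bytes object means the peer closed the connection in an orderly way.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        # Disable Nagle so each write is more likely to become its own segment.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for piece in split_chunks(payload, chunk_size):
            sock.sendall(piece)
            time.sleep(delay)
        return sock.recv(4096)


# Example use (host, port, and keys are assumptions about a local test setup):
# keys = ["key:%d" % i for i in range(50000)]
# reply = send_in_pieces("127.0.0.1", 11211, build_multiget(keys))
# print("closed by server" if reply == b"" else reply[:40])
```

If the server drops the connection abruptly, `sendall` or `recv` may raise an `OSError` subclass instead of returning empty bytes, so a real test script would probably want to catch that as well.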
What is the expected output? What do you see instead?
The expected output is that of a normal (multi) get request. Instead the server closes the connection when epoll gives a 0 result.
What version of the product are you using? On what operating system?
1.4.10 / Ubuntu Lucid 2.6 kernel
Please provide any additional information below.
You can see from this strace output (https://gist.github.com/3950bdee3a984a1af2dd) that the server reads increasingly large chunks of the request and then closes the connection when epoll gives a 0 result. From what we can see in the source code, it looks like it does this 4 times, in increasingly large chunks, and then just stops reading. (https://github.com/memcached/memcached/commit/75cc83685e103bc8ba380a57468c8f04413033f9#L0R3233)
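As a rough model of the behaviour described above (not the server's actual code), the arithmetic can be sketched as follows. The initial buffer size of 2048 bytes, the doubling policy, and the function name `buffer_capacities` are assumptions made for illustration; only the count of 4 comes from the report. The real values should be checked against the linked commit.

```python
def buffer_capacities(initial_size=2048, max_growths=4):
    """Return the read-buffer capacity after each growth step.

    Assumes (hypothetically) that the buffer doubles every time it fills up
    and that growth stops after max_growths steps in one read pass.
    """
    sizes = [initial_size]
    for _ in range(max_growths):
        sizes.append(sizes[-1] * 2)
    return sizes


# Largest request line that fits in one read pass under these assumptions.
def max_bytes_per_pass(initial_size=2048, max_growths=4):
    return buffer_capacities(initial_size, max_growths)[-1]
```

Under these assumed numbers a single pass could buffer at most 32768 bytes, so a multiget line longer than that would need further passes before its terminating `\r\n` is seen.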
Comment #1
Posted on Dec 29, 2011 by Helpful Bird
This is an ascii connection (looks like so from the strace)? Exactly how large of a multiget is this?
That change was only supposed to nuke the connection if you were flooding the conn with junk data. Multigets would/should've started parsing at some point.
If not, I guess that's a bug.
Comment #2
Posted on Dec 29, 2011 by Happy Horse
Yes, ascii. Not using the binary protocol. Using a TCP socket.
Here's a sample get that triggers this: https://gist.github.com/f764d48251ceda134f28
Comment #3
Posted on Dec 30, 2011 by Happy Horse
If I'm understanding things correctly, one thing that might "help" is increasing the initial buffer size. Also, since the request is terminated with \r\n, it seems like it should read until \r\n, or until some other user-configurable threshold. (To prevent someone from sending an endless stream of junk data.)
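To illustrate the suggestion above, here is a minimal sketch of "read until \r\n or until a threshold", written in Python rather than the server's C. The names `read_request_line`, `LineTooLong`, and `max_line_bytes` are hypothetical and are not memcached options; the iterable of chunks stands in for successive socket reads.

```python
class LineTooLong(Exception):
    """Raised when no line terminator arrives within the configured limit."""


def read_request_line(chunks, max_line_bytes=1024 * 1024):
    """Accumulate chunks until CRLF is seen and return (line, leftover).

    chunks: any iterable of bytes objects, standing in for successive reads.
    max_line_bytes: hypothetical cap on how much unterminated data to accept.
    """
    buf = bytearray()
    for chunk in chunks:
        # Start one byte back in case '\r' and '\n' arrive in different chunks.
        search_from = max(len(buf) - 1, 0)
        buf.extend(chunk)
        end = buf.find(b"\r\n", search_from)
        if end != -1:
            return bytes(buf[:end]), bytes(buf[end + 2:])
        if len(buf) > max_line_bytes:
            # Treat an overlong unterminated line as junk instead of growing forever.
            raise LineTooLong("no CRLF within %d bytes" % max_line_bytes)
    raise ConnectionError("stream ended before CRLF")
```

In this sketch the cap only applies while the line is still unterminated, so a large but well-formed multiget is accepted as soon as its `\r\n` arrives, while an endless stream of junk is rejected once it passes the limit.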
Comment #4
Posted on Dec 30, 2011 by Helpful Bird
I was under the impression it only booted you if the connection wasn't trying to do a multiget... so I'll have to go test it or wait for trond to see this and respond himself.
I'll need a few days before being able to test it though. Thanks for your report.
Comment #5
Posted on Dec 30, 2011 by Happy Horse
We've only seen it when doing a multiget. We are happy to help test in any way we can. Unfortunately this is causing lots of errors for us in production. (In the interim we are attempting to work around them now that we've identified this as the cause.)
Thanks again!
Comment #6
Posted on Dec 30, 2011 by Helpful Bird
there aren't any other massive commands which happen sans a \r\n :)
that must be a pretty huge multiget though. usually folks have a few servers and the command is split up.
It shouldn't be too long here... I've slacked a week on 1.4.11 because the holidays call to me, but that fixes some other bugs and I need to wrap it up first.
Comment #7
Posted on Jan 11, 2012 by Helpful Bird
I can't reproduce this, even with a 500,000 key multiget or your provided one. It works (though a little slowly in the 500k case). It's not disconnecting me.
Can you provide a script that reproduces your error, along with the exact versions of the client and any other utilities involved? Ideally, if you start a fresh memcached instance, the script will fill and fetch all the necessary data before causing the disconnection.
Thanks!
Comment #8
Posted on Jan 25, 2012 by Helpful Bird
ping? Anyone reading this ticket?
I wasn't able to reproduce the server early close from your test input or from mc-crusher's 500,000 key multiget. Do you have more information?
Comment #9
Posted on Jan 25, 2012 by Happy Horse
I just took a quick stab at reproducing it and I couldn't either... which leaves me pretty confused. The traces before were easily repeatable... we did it a few times before submitting. When I have a few minutes I'll give it another go. Between now and then I'm thinking it might be a bug in Ruby / Rails... although there's nothing obvious.
Comment #10
Posted on Jan 25, 2012 by Helpful Bird
Ok, I'll leave the bug open for a few more days just in case, but please let us know!
Comment #11
Posted on Feb 1, 2012 by Helpful Bird
Any update? :)
Comment #12
Posted on Jul 14, 2012 by Helpful Bird
Could never reproduce this. mc-crusher would run with 500,000 key multigets just fine (though that did point out an issue in the ascii parser...)
Status: Invalid
Labels:
Type-Defect
Priority-Medium