Issue 525: Slow clients using pubsub cause memory to grow unbounded
5 people starred this issue.
Status:  New
Owner:  ----

Reported by, Apr 17, 2011
What version of Redis are you using, and on what operating system?

2.0.4 on Linux (Fedora 14)

What is the problem you are experiencing?

The server process's memory usage grows without bound. (This may cause an unintentional denial of service.)

What steps will reproduce the problem?

A client that is SUBSCRIBEd to a high-volume channel causes the server to buffer all pending PUBLISHed data without limit when the client is too slow to keep up or has hung (blocked). A single faulty client can therefore kill a Redis server. (The same thing can happen when an iptables-style filter that blackholes traffic is raised between the server and the client, or when an intermediate stateful firewall misbehaves.)

Do you have an INFO output? Please paste it here.

redis> info

If it is a crash, can you please paste the stack trace that you can find in the log file or on standard output? This is really useful for us!

Please provide any additional information below.

Could you add a configuration option to redis.conf that sets the maximum buffer space used for client subscriptions? This would be analogous to the Spread toolkit's MaxSessionMessages setting (default 1000, though I generally raise it to 5,000 or 10,000), or to ActiveMQ's activemq.maximumPendingMessageLimit parameter for dealing with consumers that are too slow. The best strategy for the server is to drop the connection to a slow client once its backlog reaches this configured maximum. The memory impact on the server is then limited to a fixed size per client connection.

This should be a relatively small change to the pubsubPublishMessage function in src/pubsub.c. Before calling addReply() and addBulk(), the server should check whether too much data is already buffered for the client connection. If so, it should drop the connection.
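As a rough illustration of that check, here is a minimal sketch. It does not use the real Redis structures: sketchClient, its pending_bytes and close_asap fields, the max_pubsub_pending_bytes limit, and sketchPublishToClient are all hypothetical names, and the byte counter simply stands in for the data that addReply()/addBulk() would queue. The point is the ordering: check the backlog first, and mark the client for disconnection instead of queuing another message.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for a Redis client. Only the
 * two fields needed for the bounds check are modelled here. */
typedef struct sketchClient {
    size_t pending_bytes; /* bytes queued but not yet written to the socket */
    int close_asap;       /* set once the server decides to drop the client */
} sketchClient;

/* Hypothetical configured limit in bytes; 0 disables the check. */
static size_t max_pubsub_pending_bytes = 1024 * 1024;

/* Returns 1 if the message was queued, 0 if the client was (or already had
 * been) marked for disconnection instead of buffering more data. */
int sketchPublishToClient(sketchClient *c, size_t msg_len) {
    if (c->close_asap) return 0; /* already being dropped: queue nothing */
    if (max_pubsub_pending_bytes != 0 &&
        c->pending_bytes + msg_len > max_pubsub_pending_bytes) {
        c->close_asap = 1;
        fprintf(stderr, "dropping slow pubsub client: %zu bytes pending\n",
                c->pending_bytes);
        return 0;
    }
    c->pending_bytes += msg_len; /* stands in for addReply()/addBulk() */
    return 1;
}
```

A client that reads promptly never gets near the limit, so the only cost on the normal path is one addition and comparison per published message.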

More generally, the fix could go into the _addReplyObjectToList() function in src/networking.c, in the least likely branch only: the one where a list of reply objects already exists and the new data cannot be appended to the last item in the list. In the unstable branch, line 127 might be a good spot to check the list size (in bytes) before adding another object to the tail of the list. That would cost no CPU for clients that read fast enough that the static buffer never spills over into the linked list, and it would be a single generic place to implement the bounds checking.
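The "check only on the spill-over path" idea can be sketched as follows. This is a simplified model under stated assumptions, not the real networking.c code: replyClient, SKETCH_BUF_SIZE, max_list_bytes, and sketchAddReply are hypothetical, the overflow list is represented only by its byte and object counts, and the real step of appending to the last list object is not modelled. Writes that fit in the static buffer skip the limit check entirely; only writes that would grow the list pay for it.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_BUF_SIZE 16 /* tiny on purpose; a real static buffer is much larger */

typedef struct replyClient {
    char buf[SKETCH_BUF_SIZE]; /* static output buffer (fast path) */
    size_t bufpos;             /* bytes used in buf */
    size_t list_bytes;         /* bytes held in the overflow reply list */
    size_t list_len;           /* objects in the overflow reply list */
    int close_asap;
} replyClient;

static size_t max_list_bytes = 64; /* hypothetical limit, in bytes */

/* Returns 1 if the reply was queued, 0 if the client was flagged for closing. */
int sketchAddReply(replyClient *c, const char *s, size_t len) {
    if (c->close_asap) return 0;
    /* Fast path: use the static buffer only while nothing has spilled into
     * the list yet, so replies stay in order. No limit check here. */
    if (c->list_len == 0 && SKETCH_BUF_SIZE - c->bufpos >= len) {
        memcpy(c->buf + c->bufpos, s, len);
        c->bufpos += len;
        return 1;
    }
    /* Slow path: this is where the bounds check would live. */
    if (c->list_bytes + len > max_list_bytes) {
        c->close_asap = 1;
        return 0;
    }
    c->list_bytes += len; /* stands in for adding a new object at the tail */
    c->list_len++;
    return 1;
}
```

Because the fast path returns before the comparison, a client whose output always fits in the static buffer never touches the limit logic at all.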

I'm not very familiar with the Redis code, though, so you might have a better way to solve this!


Sep 8, 2011
I think this would be a very useful feature. I'd be willing to work on it, if it's considered an acceptable addition.
Sep 8, 2011
I'd definitely appreciate it!
Dec 21, 2011
I've added a new feature that lets you limit the number of pending objects in the client's "reply" linked list (pending outbound messages). It adds a new server-level config parameter, "maxclientqueue", with a default of 0 (disabled).

When this setting is enabled and a client reads replies too slowly to keep up with the server, the server drops the client's connection once the maximum number of messages has built up on the server side. It's not perfect, because ideally you would limit the client by the number of bytes queued. But this setting is intended as a safety measure (like a shear pin), and counting and maintaining the number of enqueued bytes seems like overkill when the linked list structure already keeps an object count.
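The object-count approach described above can be sketched like this. It is not the code from the attached patch; apart from the maxclientqueue name taken from the comment, every identifier here (replyList, queueClient, sketchEnqueueReply, sketchFreeReplies) is hypothetical. The sketch relies on the list caching its own length, so the limit check is a single comparison and no byte accounting is needed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Minimal singly linked list that caches its length, so checking it is O(1). */
typedef struct node { struct node *next; } node;
typedef struct { node *head, *tail; unsigned long len; } replyList;

typedef struct {
    replyList reply; /* pending outbound reply objects */
    int close_asap;
} queueClient;

/* Mirrors the "maxclientqueue" parameter described above; 0 = disabled. */
static unsigned long maxclientqueue = 0;

/* Queue one reply object. Returns 1 on success, 0 if the client is dropped. */
int sketchEnqueueReply(queueClient *c) {
    if (c->close_asap) return 0;
    if (maxclientqueue != 0 && c->reply.len >= maxclientqueue) {
        fprintf(stderr, "Warning: dropping client with %lu pending replies "
                "(maxclientqueue = %lu)\n", c->reply.len, maxclientqueue);
        c->close_asap = 1;
        return 0;
    }
    node *n = calloc(1, sizeof *n);
    if (n == NULL) return 0;
    if (c->reply.tail) c->reply.tail->next = n; else c->reply.head = n;
    c->reply.tail = n;
    c->reply.len++;
    return 1;
}

/* Release every queued object, e.g. once the client is actually closed. */
void sketchFreeReplies(queueClient *c) {
    node *n = c->reply.head;
    while (n) { node *next = n->next; free(n); n = next; }
    c->reply.head = c->reply.tail = NULL;
    c->reply.len = 0;
}
```

The trade-off is the one the comment names: a count says nothing about object sizes, so a few very large replies can still use more memory than the same count of small ones.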

When a client exceeds this limit, the server also logs a warning that it has dropped a client because of the overflow. That gives a clear signal if the setting is too low, or if clients are frequently too slow to keep up. (There is also a low-frequency cleanup pass, roughly every 30 seconds, for clients that have overflowed but haven't yet been dropped by a subsequent write.)
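A low-frequency cleanup pass like the one mentioned could look roughly like the sketch below. The assumptions: it is driven by a periodic timer (in Redis that would be something like serverCron) that passes in the current time in milliseconds, and cronClient, its flags, SKETCH_CLEANUP_INTERVAL_MS, and sketchOverflowCleanup are hypothetical names, not what the patch uses. Marking a client closed stands in for actually freeing it.

```c
#include <stddef.h>

typedef struct {
    unsigned long pending; /* queued reply objects */
    int overflowed;        /* limit exceeded, but no later write has dropped it */
    int closed;
} cronClient;

#define SKETCH_CLEANUP_INTERVAL_MS 30000 /* "roughly every 30 seconds" */

static long long last_cleanup_ms = 0;

/* Call from a periodic timer with the current time. At most once per
 * interval, walk the client table and close overflowed clients that no
 * subsequent write has dropped yet. Returns the number of clients closed. */
int sketchOverflowCleanup(cronClient *clients, size_t n, long long now_ms) {
    if (now_ms - last_cleanup_ms < SKETCH_CLEANUP_INTERVAL_MS) return 0;
    last_cleanup_ms = now_ms;
    int closed = 0;
    for (size_t i = 0; i < n; i++) {
        if (clients[i].overflowed && !clients[i].closed) {
            clients[i].closed = 1; /* stands in for freeing the client */
            clients[i].pending = 0;
            closed++;
        }
    }
    return closed;
}
```

Running the scan rarely keeps its O(number of clients) cost negligible, while still guaranteeing that an idle overflowed client cannot hold its backlog indefinitely.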

My GitHub branch for this patch is here:

And the diff is attached as a patch as well.

6.9 KB
Dec 21, 2011
Thanks for doing this!

I think your solution is very reasonable, and probably the simplest of the alternatives.
