Issue 355: Clients hang if connecting to a server that's starting up
Status:  New
Owner:  ----

Reported by, Oct 26, 2010
What steps will reproduce the problem?
1. Start Redis with a large rdb or aof.
2. It will take several minutes to load the data from disk.
3. While it's still loading, connect to the server using any of the Redis client libraries (or redis-cli).

What is the expected output? What do you see instead?

I would expect the client to get "connection refused" within a few milliseconds.

Instead, the client's connect call blocks, waiting for the server to respond, until the server has finished loading its dataset.

This makes it harder to be resilient to Redis server downtime: instead of simply handling a client failing to connect, we have to set timeouts on connect, and this will slow down any components depending on the Redis connection.
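As a rough illustration of the workaround described above, here is a minimal client-side sketch in Python. The helper name `redis_ready` and the 0.2 second timeout are assumptions made for the example, not part of any Redis client library. It sends an inline `PING` and treats anything other than a prompt `+PONG` as "not ready". The stand-in server at the bottom listens but never accepts, which is how a loading Redis looks from the client's side.

```python
import socket

def redis_ready(host, port, timeout=0.2):
    """Hypothetical helper: True only if the server answers PING within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")  # inline command; a ready server replies +PONG
            return conn.recv(16).startswith(b"+PONG")
    except OSError:
        # Covers both "connection refused" and the timeout on connect/recv.
        return False

# Stand-in for a server that is still loading: it listens but never accepts.
stalled = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
stalled.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
stalled.listen(1)
stalled_ready = redis_ready("127.0.0.1", stalled.getsockname()[1])
stalled.close()
```

Every caller pays up to the full timeout before learning the server is unavailable, which is the slowdown this report is about.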

What version of the product are you using? On what operating system?

Redis master (tag 2.2-alpha3), on Ubuntu Lucid.

Please provide any additional information below.

I think this is because the server calls 'listen' on its socket before it loads the data, but doesn't 'accept' until the aeMain event loop, which it only enters after loading the data.  It seems like it should be possible to call 'listen' just before entering the event loop, but after any time-consuming startup activities.

I assume it currently does it early because anetTcpServer calls 'socket', 'bind' and 'listen' all together, and the early 'bind' is useful to check the port is available.
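The socket behaviour behind this can be reproduced outside Redis. The sketch below is plain Python rather than Redis code and assumes Linux-like TCP semantics; the `probe` helper is made up for the example. It binds a socket, probes it, then calls `listen` without ever calling `accept` and probes again.

```python
import socket

def probe(port, timeout=0.5):
    """Try a TCP connect to localhost; report 'refused', 'connected' or 'timeout'."""
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(("127.0.0.1", port))
        return "connected"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    finally:
        client.close()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))      # early bind claims the port (OS picks one here)
port = server.getsockname()[1]

before_listen = probe(port)        # bound but not listening
server.listen(16)
after_listen = probe(port)         # listening, but nobody ever calls accept()
server.close()
```

Before `listen` the probe is refused straight away. After `listen` the kernel completes the handshake on the server's behalf and queues the connection, so the client connects fine and only stalls once it waits for a reply.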
Oct 26, 2010
The attached (quick and dirty proof of concept) patch makes Redis call 'listen' after loading the database, and it does have the behaviour I'd expect: during load Redis refuses all connections quickly, but once it's loaded it receives connections as normal.

This patch changes the semantics of anetTcpServer though, so I suspect it's not the right approach.
Oct 26, 2010
The behaviour is similar when the server is shutting down and writing its data back to disk: clients connect and then block until the write finishes.  Again, it would be better if they got an immediate "connection refused".

I can't see an obvious place to change the code here. I can't find where the server explicitly closes its socket; it might just let the OS do it since it's shutting down anyway.
Oct 27, 2010
I reported this and submitted a patch in February, which is available here

Discussion of the patch is here:
Oct 27, 2010
Thanks Dave - your patch is better than mine!

It seems like the main argument against accepting your patch was that for small datasets which take a few seconds to load, blocking is actually more useful than refusing connections.  That makes sense, but to support both scenarios, what about making this configurable at startup time?  (e.g. "refuse-connections-while-loading = true")
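To make the suggestion concrete, here is a small Python sketch of a startup sequence where a flag decides when `listen` is called. It is not Redis code: `start_server`, `connect_result` and the `refuse_while_loading` parameter are names invented for the example, and the "load" step is just a callback that tries to connect while loading would be in progress.

```python
import socket

def connect_result(port, timeout=0.5):
    """Report whether a TCP connect to localhost is 'connected' or 'refused'."""
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(("127.0.0.1", port))
        return "connected"
    except ConnectionRefusedError:
        return "refused"
    finally:
        client.close()

def start_server(load_dataset, refuse_while_loading):
    """Bind early to claim the port; the flag decides when listen() happens."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    if not refuse_while_loading:
        server.listen(16)          # clients queue up and block during the load
    load_dataset(server.getsockname()[1])
    if refuse_while_loading:
        server.listen(16)          # clients are refused until the load is done
    return server

seen = {}
for flag in (True, False):
    def fake_load(port, flag=flag):
        # Stand-in for loading the rdb/aof: record what a client sees meanwhile.
        seen[flag] = connect_result(port)
    start_server(fake_load, flag).close()
```

With the flag on, a client connecting mid-load is refused immediately; with it off, the connection is queued as it is today, which suits small datasets that load in a few seconds.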
Oct 27, 2010
I'm happy however this is solved. I've been running my patched version in production for 8 months and I'd like to upgrade soon ;)