Google Code Archive - Long-term storage for Google Code Project Hosting.

Posted on Apr 4, 2011 by Happy Monkey

Hi, we are having problems in both production and development environments with Redis server response times. If I issue a simple command to get a key multiple times with the redis-cli client, I get very different response times. While most replies take 1 ms or less, some take 6, 20, or 30 ms, and up to 450 ms.

Anything over 5 milliseconds is way too much, since, for example, a web script might issue more than 100 requests to a Redis server to serve a single page. The client and server are on the same machine, so no network latency is present. Tested on high-end production servers with moderate use (500 clients) and on a local server with about 10 clients.

Tested Redis servers: 2.2.2 and 2.0.1

Platform: Debian Linux 2.6.32-5-amd64

To reproduce:

watch -n1 "time redis-cli get ANY_key"

and check the varying response times. It doesn't seem to be related to background saves.

As a workaround, every time we query a Redis server, we now use a very small timeout (5 ms) and retry in case of a timeout (e.g. 5 times). Since we put this into production, we haven't had any request time out on all 5 attempts. However, if we use a single timeout of 200 ms instead, we get multiple timeout errors per minute.

Any ideas what could be causing this? Is it a known issue?

At least someone else reports something similar: http://groups.google.com/group/redis-db/browse_thread/thread/b490bb7b57f7ba95
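For readers who want to try the workaround, here is a minimal Python sketch of the short-timeout-plus-retry idea. It is not the poster's actual code: the names `call_with_retries` and `redis_get`, the default host and port, and the choice to open a fresh connection per attempt are all assumptions made for illustration.

```python
import socket


def call_with_retries(operation, attempts=5):
    """Run operation(), retrying when it raises socket.timeout.

    Re-raises the last timeout if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except socket.timeout:
            if attempt == attempts:
                raise


def redis_get(key, host="127.0.0.1", port=6379, timeout=0.005):
    """One GET attempt on a fresh connection with a 5 ms timeout (assumed setup)."""
    # Redis unified request protocol: an array of two bulk strings.
    request = b"*2\r\n$3\r\nGET\r\n$%d\r\n%s\r\n" % (len(key), key)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        # Raw reply bytes; a large value may need more than one recv().
        return sock.recv(4096)
```

Usage would look like `call_with_retries(lambda: redis_get(b"ANY_key"))`. Only timeouts are retried here; a refused connection is raised immediately.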

Comment #1

Posted on Apr 4, 2011 by Helpful Elephant

As a preliminary remark, Linux is not a real-time operating system. When synchronous IPCs are done between clients and servers, there is absolutely no guarantee that the kernel scheduler will enforce sub-millisecond latency for ALL roundtrips. On the contrary, the average latency may be quite good while the maximum latency is quite bad. This is not specific to Redis ...

Now, there are some factors that can make this situation even worse. For instance:
- if you use Redis VM
- if your machine swaps (at the OS level)
- if CPU consumption is significant
- if all your 500 clients send queries to Redis at the exact same time.

Some remarks:

a web script might issue more than 100 requests to a Redis server

Your script should not perform 100 synchronous accesses to Redis, but rather pipeline the queries. You would probably get much better response times this way.
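To illustrate what pipelining means here, the following Python sketch writes several commands to the socket in one go and only then reads the replies, so the script pays for one roundtrip instead of one per command. The function names, host, port, and timeout are assumptions for illustration, and the reply reader only handles single-line and bulk replies (enough for GET), not nested multi-bulk replies.

```python
import socket


def encode_command(*args):
    """Encode one command in the Redis unified request protocol."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def encode_pipeline(commands):
    """Concatenate several encoded commands into a single write."""
    return b"".join(encode_command(*cmd) for cmd in commands)


def read_reply(stream):
    """Read one reply from a binary file-like object."""
    line = stream.readline()
    kind, rest = line[:1], line[1:-2]  # strip type byte and trailing CRLF
    if kind == b"$":
        length = int(rest)
        if length < 0:
            return None  # missing key
        return stream.read(length + 2)[:-2]
    if kind in (b"+", b":", b"-"):
        return rest  # status, integer, or error text, returned as bytes
    raise ValueError("unsupported reply type: %r" % kind)


def pipeline(commands, host="127.0.0.1", port=6379, timeout=2.0):
    """Send all commands at once, then read one reply per command."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_pipeline(commands))
        with sock.makefile("rb") as stream:
            return [read_reply(stream) for _ in commands]
```

For example, `pipeline([("GET", "key1"), ("GET", "key2")])` would fetch both keys with a single write. Most client libraries offer an equivalent feature, which is usually preferable to hand-rolled protocol code.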

watch -n1 "time redis-cli get ANY_key"

It seems a poor way to measure latency: the cost of forking and launching redis-cli is probably higher than the roundtrip you are trying to measure.
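A measurement that avoids the fork cost could keep one connection open and time each roundtrip in-process. The Python sketch below does that with PING; the function names, the default host and port, and the simple index-based percentile are assumptions for illustration, not an established benchmark.

```python
import socket
import time


def percentile(ordered, fraction):
    """Pick the sample at the given fraction of an already sorted list."""
    index = min(len(ordered) - 1, int(fraction * len(ordered)))
    return ordered[index]


def summarize(samples_ms):
    """Report min, median, 99th percentile and max of latency samples."""
    ordered = sorted(samples_ms)
    return {
        "min": ordered[0],
        "median": percentile(ordered, 0.5),
        "p99": percentile(ordered, 0.99),
        "max": ordered[-1],
    }


def measure_ping(host="127.0.0.1", port=6379, count=1000):
    """Time `count` PING roundtrips over one persistent connection."""
    samples = []
    with socket.create_connection((host, port)) as sock:
        # Disable Nagle so small requests are sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(count):
            start = time.perf_counter()
            sock.sendall(b"*1\r\n$4\r\nPING\r\n")
            reply = b""
            while not reply.endswith(b"\r\n"):
                chunk = sock.recv(64)
                if not chunk:
                    raise ConnectionError("server closed the connection")
                reply += chunk
            samples.append((time.perf_counter() - start) * 1000.0)
    return summarize(samples)
```

Looking at the maximum and the 99th percentile next to the median makes the occasional slow roundtrips visible without mixing in process startup time.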

Regards, Didier.

Comment #2

Posted on Apr 4, 2011 by Happy Monkey

Thanks for the reply Didier, I understand this is not a perfect benchmark, but wanted to report this issue for other people.

One thing though: I ran the same test on MySQL and got a much smoother response time, between 10 and 15 ms over the network. The max I have seen was 22 ms for the same "Select time();" query. On Redis, meanwhile, response time varies greatly.

We have VM enabled, but the info client reports it's not being used. CPU utilization is below 10%. We pipeline queries to Redis for sequential parts of the code, but different parts can issue their own.

What do you think of the timeout-retry strategy? Having timeouts at 5 ms and retrying 5 times seems to be much better than having a single 200 ms timeout. We use such small timeouts because a Redis server might be down, and we don't want requests to be waiting for it (when we use it as a caching system). We also had the case of a "dead" server that kept the Redis port open but didn't reply to any queries until it was restarted, and each connection waited until the max timeout.

Thanks for your comments

Comment #3

Posted on Apr 4, 2011 by Helpful Elephant

Hi again,

sorry, I can only offer speculations (to be taken with a grain of salt).

MySQL spawns one thread per connection. With queries such as "select time()", there is no contention since no real data is accessed. The threads are therefore very responsive and because they are distributed on all the cores, response time variation is limited.

Redis, on the other hand, runs in one thread. All the queries are serialized. So if you have 10 queries whose processing time is 1 ms, response time for the first query will be around 1 ms, but response time for the last one will be at least 10 ms.
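The arithmetic above can be sketched in a few lines of Python. This is a toy model, assuming all queries arrive at the same instant and, for the threaded case, that each query gets its own idle core; the function names are made up for illustration.

```python
def serialized_response_times(service_times_ms):
    """Single-threaded server: each query waits for all queries ahead of it."""
    finished = 0
    response_times = []
    for service_time in service_times_ms:
        finished += service_time
        response_times.append(finished)
    return response_times


def parallel_response_times(service_times_ms):
    """One thread per connection, one free core each: no queueing at all."""
    return list(service_times_ms)


queries = [1] * 10  # ten queries of 1 ms each
print(serialized_response_times(queries))
print(parallel_response_times(queries))
```

In the serialized case the response times grow from 1 ms for the first query to 10 ms for the last, while in the idealized parallel case every query sees 1 ms.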

When there is no contention, I would say a single-threaded server tends to generate a more volatile response time than a one-thread-per-connection server.

We have VM enabled, but the info client reports it's not being used

If it is not used, why not try starting Redis with no VM at all? At least you will be able to check whether the VM has an impact or not.

What do you think of the timeout-retry strategy?

I'm rather surprised you have good results with a 5 ms timeout. It is really close to the typical kernel scheduler time slice. I'm probably too old and too conservative, but I never use communication timeouts below 2 seconds on my Unix/Linux systems.

If you suspect scheduling issues, you may want to try to isolate Redis on its own core (taskset, numactl), or run Redis with a real-time priority under SCHED_RR or SCHED_FIFO (chrt). Be sure to deactivate bgsave before trying this.
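For experimenting from a script rather than with the taskset and chrt commands, Python exposes the same Linux scheduler calls. The sketch below is Linux-only; the function names are hypothetical, the priority value is an arbitrary assumption, and switching to a real-time policy normally requires root privileges.

```python
import os


def pin_to_cpu(pid, cpu):
    """Restrict a process to a single CPU (similar in effect to taskset)."""
    os.sched_setaffinity(pid, {cpu})


def use_round_robin(pid, priority=1):
    """Switch a process to SCHED_RR (similar in effect to chrt -r).

    Usually needs root; the priority of 1 is an illustrative assumption.
    """
    os.sched_setscheduler(pid, os.SCHED_RR, os.sched_param(priority))
```

For example, `pin_to_cpu(redis_pid, 3)` followed by `use_round_robin(redis_pid)` would apply both suggestions to a running server whose process id is `redis_pid`. A pid of 0 refers to the calling process.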

Regards, Didier.

redis - issue #510
