
redis - issue #207

MSET with binary values breaks replication


Posted on Mar 25, 2010 by Helpful Lion

What steps will reproduce the problem?

1) Fire up a "master" instance of redis on port 6379 (stock config)

2) Fire up a "slave" instance of redis on port 6380 (stock config, but set up as a slave of the first via the slaveof directive)

3) FLUSHALL and BGREWRITEAOF both the master and slave to baseline all data

../data/master:
total 16K
drwxr-xr-x 2 nobody nobody 4.0K 2010-03-25 15:54 .
drwxr-xr-x 4 nobody nobody 4.0K 2009-12-16 15:19 ..
-rw-r--r-- 1 nobody nobody 0 2010-03-25 15:54 appendonly.aof
-rw-r--r-- 1 nobody nobody 10 2010-03-25 15:54 data.rdb
-rw-r--r-- 1 nobody nobody 6 2010-03-25 15:27 master.pid

../data/slave:
total 16K
drwxr-xr-x 2 nobody nobody 4.0K 2010-03-25 15:54 .
drwxr-xr-x 4 nobody nobody 4.0K 2009-12-16 15:19 ..
-rw-r--r-- 1 nobody nobody 0 2010-03-25 15:54 appendonly.aof
-rw-r--r-- 1 nobody nobody 10 2010-03-25 15:54 data.rdb
-rw-r--r-- 1 nobody nobody 6 2010-03-25 15:14 slave.pid

4) Download the attached run.php and Predis_5.2.php files into a directory (PHP 5.2 or higher). Be sure to replace the connection information for the master and slave to match your environment.

5) Execute "php run.php"
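
For reference, here is a rough sketch of what a driver script like run.php amounts to, assuming the Predis_5.2 build exposes the same client API as later Predis releases (a Predis_Client class taking host/port parameters, plus mset()/get()). The class name, key names, and payload below are illustrative, not the exact attachment contents:

<?php
// Illustrative sketch: write binary values to the master with a single MSET,
// then read them back from the slave to see whether replication kept up.
require 'Predis_5.2.php';

$master = new Predis_Client(array('host' => '127.0.0.1', 'port' => 6379));
$slave  = new Predis_Client(array('host' => '127.0.0.1', 'port' => 6380));

// Binary payload similar to our production data: gzip'd serialized PHP.
$payload = gzcompress(serialize(array('blob' => str_repeat("\x00\xff\r\n", 64))));

$pairs = array();
for ($i = 0; $i < 100; $i++) {
    $pairs["binary:key:$i"] = $payload;
}

$master->mset($pairs);   // one MSET carrying binary values
sleep(1);                // give the slave a moment to apply the stream

$missing = 0;
foreach ($pairs as $key => $value) {
    if ($slave->get($key) !== $value) {
        $missing++;
    }
}
echo $missing . ' of ' . count($pairs) . " keys missing or wrong on the slave\n";

On a healthy setup the script should report 0 missing keys; with this bug the MSET and everything after it never reaches the slave, so the reads against the slave fail.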

--

Note the data directories afterwards:

../data/master:
total 28K
drwxr-xr-x 2 nobody nobody 4.0K 2010-03-25 15:55 .
drwxr-xr-x 4 nobody nobody 4.0K 2009-12-16 15:19 ..
-rw-r--r-- 1 nobody nobody 12K 2010-03-25 15:55 appendonly.aof
-rw-r--r-- 1 nobody nobody 10 2010-03-25 15:55 data.rdb
-rw-r--r-- 1 nobody nobody 6 2010-03-25 15:27 master.pid

../data/slave:
total 24K
drwxr-xr-x 2 nobody nobody 4.0K 2010-03-25 15:55 .
drwxr-xr-x 4 nobody nobody 4.0K 2009-12-16 15:19 ..
-rw-r--r-- 1 nobody nobody 4.4K 2010-03-25 15:55 appendonly.aof
-rw-r--r-- 1 nobody nobody 10 2010-03-25 15:55 data.rdb
-rw-r--r-- 1 nobody nobody 6 2010-03-25 15:14 slave.pid

• The master has 12K of data in its appendonly.aof, but the slave only has 4.4K

• The attached aof_files.diff shows that everything from the start of the MSET and beyond is missing from the slave's file

--

Confirmed with redis 1.2.5 on:

• OS X 10.5/Intel

[antbox:~] mhughes% file /usr/local/redis/bin/redis-server
/usr/local/redis/bin/redis-server: Mach-O executable i386

• Fedora Core 10/x86_64

[mhughes@sandbox64 ~]$ file /usr/local/redis/bin/1.2.5/redis-server
/usr/local/redis/bin/1.2.5/redis-server: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped

• CentOS 5.4/x86_64

[root@cache1 ~]# file /usr/local/redis/bin/1.2.5/redis-server
/usr/local/redis/bin/1.2.5/redis-server: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped


In our production environment I store binary data (usually gzip'd serialized data), and I have found that my slave does not stay in sync after a "SLAVEOF cache1 6379". Once the SLAVEOF resync completes, only sporadic commands from the master execute on the slave (as seen by watching MONITOR on the slave).

I used tcpdump to watch a slave pulling data from its master, and it seems that the communication from the master to the slave is using the binary protocol, and I can only assume that the binary protocol cannot properly account for the binary data that is being sent across. I can't attach the tcpdump results themselves, but I can try to reproduce on a different machine for this example.
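
For reference, this is what a binary-safe multi-bulk MSET looks like on the wire; as I understand it this is also the length-prefixed format the append-only file stores commands in, which is why an AOF can be replayed by piping it straight into a server. Any point in the replication path that stops honoring these byte-count prefixes will lose its place in the stream as soon as a value contains \r\n or other raw bytes. A minimal PHP sketch, purely illustrative and not taken from Redis or Predis source:

<?php
// Illustrative sketch of the multi-bulk (length-prefixed) request encoding.
// Every argument is preceded by its exact byte length, so binary values
// containing \r\n or NUL bytes cannot be mistaken for protocol delimiters.
function encode_multibulk(array $args) {
    $buf = '*' . count($args) . "\r\n";
    foreach ($args as $arg) {
        $buf .= '$' . strlen($arg) . "\r\n" . $arg . "\r\n";
    }
    return $buf;
}

$binary  = gzcompress('some serialized payload');  // contains arbitrary bytes
$request = encode_multibulk(array('MSET', 'key:1', $binary, 'key:2', $binary));

// Push the raw bytes at a Redis instance over a plain socket.
$fp = fsockopen('127.0.0.1', 6379, $errno, $errstr, 5);
fwrite($fp, $request);
echo fgets($fp);  // expect "+OK"
fclose($fp);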

Attachments

Comment #1

Posted on Mar 26, 2010 by Grumpy Dog

Ok, I was able to reproduce the bug just via cat master.appendonly.aof | nc localhost 6379. Fixing it right now, news ASAP. The fix will be backported from Redis master to 1.2.6 today.

Comment #2

Posted on Mar 26, 2010 by Grumpy Dog

The issue is now fixed in Redis master. A backport of the fix for the 1.2.x release will be available later today.

Comment #3

Posted on Mar 26, 2010 by Helpful Lion

Confirmed on all platforms using the piping from your comment above into redis-cli, and confirmed in production that the binary data that sparked this report is now replicated over the wire as expected.

Great work! Thanks!

Comment #4

Posted on Aug 27, 2010 by Grumpy Dog

(No comment was entered for this change.)

Status: Verified

Labels:
Type-Defect Priority-Medium