redis - issue #453

Failure to store to disk should be a fatal error


Posted on Feb 13, 2011 by Quick Panda

Redis 2.1.12 - Ubuntu Server

Admittedly, the problem was entirely my fault. I was in a hurry setting up the server, I didn't check the "dir" config value, and it was set to "./". The config file being in /etc/, the server basically could not create the db file.

I noticed after a week of running the box that I might as well have been using Memcached, because if anything bad happened, all my data would be obliterated. It didn't save anything. I tried issuing a BGSAVE; no luck, it started but didn't write anything. I tried touch'ing the db file and setting proper rights on it, but that didn't help either.

I then thought I would start a slave to sync the data, and then swap them, and this is all I got out of it:

12 Feb 16:53:22 * Connecting to MASTER...
12 Feb 16:53:22 * MASTER <-> SLAVE sync started: SYNC sent
12 Feb 16:53:22 # I/O error reading bulk count from MASTER: No such file or directory

So of course I wrote some code that pulled all the data out and wrote it into the slave. That seemed to be my only option.

I was lucky enough that nothing was lost, but imo the server should not be able to start if it detects that it won't be able to persist data. This should print a big fat warning followed by a startup abort.

Hopefully you guys agree with me, otherwise I'm afraid one day someone is not going to be as lucky as I was. And if this is fixed in the master branch, apologies for the noise.

Comment #1

Posted on Feb 13, 2011 by Helpful Camel

I disagree. The server shouldn't die just because it can't save the DB. Consider a case with an active master and an active slave, both running bgsave. If the master fails to save the data to disk, it's not the end of the world.

It should of course print a fat error message. The INFO command does tell you when you last successfully saved the db to disk. It's reasonable to panic if that time grows beyond the planned interval (say, 30 minutes). I don't know of a monitoring tool that does this, but of course 10 lines of ruby would do.
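The "10 lines" monitor suggested above could be sketched like this (in Python rather than ruby). The INFO field name is an assumption about the server version — older 2.x servers report `last_save_time`, later versions `rdb_last_save_time` — and the 30-minute threshold just mirrors the example interval in this comment:

```python
import time

def save_is_stale(info_text, max_age_seconds=30 * 60, now=None):
    """Return True if the last successful save reported by INFO is
    older than max_age_seconds (or if no save field is found at all)."""
    now = time.time() if now is None else now
    for line in info_text.splitlines():
        # Field name varies by Redis version; both spellings are checked.
        if line.startswith(("last_save_time:", "rdb_last_save_time:")):
            last_save = int(line.split(":", 1)[1])
            return (now - last_save) > max_age_seconds
    return True  # field missing: treat as a failure and alert

# Example with a synthetic INFO snippet (timestamps are made up):
info = "redis_version:2.2.0\nlast_save_time:1297700000\n"
print(save_is_stale(info, now=1297700000 + 3600))  # saved an hour ago -> True
```

In practice you would feed it the raw output of `redis-cli INFO` from cron and page someone when it returns True.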

Comment #2

Posted on Feb 13, 2011 by Quick Panda

Don't get me wrong, of course it shouldn't die under any circumstances, and especially not in the event of a save failure.

What I meant is that it shouldn't be allowed to start if it can't open a file handle to its designated db file; that way you notice the issue while there's still time to fix it.

Comment #3

Posted on Feb 14, 2011 by Grumpy Dog

@boggiano: what about adding the ability to change the current dir via CONFIG SET instead?

Btw, all you needed to do in your situation was use a debugger to change the current working directory at runtime. This can be a feature of Redis without problems: CONFIG SET dir /path/to/newdir
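With that command, a runtime fix would look something like this hypothetical redis-cli session (the target directory is illustrative, and a BGSAVE is issued afterwards to confirm the dump actually lands in the new location):

```
redis> CONFIG GET dir
1) "dir"
2) "/etc"
redis> CONFIG SET dir /var/lib/redis
OK
redis> BGSAVE
Background saving started
```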

Cheers, Salvatore

Comment #4

Posted on Feb 14, 2011 by Grumpy Dog

Just added CONFIG SET dir and CONFIG GET dir to both the 2.2 and unstable branches.

Cheers, Salvatore

Comment #5

Posted on Feb 14, 2011 by Quick Panda

Salvatore, thanks for the prompt action. This sure would have helped me fix it; I actually tried that command, but of course it didn't work at the time.

That being said, I still believe you should not allow the server to start in such a broken state. Honestly, if I hadn't been curious and looked at the INFO report, I would never have noticed it wasn't saving; the server could have crashed, or I could have restarted it for some reason, and I would have lost everything.

If the server is configured to never save anything (e.g. because there is a slave doing the saving), sure, let it run like that. But if you have a dir/dbfile configured, it means you want it to save, and if it's unable to do that, it shouldn't happily run until disaster strikes. We're talking about people losing their data here; these can be pretty serious issues. Please consider it.

  • Jordi

Comment #6

Posted on Feb 14, 2011 by Swift Hippo

We did lose data because of this behaviour - in our case Redis didn't have write permissions on the rdb file. It never occurred to us to check whether Redis was saving, because I'm poetry sure most databases freak out noisily if they can't write to their datafile.

If Redis had simply refused to start up, we'd have found the problem and fixed it within minutes. As it was, we lost a week's worth of data when we restarted Redis for a minor config change.

Comment #7

Posted on Feb 14, 2011 by Swift Hippo

Um, "poetry sure" should have read "pretty sure"... autocorrect error!

Comment #8

Posted on Feb 15, 2011 by Helpful Camel

Many databases do, and it's a considerable pain. For example, MongoDB won't start automatically if the machine crashed, so even with a healthy replica set, you still need to manually clean up and start the failed server. If the server weren't that picky and just joined the replica set, it would automatically sync up.

As a side note, there are a lot of reasons why bgsave could be failing on a live and well-performing instance of redis (e.g. you're approaching the memory limit, somebody broke the permissions on the dump folder, you ran out of disk space, etc.). If you want reliable infrastructure, you should be able to detect these cases and recover from them.
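Two of the failure causes listed here (broken permissions, no disk space) can be detected externally with a small pre-flight check. A minimal sketch in Python — the function name, the dump-file default, and the 64 MB free-space floor are all assumptions for illustration, not anything Redis itself does:

```python
import os
import shutil

def can_persist(dump_dir, dump_file="dump.rdb", min_free_bytes=64 * 1024 * 1024):
    """Check whether a dump could plausibly be written: is the dump file
    (or its directory, if the file doesn't exist yet) writable, and is
    there at least min_free_bytes of free space on that filesystem?
    Returns (ok, reason)."""
    path = os.path.join(dump_dir, dump_file)
    target = path if os.path.exists(path) else dump_dir
    if not os.access(target, os.W_OK):
        return False, "no write permission on %s" % target
    free = shutil.disk_usage(dump_dir).free
    if free < min_free_bytes:
        return False, "only %d bytes free in %s" % (free, dump_dir)
    return True, "ok"
```

A monitoring script could run this against the configured "dir"/"dbfilename" values and alarm before the next bgsave ever fails.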

That said, the server could offer a strict mode that would let it come online only if everything is squeaky clean (say, it can save the db, it can talk to master/replicas, etc.) and refuse to start if any problems are observed.

Comment #9

Posted on Feb 15, 2011 by Quick Panda

Just to make it clear: in no case was I saying that it shouldn't boot after a crash or anything like that.

I'm really just talking about the initial startup: it should check for misconfiguration and warn users. After that, if you decide to screw up the permissions or run out of disk space, I guess you can't really be helped.

Status: New

Labels:
Type-Defect Priority-Medium