AppendOnlyFileHowto
We moved to Redis.io! Redis home moved to http://redis.io, please visit our new home.
If you're used to MySQL, you can think of the append only file as a binlog for Redis. The fsync() behavior also mimics what is available in InnoDB's transaction log configuration.
This is really cool :)
Since the AOF is meant to hold only the changes made since the last dump to the db file, why does the file keep growing over time? Can't it be trimmed after every dump, or after every few dumps?
With the AOF set to fsync once every second, how does performance compare with 'save 1 1'?
The append only file can be used together with the normal snapshot feature, correct? When a snapshot is saved, is the append log cleared?
When the child is rewriting the append log, the parent accumulates the new changes only in memory. If the server crashes in the meantime, those changes are lost. If the child never sends the completion signal to the parent (perhaps because of a bug), that window could be longer than expected.
fmartini: actually we are totally safe, as the parent writes the new changes in the old file as well, not only in the in-memory buffer. So even if the rewrite fails, we didn't lose a single record.
The only way to lose data is a bug in the rewrite code that makes the child write the log in some wrong or incomplete way while still claiming success by exiting with a non-error exit code. To guard against this, you can make a backup of the old append only file before issuing a BGREWRITEAOF command or, as an alternative, issue a SAVE from time to time (for instance once a day) so that you also have an .rdb file with the dataset.
Basically the only way to protect against bugs that we haven't found yet is to make backups :) But this is a general concept, true of all database systems, not just Redis.
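The rewrite behaviour antirez describes can be sketched with a toy model. This is only an illustration under simplifying assumptions, not the Redis implementation: the class and method names are made up, the "fork" is a dictionary copy, and only SET is modelled.

```python
class AofServer:
    """Toy model of an append-only log with a background rewrite."""

    def __init__(self):
        self.data = {}              # in-memory dataset
        self.aof = []               # the current ("old") append-only log
        self.rewrite_buffer = None  # None means no rewrite is in progress
        self.snapshot = None

    def set(self, key, value):
        self.data[key] = value
        cmd = ("SET", key, value)
        self.aof.append(cmd)                 # always logged to the old file
        if self.rewrite_buffer is not None:
            self.rewrite_buffer.append(cmd)  # also kept for the new file

    def start_rewrite(self):
        # Stands in for fork(): the child sees a frozen copy of the data.
        self.snapshot = dict(self.data)
        self.rewrite_buffer = []

    def finish_rewrite(self, child_ok):
        if child_ok:
            # New log = minimal commands for the snapshot + buffered changes.
            compact = [("SET", k, v) for k, v in self.snapshot.items()]
            self.aof = compact + self.rewrite_buffer
        # On failure the old log is kept untouched, so nothing is lost.
        self.snapshot = None
        self.rewrite_buffer = None


def replay(log):
    """Rebuild a dataset by re-executing the logged commands."""
    data = {}
    for _, key, value in log:
        data[key] = value
    return data
```

Whether the rewrite succeeds or fails, replaying `aof` gives back the full dataset, because every change made during the rewrite went to the old log as well as to the buffer.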
Thanks for making that clear antirez.
What are the API-level guarantees on data safety?
I'm going to attempt to answer my own question.
Looking through redis.c, it would appear that the response is queued for asynchronous transfer to the client when the execution of the API command (e.g. "SET") is complete. Later in processCommand() the data is written to the append-only file and fsync()ed. If I understand correctly this is a single-threaded application, so until the event loop is entered the transmission to the client cannot actually occur (it is buffered at the application level until that time).
An error write()ing to the file will cause the process to abort() without a reply having been sent. Similarly, a power failure would interrupt the operation (which may or may not have completed) before control can return to the event loop (and thus no reply will have been sent).
So I believe that if the client receives an "+OK" result, it can be satisfied that the data has in fact been persisted to disk.
This leaves one question, which is the assumption that a single write() is atomic. This is not - in general - true, even on a journaling filesystem. A good paper on the topic is http://www.sqlite.org/atomiccommit.html . As the code stands a power failure could leave invalid data in the file, depending on the filesystem's handling of metadata updates.
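To make the torn-write concern concrete, here is a minimal sketch of how a partially written tail could be detected. It assumes the file stores each command in the Redis protocol encoding (`*<argc>`, then `$<len>` and the bytes of each argument, all terminated by CRLF). The function names are hypothetical and this is not the code of any Redis tool.

```python
def encode(*args: bytes) -> bytes:
    """Encode one command in the multi-bulk protocol format."""
    out = b"*%d\r\n" % len(args)
    for arg in args:
        out += b"$%d\r\n%s\r\n" % (len(arg), arg)
    return out


def last_complete_offset(buf: bytes) -> int:
    """Return the byte offset just past the last fully written command."""
    n = len(buf)

    def read_line(start):
        end = buf.find(b"\r\n", start)
        if end == -1:
            return None, start
        return buf[start:end], end + 2

    pos = 0
    good = 0
    while pos < n:
        line, nxt = read_line(pos)
        if line is None or not line.startswith(b"*") or not line[1:].isdigit():
            break                       # header missing or cut short
        argc = int(line[1:])
        pos = nxt
        complete = True
        for _ in range(argc):
            line, nxt = read_line(pos)
            if line is None or not line.startswith(b"$") or not line[1:].isdigit():
                complete = False
                break
            end = nxt + int(line[1:])
            if end + 2 > n or buf[end:end + 2] != b"\r\n":
                complete = False        # argument bytes are truncated
                break
            pos = end + 2
        if not complete:
            break
        good = pos
    return good
```

Truncating the file to the returned offset discards only the command that was being written when the failure happened. It does not help if the filesystem left garbage in the middle of the file.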
Here's a tip that seemed to work for me. If you are bulk populating your database from another source, doing it with appendonly ON is SIGNIFICANTLY slower than with it OFF. So here's a faster way to populate a server you eventually want to run in appendonly mode:
You should now have redis running in appendonly mode with all of your freshly loaded data.
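One possible sequence for this tip, offered as an assumption rather than the commenter's exact steps: turn appendonly off, load the data, turn it back on, and wait for the server to finish building the new file. The sketch assumes a client object with `config_set`, `set` and `info` methods (as in the redis-py library) and a server version that accepts `CONFIG SET appendonly` and reports `aof_rewrite_in_progress` in `INFO`. The function name is hypothetical.

```python
import time


def bulk_load_then_enable_aof(client, items, poll_seconds=0.1):
    """Load `items` with AOF off, then switch AOF on and wait for the rewrite."""
    client.config_set("appendonly", "no")    # skip per-command logging
    for key, value in items:
        client.set(key, value)
    client.config_set("appendonly", "yes")   # server builds a fresh AOF
    # Wait until the background rewrite that creates the AOF has finished.
    while client.info().get("aof_rewrite_in_progress", 0):
        time.sleep(poll_seconds)
```

The configuration file would still need `appendonly yes` so that the setting survives a restart.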
Redis could implement durability / fsync on every command with much better performance (throughput) at the expense of some latency for every individual client. Here is how it would work:
1) Get an additional max-delay server setting, and set it to something like 200ms
2) When Redis gets a SET it doesn't execute it immediately, instead puts it in a queue without responding +OK to the client.
3) After 200ms execute all modification commands in the queue at once, append them to the log and return +OK to the waiting clients.
This would avoid the disk seek at every command so it would improve throughput dramatically while still having perfect durability. For extra performance, at step 3 fork the process and have the child save the append log and respond to the waiting clients while the parent starts immediately accepting commands in a new queue.
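The batching idea in steps 1 to 3 can be sketched as follows. This is a minimal illustration of the proposal, not something Redis implements this way: `GroupCommitLog` and its methods are hypothetical names, and the timer that would call `flush()` every max-delay milliseconds is left out.

```python
import os


class GroupCommitLog:
    """Queue writes, then persist a whole batch with a single fsync."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.pending = []     # (client_id, encoded command) awaiting commit
        self.fsync_calls = 0

    def submit(self, client_id, command: bytes):
        # Step 2: queue the command; the client gets no reply yet.
        self.pending.append((client_id, command))

    def flush(self):
        # Step 3: one write and one fsync for the whole batch, then reply.
        if not self.pending:
            return []
        batch, self.pending = self.pending, []
        payload = b"".join(cmd for _, cmd in batch)
        while payload:                       # handle short writes
            written = os.write(self.fd, payload)
            payload = payload[written:]
        os.fsync(self.fd)
        self.fsync_calls += 1
        return [(client_id, "+OK") for client_id, _ in batch]

    def close(self):
        os.close(self.fd)
```

Every client in a batch is acknowledged only after the shared fsync returns, so the durability guarantee is the same as fsync-per-command while the number of fsync calls drops to one per batch.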
Is there a way to split off logfiles into multiple files?
The idea is to avoid losing all of the logged data when it is stored on unreliable storage such as SD cards. Splitting the logfile into multiple files would limit the amount of data lost.
Hello all, is there a way to achieve a hybrid sync/append setup? E.g. if I sync every 5 minutes, is there a way to also atomically clear my AOF so that only the updates received after the last sync are logged? It seems that right now it is an either/or system. Either you sync and you risk losing the last x minutes of updates, or you have a humongous AOF file since the beginning of the application. Any help appreciated
Question - if I were to use the slowest option - fsync() after each dataset change - is that fsync synchronous with the dataset change (so that, in the event of a catastrophic failure, the potential for message loss is limited to transactions that were in progress but incomplete)?
"Warning: by default Redis will fsync() after every command! This is because the Redis authors want to ship a default configuration that is the safest pick. But the best compromise for most datasets is to fsync() one time every second."
Looking at the redis.conf included with 2.0.0 it has it defaulting to everysec. So either the documentation here is out of date, or the default config file is incorrect.
Yep, safe for default was a very bad idea... fixing the doc.
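For readers comparing the options, the three `appendfsync` choices (`always`, `everysec`, `no`) can be sketched like this. It is a simplified model, not the server's code: the class is hypothetical, and the `everysec` branch syncs lazily on the next append instead of from a background timer.

```python
import os
import time


class AofWriter:
    """Append commands and fsync according to a policy: always, everysec, no."""

    def __init__(self, path, policy="everysec", clock=time.monotonic):
        if policy not in ("always", "everysec", "no"):
            raise ValueError("unknown policy: %r" % policy)
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.policy = policy
        self.clock = clock
        self.last_sync = clock()
        self.fsync_calls = 0

    def append(self, command: bytes):
        os.write(self.fd, command)
        now = self.clock()
        if self.policy == "always" or (
            self.policy == "everysec" and now - self.last_sync >= 1.0
        ):
            os.fsync(self.fd)
            self.fsync_calls += 1
            self.last_sync = now
        # policy "no": leave flushing to the operating system

    def close(self):
        os.close(self.fd)
```

With `always`, a crash can lose at most the command being written; with `everysec`, roughly the last second of writes; with `no`, whatever the operating system had not yet flushed.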
Hi, with large numbers of keys appendonly file (even after compaction) tends to stay large, much larger than a corresponding .rdb file. Would it be possible to make appendonly files 'incremental' on top of an .rdb file content?
This way 'rewrite' command would effectively 'save' into .rdb and reopen an appendonly file for new data.
Hello Andrey: the code to create the .rdb and the append only file is almost the same, it's the same process of forking the Redis process and writing, only the format is different. But to end with two different files may lead to corruption, something you really don't want in a database...
Not just this, but the two persistence modalities really serve different goals. Example: when you perform FLUSHALL the rdb file is reset to zero length, as the data is gone. With AOF you want to stay completely safe instead, so FLUSHALL is just appended: you'll lose data only if you then perform a BGREWRITEAOF.
In the future what we may consider is changing the format of the AOF to a compact binary one.
Cheers, Salvatore
Thanks, Salvatore! Sure, one file is easier to deal with atomically and if it can be compressed - that'd be great.
Here is another thing you could consider:
When the system restarts it can read from the latest .rdb first (ignoring and killing unfinished temp files), then read all appendonly files in sequence and execute all commands past the checkpoint recorded in the .rdb header. This should preserve your FLUSHALL behavior. There should be no data loss (even though saving is done in the background), and it keeps appendonly files simple (just text with the actual commands, as it is now).
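The recovery order proposed here can be sketched as a small function. Everything in it is an assumption made for illustration: the sequence numbers attached to each logged command, the `checkpoint` value stored with the snapshot, and the three commands handled are hypothetical and do not describe an existing Redis file format.

```python
def recover(snapshot, checkpoint, log_segments):
    """Rebuild state from a snapshot plus logged commands past its checkpoint."""
    data = dict(snapshot)
    for segment in log_segments:          # oldest segment first
        for seq, command in segment:
            if seq <= checkpoint:
                continue                  # already reflected in the snapshot
            op = command[0]
            if op == "SET":
                data[command[1]] = command[2]
            elif op == "DEL":
                data.pop(command[1], None)
            elif op == "FLUSHALL":
                data.clear()
    return data
```

Because FLUSHALL is just another logged command, it takes effect only when replay reaches it, and the commands before it remain in the log files.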
I am using AOF, but data is saved to the dump file ONLY when I restart the server. That suggests the AOF is doing its job, but I have to restart the server every time I want to test whether data has really been saved to disk.
Thanks.
In http://code.google.com/p/redis/wiki/AppendOnlyFileHowto#What_should_I_do_if_my_Append_Only_File_gets_corrupted? I think:
"Fix the original file with: ./redis-check-dump --fix <filename>"
should be
"Fix the original file with: ./redis-check-aof --fix <filename>"
@saikat1: thanks fixed