AppendOnlyFileHowto  
Updated Dec 21, 2010 by anti...@gmail.com

We moved to Redis.io!

The Redis home has moved to http://redis.io; please visit our new home.

Comment by jzaw...@gmail.com, Nov 27, 2009

If you're used to MySQL, you can think of the append only file as a binlog for Redis. And the fsync() behavior mimics what is available in InnoDB's transaction log configuration as well.

Comment by dhruvb...@gmail.com, Nov 28, 2009

This is really cool :)

Comment by collins....@gmail.com, Dec 19, 2009

Since the AOF is meant to hold the changes made since the last dump to the .rdb file, why does the file keep growing over time? Can't it be trimmed after every dump, or after every few dumps?

How does AOF with fsync once every second compare in performance with 'save 1 1'?

Comment by fastest...@gmail.com, Jan 4, 2010

The append only operation can be used together with the normal snapshot feature, correct? When a snapshot is saved, is the append log cleared?

Comment by fmart...@gmail.com, Jan 9, 2010

When the child is rewriting the append log, the parent accumulates the new changes only in memory. If the server crashes in the meantime, those changes are lost. If the child never sends the completion signal to the parent (perhaps because of a bug), that window could be longer than expected.

Comment by project member anti...@gmail.com, Jan 9, 2010

fmartini: actually we are totally safe, as the parent writes the new changes in the old file as well, not only in the in-memory buffer. So even if the rewrite fails, we didn't lose a single record.
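To make the hand-off described above concrete, here is a minimal sketch in Python. It is not the Redis source: the class and method names are hypothetical, records are plain text lines rather than the real AOF format, and the final swap assumes an atomic rename (as on POSIX filesystems). The point it shows is that the old file receives every change even while a rewrite is running, so a failed child costs nothing.

```python
import os


class AofRewriteSketch:
    """Hypothetical model of the BGREWRITEAOF hand-off; not the Redis source."""

    def __init__(self, path):
        self.path = path
        self.rewrite_buffer = None  # a list only while a rewrite is running

    def append(self, line):
        # The parent keeps appending to the old file, so it is always complete.
        with open(self.path, "a") as f:
            f.write(line + "\n")
        # During a rewrite the same change is also kept in memory for the new file.
        if self.rewrite_buffer is not None:
            self.rewrite_buffer.append(line)

    def start_rewrite(self):
        self.rewrite_buffer = []

    def finish_rewrite(self, child_path, child_ok):
        buffered = self.rewrite_buffer or []
        self.rewrite_buffer = None
        if not child_ok:
            # Throw away the child's output; the old file already has every record.
            if os.path.exists(child_path):
                os.remove(child_path)
            return False
        # Add the changes that arrived during the rewrite, then swap files in one step.
        with open(child_path, "a") as f:
            for line in buffered:
                f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(child_path, self.path)
        return True
```

If the child reports failure, the parent simply discards the temporary file and carries on with the old one, which is the "we didn't lose a single record" property.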

The only way to lose data is a bug in the rewrite code that makes the child write the log in some wrong or incomplete way while still claiming success by exiting with a non-error exit code. To make sure this is not the case, one could simply back up the old append only file before a BGREWRITEAOF command or, as an alternative, issue a SAVE from time to time (for instance once a day) in order to also have an .rdb file with the dataset.

Basically the only way to protect against bugs that we haven't found yet is to make backups :) But this is a general concept, true of all database systems, not just Redis.

Comment by fmart...@gmail.com, Jan 9, 2010

Thanks for making that clear antirez.

Comment by twylite....@gmail.com, Feb 8, 2010

What are the API-level guarantees on data safety?

  • If I am using AOF with fsync() after every command and I issue a "SET" instruction, am I guaranteed that the fsync() has occurred before the "+OK" result is sent?
  • Is there any way (other than disk corruption) that a crash can occur such that I receive the "+OK" reply but the data is lost?

Comment by twylite....@gmail.com, Feb 8, 2010

I'm going to attempt to answer my own question.

Looking through redis.c, it would appear that the response is queued for asynchronous transfer to the client when the execution of the API command (e.g. "SET") is complete. Later in processCommand() the data is written to the append-only file and fsync()ed. If I understand correctly, this is a single-threaded application, so until the event loop is entered the transmission to the client cannot actually occur (the reply is buffered at the application level until then).

An error write()ing to the file will cause the process to abort() without a reply having been sent. Similarly, a power failure interrupts the operation (which may or may not have completed) before control can return to the event loop, so no reply will have been sent.

So I believe that if the client receives an "+OK" result, it can be satisfied that the data has in fact been persisted to disk.
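The ordering argued above can be modeled in a few lines of Python. This is a hypothetical sketch of the reasoning, not code taken from redis.c: it assumes fsync-after-every-command mode, a single thread, and plain-text log lines. The reply is queued first, the log is written and fsynced, and only a later event-loop iteration actually delivers the reply.

```python
import os


class SingleThreadedServerSketch:
    """Hypothetical model of the reply/fsync ordering in fsync-always mode."""

    def __init__(self, aof_file):
        self.aof = aof_file        # any binary file object that has fileno()
        self.data = {}
        self.pending_replies = []  # replies are only buffered here
        self.sent = []             # what the client has actually received

    def process_command(self, key, value):
        self.data[key] = value
        self.pending_replies.append(b"+OK\r\n")  # queued, not yet sent
        self.aof.write(f"SET {key} {value}\n".encode())
        self.aof.flush()
        os.fsync(self.aof.fileno())              # durable before any reply leaves
        # If write() or fsync() raised here, the process would die with the
        # reply still sitting in pending_replies, never delivered.

    def event_loop_iteration(self):
        # Only back in the event loop are queued replies written to sockets.
        self.sent.extend(self.pending_replies)
        self.pending_replies.clear()
```

Under these assumptions a client can only observe "+OK" after the corresponding fsync() has returned.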

This leaves one open question: the assumption that a single write() is atomic. This is not, in general, true, even on a journaling filesystem. A good paper on the topic is http://www.sqlite.org/atomiccommit.html . As the code stands, a power failure could leave invalid data in the file, depending on the filesystem's handling of metadata updates.
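One practical consequence of a non-atomic write() is a truncated final command at the end of the file. The Python sketch below shows how such a tail can be detected and cut off, similar in spirit to what redis-check-aof --fix is for, though it is not that tool's code. It assumes the log stores commands in the Redis multi-bulk protocol format (`*<argc>`, then `$<len>` and the argument bytes for each argument); the function names are hypothetical.

```python
def valid_prefix_length(data: bytes) -> int:
    """Length of the longest prefix of `data` made only of complete commands."""
    pos = good = 0
    while pos < len(data):
        try:
            if data[pos:pos + 1] != b"*":
                raise ValueError("expected '*'")
            end = data.index(b"\r\n", pos)
            argc = int(data[pos + 1:end])
            if argc < 1:
                raise ValueError("bad argument count")
            pos = end + 2
            for _ in range(argc):
                if data[pos:pos + 1] != b"$":
                    raise ValueError("expected '$'")
                end = data.index(b"\r\n", pos)
                blen = int(data[pos + 1:end])
                if blen < 0:
                    raise ValueError("bad bulk length")
                pos = end + 2
                if data[pos + blen:pos + blen + 2] != b"\r\n":
                    raise ValueError("truncated argument")
                pos += blen + 2
        except ValueError:
            break  # stop at the first incomplete or malformed command
        good = pos  # everything up to here parsed as whole commands
    return good


def truncate_incomplete_tail(path: str) -> int:
    """Cut the file back to its last complete command; return the bytes removed."""
    with open(path, "r+b") as f:
        data = f.read()
        keep = valid_prefix_length(data)
        f.truncate(keep)
    return len(data) - keep
```

This only handles a damaged tail; corruption in the middle of the file would also stop the scan early, and silently truncating there would discard valid later commands, so a real tool should report the offset before fixing anything.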

Comment by vatic...@gmail.com, Feb 9, 2010

Here's a tip that seemed to work for me. If you are bulk populating your database from another source, doing it with appendonly ON is SIGNIFICANTLY slower than with it OFF. So here's a faster way to populate a server you eventually want to run in appendonly mode:

  1. Disable all bgsave settings, and ensure appendonly is OFF in conf.
  2. Populate redis however you like (from localhost is best)
  3. issue the BGREWRITEAOF command
  4. once complete, you will now have an appendonly.aof file
  5. Set bgsave settings however you like and enable appendonly in conf
  6. restart redis

You should now have redis running in appendonly mode with all of your freshly loaded data.
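Why this is fast: the load in step 2 never touches the append only file, and step 3 writes the final dataset once, as one command per key. The Python sketch below shows what such a file looks like, assuming the multi-bulk protocol format used by the AOF in this era. It handles only string keys with SET; the real rewrite also emits commands for lists, sets and the other types, and the function names here are hypothetical.

```python
def encode_command(*args: str) -> bytes:
    """Encode one command in the Redis multi-bulk protocol format."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        raw = arg.encode("utf-8")
        # Each argument is "$<byte length>\r\n<bytes>\r\n".
        out.append(f"${len(raw)}\r\n".encode() + raw + b"\r\n")
    return b"".join(out)


def rewrite_from_snapshot(data: dict) -> bytes:
    """One SET per key recreates the dataset; sorting just makes output stable."""
    return b"".join(encode_command("SET", k, v) for k, v in sorted(data.items()))
```

For example, `encode_command("SET", "a", "1")` yields `*3\r\n$3\r\nSET\r\n$1\r\na\r\n$1\r\n1\r\n`. Note that lengths are counted in bytes, not characters.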

Comment by ntos...@gmail.com, Mar 5, 2010

Redis could implement durability / fsync on every command with much better performance (throughput) at the expense of some latency for every individual client. Here is how it would work:

1) Get an additional max-delay server setting, and set it to something like 200ms

2) When Redis gets a SET, it doesn't execute it immediately; instead it puts it in a queue without responding +OK to the client.

3) After 200ms execute all modification commands in the queue at once, append them to the log and return +OK to the waiting clients.

This would avoid the disk seek at every command so it would improve throughput dramatically while still having perfect durability. For extra performance, at step 3 fork the process and have the child save the append log and respond to the waiting clients while the parent starts immediately accepting commands in a new queue.
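The proposal above is a form of group commit. Here is a minimal Python sketch of steps 1 to 3, assuming a hypothetical `max_delay` setting, SET-only commands, plain-text log lines, and an event loop that calls `tick()` with the current time (passed in explicitly so the behavior is easy to follow). It does not implement the suggested fork for extra performance.

```python
import os


class GroupCommitLog:
    """Sketch of the proposed batching: one fsync per batch, replies after it."""

    def __init__(self, f, max_delay=0.2):
        self.f = f                  # binary file object that has fileno()
        self.max_delay = max_delay  # hypothetical "max-delay" setting, in seconds
        self.queue = []             # ((key, value), reply callback)
        self.batch_started = None
        self.data = {}

    def submit(self, command, on_reply, now):
        # Step 2: queue the command; the client gets no reply yet.
        if not self.queue:
            self.batch_started = now
        self.queue.append((command, on_reply))

    def tick(self, now):
        """Step 3: once max_delay has elapsed, execute, log and reply in one go."""
        if not self.queue or now - self.batch_started < self.max_delay:
            return 0
        batch, self.queue = self.queue, []
        for (key, value), _ in batch:
            self.data[key] = value  # execute the queued commands
        self.f.write(b"".join(f"SET {k} {v}\n".encode() for (k, v), _ in batch))
        self.f.flush()
        os.fsync(self.f.fileno())   # a single fsync covers the whole batch
        for _, on_reply in batch:
            on_reply(b"+OK\r\n")    # reply only after the data is on disk
        return len(batch)
```

The trade-off is visible in the code: throughput is bounded by one fsync per batch rather than per command, while each client waits up to `max_delay` for its reply.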

Comment by pradeep....@gmail.com, Mar 13, 2010

Is there a way to split off logfiles into multiple files?

The idea is not to lose the entire logged data while data is stored on a non-robust storage like SD cards. So, by splitting logfile into multiple files we can mitigate the amount of data loss.
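One way to get what is being asked for is a segmented log that starts a new file every N commands, so damage to one file leaves the others intact. The Python sketch below is purely illustrative: Redis at this point writes a single append only file, and the class name, `max_commands` parameter and `appendonly.NNNNNN.aof` naming scheme are all hypothetical.

```python
import os


class SegmentedLog:
    """Hypothetical sketch: start a new numbered file every max_commands entries."""

    def __init__(self, directory, max_commands=1000):
        self.directory = directory
        self.max_commands = max_commands
        self.segment = 0
        self.count = 0
        os.makedirs(directory, exist_ok=True)

    def _path(self, n):
        return os.path.join(self.directory, f"appendonly.{n:06d}.aof")

    def append(self, line):
        # Roll over to a fresh segment once the current one is full.
        if self.count == self.max_commands:
            self.segment += 1
            self.count = 0
        with open(self._path(self.segment), "a") as f:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.count += 1

    def segments(self):
        # Zero-padded numbers make lexical order equal replay order.
        return sorted(p for p in os.listdir(self.directory) if p.endswith(".aof"))
```

Keep in mind that replaying a log still needs every segment in order; splitting limits how much a single bad file can take with it, but commands after a lost segment may depend on the ones inside it.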

Comment by Sotiris....@gmail.com, May 10, 2010

Hello all, is there a way to achieve a hybrid sync/append setup? E.g. if I sync every 5 minutes, is there a way to also atomically clear my AOF so that only the updates received after the last sync are logged? It seems that right now it is an either/or system. Either you sync and you risk losing the last x minutes of updates, or you have a humongous AOF file since the beginning of the application. Any help appreciated

Comment by KKitteri...@markham.ca, Jun 2, 2010

Question: if I were to use the slowest option, fsync() after each dataset change, is that fsync synchronous with the dataset change (so that, in the event of a catastrophic failure, the potential for message loss is limited to transactions that were in progress but incomplete)?

Comment by jonkeat...@gmail.com, Sep 8, 2010

"Warning: by default Redis will fsync() after every command! This is because the Redis authors want to ship a default configuration that is the safest pick. But the best compromise for most datasets is to fsync() one time every second."

Looking at the redis.conf included with 2.0.0 it has it defaulting to everysec. So either the documentation here is out of date, or the default config file is incorrect.

Comment by project member anti...@gmail.com, Sep 9, 2010

Yep, safe for default was a very bad idea... fixing the doc.
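The choice being discussed is the `appendfsync` setting in redis.conf, whose values are `always`, `everysec` and `no`. The Python sketch below models the three policies in a simplified way (it assumes the caller reports the current time after each write); the real server's timing details, such as when the once-per-second fsync actually runs, differ from this model.

```python
import os


class FsyncPolicy:
    """Simplified model of the appendfsync choices: always, everysec, no."""

    def __init__(self, f, policy="everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.f = f
        self.policy = policy
        self.last_fsync = 0.0
        self.fsync_calls = 0

    def after_write(self, now):
        self.f.flush()  # hand the buffered data to the operating system
        if self.policy == "always":
            self._fsync(now)  # safest, slowest: one fsync per write
        elif self.policy == "everysec" and now - self.last_fsync >= 1.0:
            self._fsync(now)  # at most about one second of writes at risk
        # "no": leave it to the OS to flush to disk whenever it decides.

    def _fsync(self, now):
        os.fsync(self.f.fileno())
        self.last_fsync = now
        self.fsync_calls += 1
```

With four writes at 0.5, 1.2, 1.5 and 2.3 seconds, this model fsyncs four times under `always`, twice under `everysec` and never under `no`, which is why `everysec` is the usual compromise.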

Comment by andrey.k...@gmail.com, Sep 24, 2010

Hi, with large numbers of keys the appendonly file (even after compaction) tends to stay large, much larger than the corresponding .rdb file. Would it be possible to make appendonly files 'incremental' on top of the .rdb file content?

This way 'rewrite' command would effectively 'save' into .rdb and reopen an appendonly file for new data.

Comment by project member anti...@gmail.com, Sep 24, 2010

Hello Andrey: the code to create the .rdb and the append only file is almost the same. It's the same process of forking the Redis process and writing; only the format is different. But ending up with two different files may lead to corruption, something you really don't want in a database...

Not just this, but the two persistence modes really serve different goals. Example: when you perform FLUSHALL, the .rdb file is reset to zero length, as the data is gone. With AOF, instead, you want to stay completely safe, so the FLUSHALL is just appended: you'll lose things only if you perform a BGREWRITEAOF.
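The FLUSHALL point can be seen with a toy log. In the Python sketch below (hypothetical names, commands as tuples, only SET, DEL and FLUSHALL supported), replaying the whole log yields an empty dataset, yet the entries written before the FLUSHALL are still in the log; only a rewrite, which regenerates the log from the current state, drops them for good. The same rewrite also explains why the file stops growing after compaction.

```python
def replay(log):
    """Rebuild the dataset by executing logged commands in order."""
    data = {}
    for cmd in log:
        if cmd[0] == "SET":
            data[cmd[1]] = cmd[2]
        elif cmd[0] == "DEL":
            data.pop(cmd[1], None)
        elif cmd[0] == "FLUSHALL":
            data.clear()
    return data


def rewrite(log):
    """Compact the log to one SET per live key, conceptually like BGREWRITEAOF."""
    return [("SET", k, v) for k, v in sorted(replay(log).items())]
```

Given `[("SET", "a", "1"), ("SET", "b", "2"), ("FLUSHALL",)]`, replay returns an empty dict, replaying everything but the last entry still recovers both keys, and `rewrite` returns an empty list.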

In the future what we may consider is changing the format of the AOF to a compact binary one.

Cheers, Salvatore

Comment by andrey.k...@gmail.com, Sep 24, 2010

Thanks, Salvatore! Sure, one file is easier to deal with atomically and if it can be compressed - that'd be great.

Comment by andrey.k...@gmail.com, Sep 24, 2010

Here is another thing you could consider:

  • create a 'sequence' of appendonly files e.g. in a subdirectory
  • close old/open new appendonly file after every N commands
  • periodically atomically (insert a 'checkpoint' into appendonly file / fork) and let the child produce e.g. compressed .rdb file with checkpoint written in the header
  • external cron job can now cleanup appendonly files older than the one containing latest checkpoint from a complete .rdb file

When the system restarts, it can read from the latest .rdb first (ignoring and killing unfinished temp files), then read all appendonly files in sequence and execute all commands past the checkpoint from the .rdb header. This should preserve your FLUSHALL behavior. There should be no data loss (even though saving is done in the background), and it keeps appendonly files simple (just the text with the actual commands, as it is now).
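The recovery step of this proposal can be sketched in Python under some simplifying assumptions, none of which come from Redis itself: every logged command carries a monotonically increasing sequence number, the snapshot records the sequence number of the last command it includes (the "checkpoint"), and only SET is modeled. The cleanup rule here is slightly simplified: it treats a segment as deletable once all of its entries are covered by the snapshot.

```python
def recover(snapshot, segments):
    """snapshot: (checkpoint_seq, dict); segments: ordered lists of (seq, key, value)."""
    checkpoint, data = snapshot
    data = dict(data)  # start from the snapshot contents
    for segment in segments:
        for seq, key, value in segment:
            if seq > checkpoint:  # only commands the snapshot does not include
                data[key] = value
    return data


def deletable_segments(segments, checkpoint):
    """Indices of segments fully covered by the snapshot (safe for cron cleanup)."""
    return [i for i, seg in enumerate(segments) if seg and seg[-1][0] <= checkpoint]
```

For a snapshot with checkpoint 3 and segments holding sequence numbers 1 to 2, 3 to 4 and 5, recovery applies only commands 4 and 5 on top of the snapshot, and only the first segment can be removed.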

Comment by yuridraw...@gmail.com, Sep 28, 2010

I am using AOF. Data is saved to the dump file ONLY when I restart the server, which means AOF is doing its job, but I have to restart the server every time I need to test whether data has really been saved on disk.

  • Should just stopping the Redis server be enough for the AOF file to be written?
  • I can't find the appendonly.aof file anywhere. Where should it be?
  • The docs talk about the log files; where are they?

Thanks.

Comment by saik...@gmail.com, Oct 1, 2010

In http://code.google.com/p/redis/wiki/AppendOnlyFileHowto#What_should_I_do_if_my_Append_Only_File_gets_corrupted? I think:

"Fix the original file with: ./redis-check-dump --fix <filename>"

should be

"Fix the original file with: ./redis-check-aof --fix <filename>"

Comment by project member anti...@gmail.com, Oct 1, 2010

@saikat1: thanks fixed

