Code license: Apache License 2.0
Labels: backup, s3, java, sdb, aws
Featured downloads:
quillen-0.4.zip
Project owners:
  grourk
Project committers:
martin.davidsson, blarko

QUILLEN By Greg Nelson

WARNING Until Quillen reaches v1.0, new versions may not be compatible with snapshots made with previous versions. When upgrading, please first run the uninstall command and then run the install command (to delete and create Quillen's S3 Buckets and SDB domains). This will delete previous snapshots!


Quillen backs up your important documents or other data to your Amazon S3 and SimpleDB account. Information about Amazon S3 and SimpleDB can be found at http://www.amazon.com/s3 and http://aws.amazon.com/simpledb/. You must obtain an AWS account from Amazon and enable S3 and SimpleDB prior to using Quillen.

Quillen is designed to be a simple command line tool that can be crontabbed and forgotten about.

Quillen talks directly to Amazon S3 and SimpleDB without any intermediate servers. You pay for your usage directly to Amazon.

When you back up a file or directory, Quillen places those files into a conceptual "snapshot": a full copy of those files as they were at that point in time. A snapshot can later be restored to your local disk.

Quillen splits files into variable-size chunks of data. Chunk boundaries are determined by the file's contents rather than by fixed offsets into the file. This keeps data transfer and storage to a minimum, since only chunks that have not been seen before are uploaded. Chunks average 128K in size, although they can range from 2K to 5MB. This method of chunking also means that a full backup of a set of files that have already been placed in a snapshot does not re-upload those files. If the files have changed since they were last backed up, only the chunks that have changed need to be uploaded. The result is a new full snapshot, from which the full set of files can be restored. This is de-duplication between snapshots.
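To make content-defined chunking concrete, here is a minimal Java sketch. It is not Quillen's actual code: the boundary function Quillen uses is not described on this page, so the gear-style rolling hash, the class name ContentChunker, and the constructor parameters are all assumptions chosen for illustration. A cut is made wherever the hash of the most recent 64 bytes has its top bits all zero, subject to a minimum and maximum chunk size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of content-defined chunking; not Quillen's implementation.
class ContentChunker {
    // One pseudo-random 64-bit value per byte value. The seed is fixed so that
    // the same data always produces the same cut points.
    private static final long[] GEAR = new long[256];
    static {
        Random rng = new Random(42L);
        for (int i = 0; i < GEAR.length; i++) GEAR[i] = rng.nextLong();
    }

    private final int minSize;
    private final int maxSize;
    private final long mask;

    // avgBits controls how often a boundary occurs: roughly once every 2^avgBits
    // bytes past minSize. minSize should be at least 64 so that every boundary
    // decision depends only on the last 64 bytes of content.
    ContentChunker(int minSize, int avgBits, int maxSize) {
        this.minSize = minSize;
        this.maxSize = maxSize;
        this.mask = -1L << (64 - avgBits); // the top avgBits bits of the hash
    }

    List<byte[]> split(byte[] data) {
        List<byte[]> chunks = new ArrayList<byte[]>();
        int start = 0;
        long hash = 0;
        for (int i = 0; i < data.length; i++) {
            // Shifting left drops bytes older than 64 positions out of the hash.
            hash = (hash << 1) + GEAR[data[i] & 0xFF];
            int len = i - start + 1;
            boolean atBoundary = len >= minSize && (hash & mask) == 0;
            if (atBoundary || len >= maxSize) {
                chunks.add(Arrays.copyOfRange(data, start, i + 1));
                start = i + 1;
                hash = 0;
            }
        }
        if (start < data.length) {
            chunks.add(Arrays.copyOfRange(data, start, data.length));
        }
        return chunks;
    }
}
```

With the sizes quoted above, a comparable configuration would be new ContentChunker(2 * 1024, 17, 5 * 1024 * 1024), which gives chunks of roughly 128K on average. Because a cut depends on nearby content rather than on position, inserting bytes early in a file typically moves only the first boundary; later boundaries land on the same content as before.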

This method of chunking also enables de-duplication within snapshots. If two files are both backed up at the same time, and the two files share chunks of data, each of those chunks is only uploaded and stored once. Quillen keeps track of the fact that a chunk is referenced by multiple files.
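The bookkeeping described above can be sketched as follows. This is an assumption-laden illustration, not Quillen's data model: the class ChunkIndex, the use of SHA-256 as the chunk identifier, and the in-memory maps (standing in for S3 and SimpleDB) are all hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each distinct chunk is stored once and reference-counted.
class ChunkIndex {
    private final Map<String, byte[]> stored = new HashMap<String, byte[]>();      // stands in for S3
    private final Map<String, Integer> refCounts = new HashMap<String, Integer>(); // stands in for SimpleDB
    private long bytesUploaded = 0;

    // Identify a chunk by the hex SHA-256 of its contents (an assumed choice of hash).
    static String idOf(byte[] chunk) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(chunk);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Records a file as an ordered list of chunk ids, uploading only unseen chunks.
    List<String> addFile(List<byte[]> chunks) {
        List<String> manifest = new ArrayList<String>();
        for (byte[] chunk : chunks) {
            String id = idOf(chunk);
            if (!stored.containsKey(id)) {
                stored.put(id, chunk);
                bytesUploaded += chunk.length;
            }
            Integer n = refCounts.get(id);
            refCounts.put(id, n == null ? 1 : n + 1);
            manifest.add(id);
        }
        return manifest;
    }

    long bytesUploaded() { return bytesUploaded; }

    int references(String id) {
        Integer n = refCounts.get(id);
        return n == null ? 0 : n;
    }
}
```

Adding two files that share a chunk uploads that chunk once and leaves it with a reference count of 2, so it can only be discarded once no file refers to it any longer.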

Another advantage of this method is that Quillen can process and transfer multiple chunks in parallel, taking full advantage of the network connection. This results in faster backups than if the chunks were uploaded serially.
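One common way to transfer independent chunks in parallel in Java is a fixed-size thread pool. The sketch below shows that pattern only; the ChunkSink interface, the uploadAll method, and the thread count are hypothetical names, and how Quillen actually schedules its uploads is not documented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of uploading chunks concurrently with a thread pool.
class ParallelUploader {
    // Placeholder for whatever actually stores a chunk remotely.
    interface ChunkSink {
        void put(byte[] chunk) throws Exception;
    }

    // Submits every chunk to the pool, waits for all of them, and returns the bytes sent.
    static int uploadAll(List<byte[]> chunks, final ChunkSink sink, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<Future<Integer>>();
            for (final byte[] chunk : chunks) {
                pending.add(pool.submit(new Callable<Integer>() {
                    public Integer call() throws Exception {
                        sink.put(chunk);
                        return chunk.length;
                    }
                }));
            }
            int total = 0;
            for (Future<Integer> f : pending) {
                try {
                    total += f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("upload failed", e.getCause());
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each chunk is self-contained, no ordering between uploads is needed; the file manifest can be written once every chunk has arrived.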

Finally, because Quillen processes files in chunks, an error during transfer (the process is killed, the network connection goes away, a power outage, etc.) doesn't mean that Quillen has to start all over again from the beginning. It will pick up where it left off and avoid re-uploading chunks it has already uploaded.
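The resume behaviour follows naturally from skipping any chunk the remote store already holds. The sketch below simulates an interrupted run followed by a second run; the class ResumableBackup, the failAfter parameter, and the use of Arrays.hashCode as a chunk id are stand-ins for illustration only (a real tool would use a strong content hash and a remote lookup).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a rerun after a failure uploads only the missing chunks.
class ResumableBackup {
    // Ids of chunks the (simulated) remote store already holds.
    private final Set<Integer> remote = new HashSet<Integer>();
    int uploads = 0;

    // Uploads chunks in order. Throws once failAfter chunks have been sent in
    // this run, to simulate a crash; pass a negative value to never fail.
    void run(List<byte[]> chunks, int failAfter) {
        int sentThisRun = 0;
        for (byte[] chunk : chunks) {
            int id = Arrays.hashCode(chunk); // placeholder id, not collision-safe
            if (remote.contains(id)) {
                continue; // already uploaded by an earlier run
            }
            if (sentThisRun == failAfter) {
                throw new IllegalStateException("simulated network failure");
            }
            remote.add(id);
            uploads++;
            sentThisRun++;
        }
    }
}
```

If a first run over five chunks fails after two uploads, a second run over the same five chunks sends only the remaining three.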

EXAMPLE

A directory of three files is backed up to snapshot1: foo, bar, and baz. Each is 1GB in size. What's more, bar and baz share 500MB of content. Quillen splits each file into chunks, and because the shared 500MB is stored only once, the result is 1GB + 1GB + 500MB (2.5GB) of data transferred and stored.

Now say file foo gets edited and a single byte is inserted at the beginning of the file. With a fixed-offset chunking scheme, every later byte would shift by one position, so every chunk in the file would change! But in Quillen, only the first chunk changes. If that chunk was originally 128,000 bytes, it is now 128,001 bytes. The directory is backed up to snapshot2, and only that first chunk is transferred. The result is 1GB + 1GB + 500MB + 128K of data stored. At this point, you can restore snapshot1 and get the original set of files. You can restore snapshot2 and get the whole set of files with foo's edit. You can delete snapshot1 because it is out of date, and snapshot2 will still represent the whole set of files.
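The fixed-offset half of that comparison is easy to check directly. The sketch below (hypothetical class and method names, using ByteBuffer only because it compares by content) cuts data into fixed-size blocks and counts how many blocks of an edited copy already exist among the original's blocks.

```java
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo of why fixed-offset blocks de-duplicate poorly after an insert.
class FixedOffsetDemo {
    // Counts blocks of 'after' whose content matches some block of 'before'.
    static int reusableBlocks(byte[] before, byte[] after, int blockSize) {
        Set<ByteBuffer> known = new HashSet<ByteBuffer>();
        for (int off = 0; off < before.length; off += blockSize) {
            known.add(ByteBuffer.wrap(before, off, Math.min(blockSize, before.length - off)));
        }
        int reusable = 0;
        for (int off = 0; off < after.length; off += blockSize) {
            ByteBuffer block = ByteBuffer.wrap(after, off, Math.min(blockSize, after.length - off));
            if (known.contains(block)) reusable++;
        }
        return reusable;
    }

    // Returns a copy of data with one byte inserted at position pos.
    static byte[] insertAt(byte[] data, int pos, byte b) {
        byte[] out = new byte[data.length + 1];
        System.arraycopy(data, 0, out, 0, pos);
        out[pos] = b;
        System.arraycopy(data, pos, out, pos + 1, data.length - pos);
        return out;
    }
}
```

For 64K of random data in 1K blocks, inserting a byte at position 0 leaves no reusable blocks, while inserting the same byte at the very end leaves all 64 original blocks reusable. A content-defined scheme avoids the first outcome because its boundaries move with the data.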

Note also that the amount of data transferred and stored would actually be less if this data were text, since Quillen uses gzip compression.
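The effect of gzip on text versus already-random data can be seen with the standard java.util.zip classes. This is only a sketch of the general technique; the helper name gzip is hypothetical, and this page does not say at what granularity Quillen applies compression.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper showing how much gzip shrinks a buffer.
class GzipSize {
    // Returns the gzip-compressed form of input.
    static byte[] gzip(byte[] input) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(out);
            gz.write(input);
            gz.close(); // flushes the remaining data and writes the gzip trailer
            return out.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Repetitive text shrinks to a small fraction of its original size, whereas random or already-compressed data (many media formats, archives) stays about the same size, so the savings depend heavily on what is being backed up.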

INSTALLATION

  1. Make sure you have Java 1.6 or higher installed on your machine.
  2. Expand quillen-0.4.zip, which includes quillen-0.4.jar and all its dependencies.
  3. Edit the file called quillen.properties and fill it in with your AWS credentials.
  4. Tell Quillen to create the necessary S3 buckets and SimpleDB domains (it will create 2 buckets and 3 domains):

     java -jar quillen-0.4.jar -command install 2> quillen.log

BASIC USAGE

ADVANCED USAGE

UNINSTALLATION

Simply tell Quillen to delete the S3 buckets and SimpleDB domains it created. This may take some time, especially if you have a lot of data backed up. Please note that this will delete all snapshots!

java -jar quillen-0.4.jar -command uninstall 2> quillen.log







