Overview

BigCache is a very simple cache API providing users with an infinite shared cache backed by Amazon S3. No servers required.

Here's a quick code sample showing how to use it:

// Put an object into the cache; the third argument (3600) is the expiry
bigcache.put("somekey", myObject, 3600);

// Get an object from the cache
Object o = bigcache.get("somekey");
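
You may want to exercise code written against this get/put shape without touching S3, for example in unit tests. A small in-memory stand-in can help there. The sketch below is hypothetical and is not BigCache's implementation. The class name `InMemoryTtlCache` is made up, and the sketch assumes the third argument to `put` is an expiry in seconds, which the sample above does not state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in with the same get/put shape as the sample above.
// ASSUMPTION: the third argument of put() is a time-to-live in seconds.
class InMemoryTtlCache {
    // Each entry stores the value together with its absolute expiry time.
    private static final class Entry {
        final Object value;
        final long expiresAtMillis;

        Entry(Object value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    public void put(String key, Object value, int expiresSeconds) {
        long expiresAt = System.currentTimeMillis() + expiresSeconds * 1000L;
        entries.put(key, new Entry(value, expiresAt));
    }

    // Returns null for missing or expired keys, i.e. a cache miss.
    public Object get(String key) {
        Entry e = entries.get(key);
        if (e == null) return null;
        if (System.currentTimeMillis() >= e.expiresAtMillis) {
            entries.remove(key, e); // lazily evict the expired entry
            return null;
        }
        return e.value;
    }
}
```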

News

2008-11-14 Added asynchronous methods getAsync(key) and putAsync(key, object, expires)

Background

While testing different caching scenarios, I got the idea of using S3 as a simple cache, like memcached, and decided to run some tests. The list below shows typical times, in milliseconds, for retrieving 100 simple strings from an EC2 server. Memcached was running on a separate server.

  • Memcached took: 219 ms
  • S3 took: 2679 ms
  • SDB took: 2302 ms
  • ehcache memory took: 1 ms
  • ehcache 95% disk took: 11 ms
  • Memcached mod took: 219 ms
  • S3 mod took: 2203 ms

UPDATE: The new async methods still need to be added to this benchmark. Using them gives big performance gains.
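
The async gains most likely come from overlapping network round trips instead of waiting for each one in turn. The sketch below illustrates that idea with the standard `CompletableFuture` API. The class and method names are hypothetical, and this is not how BigCache's `getAsync` is implemented; its return type isn't shown on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch: fetch many keys concurrently instead of one after another.
// `syncGet` stands in for any blocking lookup (e.g. a single S3 GET); this is NOT
// BigCache's getAsync.
class ParallelFetch {
    static List<Object> getAll(List<String> keys, Function<String, Object> syncGet, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Start every request up front so their network waits overlap.
            List<CompletableFuture<Object>> futures = new ArrayList<>();
            for (String key : keys) {
                futures.add(CompletableFuture.supplyAsync(() -> syncGet.apply(key), pool));
            }
            // Wait for each result, keeping the original key order.
            List<Object> results = new ArrayList<>();
            for (CompletableFuture<Object> f : futures) {
                results.add(f.join());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Take a getter that costs about 25 ms per call. One hundred sequential calls then take about 2.5 s. Spreading those calls over, say, 10 threads could in principle cut the wall time roughly tenfold, limited by S3 and network throughput.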

The last two "mod" tests performed 100 retrievals but cycled through only 10 different objects. The goal was to see whether S3 did any caching of recently retrieved items. Running this repeatedly gave very similar results every time. I realize this isn't exactly the greatest benchmark, so no need to let me know otherwise. ;)
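
A test like this could be structured roughly as follows. Time a loop of lookups through whatever getter is under test. For the "mod" variant, take the loop index modulo 10 to pick the key. This harness is a hypothetical reconstruction, not the code that produced the numbers above. `CacheBenchmark` and the `key0`..`key9` naming are made up.

```java
import java.util.function.Function;

// Hypothetical harness: time `count` sequential lookups through any getter.
// With distinctKeys == count every key is different; with count == 100 and
// distinctKeys == 10 it mimics the "mod" tests, hitting key0..key9 ten times each.
class CacheBenchmark {
    static long timeLookupsMillis(Function<String, Object> get, int count, int distinctKeys) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            String key = "key" + (i % distinctKeys); // i % 10 gives the "mod" access pattern
            get.apply(key);
        }
        return (System.nanoTime() - start) / 1_000_000; // elapsed milliseconds
    }
}
```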

So what can we gather from this:

  • In-memory is ultra fast (of course).
  • Local disk is also extremely fast, although about 10x slower than in-memory.
  • Memcached is amazingly fast too, especially considering it goes over the network. It takes about 2 ms per object, although that is much slower than in-memory or local disk (20x vs. disk).
  • S3 and SimpleDB performed similarly when retrieving directly by key/id. Both were much slower than any other option, at about 25 ms per object (roughly 10x slower than memcached).
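
The per-object and relative figures in this list follow directly from the totals, assuming each total is milliseconds for 100 retrievals. A tiny helper (hypothetical name `BenchmarkMath`) makes the arithmetic explicit:

```java
// Converts the raw totals (milliseconds for 100 retrievals) into per-object
// latencies and relative slowdowns.
class BenchmarkMath {
    static final int RETRIEVALS = 100;

    // Average time per object in milliseconds.
    static double perObjectMillis(double totalMillis) {
        return totalMillis / RETRIEVALS;
    }

    // How many times slower the first measurement was than the second.
    static double slowdown(double slowerTotal, double fasterTotal) {
        return slowerTotal / fasterTotal;
    }
}
```

Here are the results for the totals above:

  • `perObjectMillis(219)` gives 2.19 ms for memcached.
  • `perObjectMillis(2679)` gives 26.79 ms for S3.
  • `slowdown(11, 1)` is 11 (disk vs. memory).
  • `slowdown(219, 11)` is about 19.9 (memcached vs. disk).
  • `slowdown(2679, 219)` and `slowdown(2302, 219)` are about 12.2 and 10.5 (S3 and SDB vs. memcached).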

So now you might say, "Well, obviously memcached is the way to go for large-scale caching," and I would fully agree. BUT what about the extra work of running a pool of extra servers just for memcached? You've already got a ready-to-go, infinite cache available in S3.

Double Layer Cache

Now, you could just use S3 on its own if you can live with the slowness. The following scenario is an alternative: it gives you the ultra-fast memory/disk combo as your local cache and the slower S3 as the global shared cache. Your local disk is usually pretty much unused anyway, so this makes good use of it. It works like a chained cache: in-memory and disk caching (ehcache) first, with S3 as the secondary cache. For example:

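// localCache = in-memory/disk cache (ehcache), s3Cache = shared S3 cache
Object getObject(String key) {
    // 1. Try the local memory/disk cache first (fastest)
    Object object = localCache.get(key);
    if (object != null) return object;

    // 2. Fall back to the shared S3 cache and copy any hit locally
    object = s3Cache.get(key);
    if (object != null) {
        localCache.put(key, object);
        return object;
    }

    // 3. Miss in both layers: load from the database and fill both caches
    object = getFromDatabase(key);
    if (object != null) {
        localCache.put(key, object);
        s3Cache.put(key, object);
    }
    return object; // null if the database has nothing either
}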
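
The read-through logic above can also be written as a self-contained, runnable sketch. Plain in-memory maps stand in for both layers: `local` for ehcache and `shared` for S3. A function stands in for `getFromDatabase`. The class name `TwoLayerCache` is hypothetical, and expiry and eviction are left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical, self-contained version of getObject(). Maps stand in for the
// real layers; `loadFromDatabase` stands in for getFromDatabase(). No expiry.
class TwoLayerCache {
    private final Map<String, Object> local = new ConcurrentHashMap<>(); // per-server layer
    private final Map<String, Object> shared;                            // S3-like shared layer
    private final Function<String, Object> loadFromDatabase;

    TwoLayerCache(Map<String, Object> shared, Function<String, Object> loadFromDatabase) {
        this.shared = shared;
        this.loadFromDatabase = loadFromDatabase;
    }

    Object getObject(String key) {
        Object object = local.get(key);          // 1. local memory/disk
        if (object != null) return object;

        object = shared.get(key);                // 2. shared layer
        if (object != null) {
            local.put(key, object);              // promote to the local layer
            return object;
        }

        object = loadFromDatabase.apply(key);    // 3. source of truth
        if (object != null) {
            local.put(key, object);
            shared.put(key, object);             // other app servers can now find it
        }
        return object;
    }
}
```

Two instances that share one `shared` map mimic two app servers. The first server's miss goes to the database and populates the shared layer. The second server then finds the object there without another database call.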

This scenario fully supports stateless/sessionless load balancing. However, suppose you have a lot of app servers and your data is mostly user-specific (private). In that case the local memory/disk cache might become nearly useless unless you enable sticky sessions on your load balancer. It really depends on your app.
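
Sticky sessions keep the local layer useful for user-specific data. If the same user always lands on the same app server, that server's local cache keeps getting hits. Load balancers usually implement this with cookies or their own hashing. The hash-modulo router below only illustrates the idea, and `StickyRouter` is a hypothetical name.

```java
import java.util.List;

// Hypothetical sketch of session affinity: always route a given user to the same
// app server so that server's local cache keeps serving that user's data.
class StickyRouter {
    private final List<String> servers;

    StickyRouter(List<String> servers) {
        this.servers = servers;
    }

    String serverFor(String userId) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(userId.hashCode(), servers.size());
        return servers.get(index);
    }
}
```

Plain modulo hashing has a downside: adding or removing a server remaps most users, which effectively empties their local caches. Consistent hashing is the usual remedy.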

Other Performance Comparisons
