My favorites | Sign in
Project Home Downloads Wiki Issues Source
Search
for
Optimizations  
Description of how the client itself is optimized
Featured
Updated Jul 17, 2011 by dsalli...@gmail.com

Optimization Overview

There are several elements of the design that each allow high throughput. Each is discussed below along with an example case showing many of them working together.

Single-threaded IO

The IO thread mirrors the server-side design of memcached by multiplexing asynchronous IO across multiple connections to multiple servers.

Each MemcachedClient instance establishes and maintains a single connection to each server in your cluster. Data are sent as soon as they become available and are able to be received by the remote sides. Responses are similarly collected as soon as they arrive.

Very Low Contention

There are two points where client threads (i.e. your code) and the IO thread meet. Whenever a caller needs to issue a request against memcached, it does so by building an object that represents the request which it queues into a java.util.concurrent.BlockingQueue instance (with a non-blocking insertion). A java.util.concurrent.Future is returned to the caller.

When a request is complete and the response is available, the IO thread feeds it back into this future.

The contention on the enqueuing is as small as java's concurrency utilities allow (in practice, this is quite low). As for the result, the typical scenario involves a single thread waiting on a latch. It's hard to imagine a case where either of these would contribute a detectable amount of latency.

Asynchronous Interface

With an asynchronous interface, it's possible to “fire and forget” requests such as sets and deletes. You can optionally wait for the results, but if it's not necessary for your application, I won't make you do it.

For every enqueued request, there's a java.lang.concurrent.Future returned that is used to track the progress of the request. All of the communication from the IO thread to the callers is done by way of futures. This also allows you to do things like processing between a get request and the response from it.

Multi-get Escalation

In the process of finding data to write over the wire, the IO thread will notice when there are multiple get requests in a row and collapse them into a single multi-get request with deduplicated keys.

For example, several outstanding requests in a queue for a single server that look like this: [a], [b], [a, b, c], [a], [d], will be collapsed into a single request for [a, b, c, d] and then the results will be delivered to the respective callers (five in this case).

Protocol Pipelining

Another effect of having a single connection to each server is that the process of converting a request to the wire format doesn't actually care about the requests themselves, so it can effectively ignore natural boundaries.

For example, if the queue contains a few gets, sets and deletes, it's possible to send all of those in a single packet.

Altogether Now

Consider an example where a memcached instance has two values, x and y, both set to 1. No other values exist within this instance. Six threads issue requests as shown in the diagram below at approximately the same time, and get queued top to bottom.

  1. The first request will immediately hit the wire since enqueuing a request causes the loop to look for IO. This returns the value 1 to t1. t1 is happy and goes home.
  2. The next time through the loop, more items have made it into the queue, so they're processed together (at least, until the transmit buffer is full).
    1. The next time we're ready to transmit, the IO thread will notice that there are three gets in a row (the optimizer does not look past anything that will mutate data), and combine them into a single get. The gets have overlapping keys, so the keys will be deduplicated so that the write buffer receives a multi-get for the keys y and z.
    2. The write buffer is not full, so we look to the input queue again and see there's a set ready, so we add that to the buffer.
    3. The write buffer is still not full, so we look at the input queue yet again and see another get. There are no more gets, so we enqueue this one.
    4. The write buffer is not full, but the input queue is empty, so we transmit this data.
  3. The multi-get for y and z returns, so we send t2 and t4 the value 1 for y and t3 a null for z (since z was not in our cache). Threads t2, t3, and t4 now have all of the necessary results.
  4. The set returns and the results (we'll call it a success) are sent to the future given to t5. If t5 cares about the result, it now has it. If it doesn't, it's off doing something else by now.
  5. Finally, the second get request for y returns, this time with the value 2, which is provided to the t6.

See Also:

Blog post on write optimizations

Comment by hank.h...@gmail.com, Oct 27, 2009

Question about connection pool. Currently each MemCached? clinet creates one connection to one memcached server. To create multiple connections to one memcached server, multiple memcached Clients have to be created. So a object pool has to be maintained to simulate teh connnection pool. Is it right? do you think it is better to use only one connection?

Comment by zhuozhe, Nov 11, 2009

The 1 is enougy by using Non-Blocking

Comment by ldc.dr...@gmail.com, Dec 30, 2009

Does spymemcached have a consistent hashing algorithm? I plan on using it anyway, though id like what i see more if it did (its probably mentioned somewhere)

Comment by project member dsalli...@gmail.com, Dec 30, 2009

@ldc.drake

There's a KetamaNodeLocator? which has pluggable hashes (the compatible ketama hash uses md5 which is terribly slow. If you're all java, you can use the java native hash with the KetamaNodeLocator? and get pretty good performance, for example.

Comment by mohanrao...@gmail.com, Jan 18, 2010

If we are using a memcached server that doesn't support multi-get. Is there a way to disable the optimizations or does the client automatically handle this ?

Comment by project member dsalli...@gmail.com, Jan 18, 2010

If the server does not support multiget, it doesn't implement the protocols correctly. No compensation is made for this.

Comment by eric%fen...@gtempaccount.com, Jan 27, 2010

For maximum throughput, what's the best HashAlgoritm?? FNV1(a)64 or NATIVE?

Comment by project member dsalli...@gmail.com, Jan 27, 2010

I'd suggest testing, but native hashes could be the fastest as they're memoized on strings. I doubt it really matters (as long as you're not using md5).

Comment by eric%fen...@gtempaccount.com, Jan 28, 2010

I'll probably need to test anyway to justify upgrading from 1.2.2 to 1.4.x and from danga client to spymemcached.

Comment by loniszc...@gmail.com, Apr 12, 2010

hi, i couldn't find how spymemcached splits elements into differents servers. do you implement the libketama's consistent hashing?

Comment by sureshbh...@gmail.com, Jun 25, 2010

Hi Dustin: I would appreciate if you could answer this query. I am using the sample code for Spymem client from the google site (http://code.google.com/p/spymemcached/wiki/Examples) I am using something like mccClient = new MemcachedClient?(new BinaryConnectionFactory?(), AddrUtil?.getAddresses(sb.toString())); Now, is the default constructor good for production environment which takes read buffer size and queue length both to be the default values, which is 16384. Any help would be appreciated.

Comment by sureshbh...@gmail.com, Jun 25, 2010

Well, the real questions were what exactly these values mean? Read Buffer size, and QueueLength?? Could any one answer by giving an example? The docs are pretty much hopeless on this.

Comment by sureshbh...@gmail.com, Jun 25, 2010

Sorry, I was referring to the BinaryConnectionFactory?.

/
  • Create a DefaultConnectionFactory? with the default parameters.
public BinaryConnectionFactory?() {
super();
}
/
  • Create a BinaryConnectionFactory? with the given maximum operation
  • queue length, and the given read buffer size.
public BinaryConnectionFactory?(int len, int bufSize) {
super(len, bufSize);
}
Comment by ken....@gmail.com, Sep 18, 2010

Is SpyMemcached? written base on Java NIO? I am comparing Spymemcached with other memcached client library such as Xmemcached which claimed that it is the fastest for high concurrency application.

Comment by ken....@gmail.com, Sep 18, 2010

Hi all, I am new to Spymemcached and I have a few questions about MemcachedClient?: 1. Is MemcachedClient? thread safe? 2. Can I just create and maintain one MemcachedClient? instance to be used throughout an application that serves thousand of concurrent requests? 3. Does Spymemcached provide connection pooling feature for MemcachedClient?(s) that serves multiple requests?

Comment by project member dsalli...@gmail.com, Sep 18, 2010

Ken:

Yes it's NIO. Yes it's threadsafe. Your questions seem to imply you have an implementation for such a client in mind and think it might perform badly. Try this one and let me know how you would improve it. Contributions are welcome. :) For many people, it ends up being fast enough.

Comment by Sylvain....@gmail.com, Oct 13, 2010

Hi All. I feel a bit strange with this single-io-thread architecture. Let me explain. We use a farm of memcached servers (let's say 10) and our java process is doing high concurrent processing involving requests to an API on which we use memcached to make it faster. What I can see in analyzing the performances of the application is all the cache requests (gets and sets) are processed by only one thread, which doesn't look optimum because I think gets can be done concurrently without any problem, so that it can benefit from having multiple servers, and gets should not wait for sets to happen. Because at the moment the gets and the sets are in the same queue. Am I missing anything ? I would do such IO thread on a one-per-server basis, so that we take advantages of having multiple memcached servers. Any comment would be appreciated.

Comment by mukesh.a...@gmail.com, Oct 19, 2010

When I give multiple servers list to Memcached client using binaryconnection factory, does it do consistent hashing ? if not, then why does it take a 'list' of servers and not a single server. If yes, then what is the purpose of Ketamaconnection factory.

The reason is I did a test with 2 memcached servers, each with 512 mb ram. Then I stored 5000 small sequential keys (0-string, 1-string ...etc) using binary connection factory. I passed the list of servers to addutil. Then from another test program I feteched the list. For some reason when I stored and retrieved the values sequentially using binnaryconnection factory, i would get cache misses after ~3500. However when I used ketamaconnection factory the lookup succeeded everytime. Does mean that hashing of binary connection factory is broken ?

Comment by zaphrox...@gmail.com, Oct 25, 2010
Comment by zhuozhe, Nov 11, 2009

The 1 is enougy by using Non-Blocking

Is this a configuration option? Or do you mean using the async Get and Puts?

Comment by federico...@gmail.com, Jan 26, 2011

asyncGetBulk is one call or as many calls to memcached as keys

does the API sends like: GET this and this and this or or GET this, GET this, GET this

Any tip of how to optimize for multiple gets in one call besides using asyncGetBulk

Thanks,

Comment by phaidin...@gmail.com, Nov 1, 2011

I'm doing some benchmarks between different Java Memcache Client and notice that when I do a 'Set' on 500,000 String (String.valueOf(0-499999) it takes less than two seconds but when I do a 'Get' on the same 500,000 String values it takes around 79 seconds. Do you know what might account for this. BTW, I wait until all the 'Set's are finished.

Thanks

-Pete

Comment by brian.w...@inphosoft.com, Feb 7, 2012

Did the same testing as Pete and the result is similar.. With spymemcached, "gets" are much slower than "sets". Quite confused.

- Brian

Comment by ewjor...@gmail.com, Mar 17, 2012

Pete and Brian, are you using async gets, and firing them off/waiting for them to finish using the same logic as the sets?


Sign in to add a comment
Powered by Google Project Hosting