Optimizations
Description of how the client itself is optimized
Featured Optimization Overview

There are several elements of the design that each allow high throughput. Each is discussed below, along with an example case showing many of them working together.

Single-threaded IO

The IO thread mirrors the server-side design of memcached by multiplexing asynchronous IO across multiple connections to multiple servers. Each MemcachedClient instance establishes and maintains a single connection to each server in your cluster. Data are sent as soon as they become available and the remote side is able to receive them. Responses are similarly collected as soon as they arrive.

Very Low Contention

There are two points where client threads (i.e. your code) and the IO thread meet. Whenever a caller needs to issue a request against memcached, it does so by building an object that represents the request, which it queues into a java.util.concurrent.BlockingQueue instance (with a non-blocking insertion). A java.util.concurrent.Future is returned to the caller. When a request is complete and the response is available, the IO thread feeds it back into this future. The contention on the enqueuing is as small as Java's concurrency utilities allow (in practice, this is quite low). As for the result, the typical scenario involves a single thread waiting on a latch. It's hard to imagine a case where either of these would contribute a detectable amount of latency.

Asynchronous Interface

With an asynchronous interface, it's possible to "fire and forget" requests such as sets and deletes. You can optionally wait for the results, but if it's not necessary for your application, I won't make you do it. For every enqueued request, there's a java.util.concurrent.Future returned that is used to track the progress of the request. All of the communication from the IO thread to the callers is done by way of futures. This also allows you to do other processing between issuing a get request and reading its response.
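The hand-off described under Very Low Contention and Asynchronous Interface can be sketched with nothing but java.util.concurrent. This is a minimal illustration and not spymemcached's actual code: the class name, the Request type, asyncGet, demo and the fake reply are all hypothetical, and CompletableFuture stands in for the latch-backed future the client uses.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: callers enqueue requests, one IO thread completes the futures. */
final class SingleIoThreadSketch {

    /** A queued request: the key to fetch plus the future that will carry the reply. */
    private static final class Request {
        final String key;
        final CompletableFuture<String> result = new CompletableFuture<>();

        Request(String key) {
            this.key = key;
        }
    }

    private final BlockingQueue<Request> queue = new LinkedBlockingQueue<>();

    SingleIoThreadSketch() {
        Thread ioThread = new Thread(this::ioLoop, "io-thread");
        ioThread.setDaemon(true); // do not keep the JVM alive for the sketch
        ioThread.start();
    }

    /** Called from any client thread; it only enqueues and returns immediately. */
    Future<String> asyncGet(String key) {
        Request request = new Request(key);
        queue.offer(request); // non-blocking insertion into an unbounded queue
        return request.result;
    }

    /** The single IO thread: take the next request and feed the reply into its future. */
    private void ioLoop() {
        try {
            while (true) {
                Request request = queue.take();
                // Assumption: a made-up value stands in for the real network round trip.
                request.result.complete("value-of-" + request.key);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Issues one request and waits up to a second for the reply. */
    static String demo() {
        SingleIoThreadSketch client = new SingleIoThreadSketch();
        Future<String> pending = client.asyncGet("x");
        // The caller is free to do other work here before asking for the result.
        try {
            return pending.get(1, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "value-of-x"
    }
}
```

A caller that never reads the returned future gets the "fire and forget" behaviour; a caller that does read it blocks only at the point where it actually needs the value.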
Multi-get Escalation

In the process of finding data to write over the wire, the IO thread will notice when there are multiple get requests in a row and collapse them into a single multi-get request with deduplicated keys. For example, several outstanding requests in a queue for a single server that look like this: [a], [b], [a, b, c], [a], [d], will be collapsed into a single request for [a, b, c, d], and then the results will be delivered to the respective callers (five in this case).

Protocol Pipelining

Another effect of having a single connection to each server is that the process of converting a request to the wire format doesn't actually care about the requests themselves, so it can effectively ignore natural boundaries. For example, if the queue contains a few gets, sets and deletes, it's possible to send all of those in a single packet.

Altogether Now

Consider an example where a memcached instance has two values, x and y, both set to 1. No other values exist within this instance. Six threads issue requests as shown in the diagram below at approximately the same time, and get queued top to bottom.
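The collapsing step described under Multi-get Escalation can be sketched as two small functions: one that merges consecutive get requests into a single deduplicated key set, and one that hands each caller back only the keys it asked for. The class and method names here are hypothetical and the real client does this work inside its IO thread; the sketch only shows the set arithmetic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of collapsing queued gets into one multi-get. */
final class MultiGetSketch {

    /** The queue from the text: [a], [b], [a, b, c], [a], [d]. */
    static List<List<String>> exampleQueue() {
        List<List<String>> queue = new ArrayList<>();
        queue.add(Arrays.asList("a"));
        queue.add(Arrays.asList("b"));
        queue.add(Arrays.asList("a", "b", "c"));
        queue.add(Arrays.asList("a"));
        queue.add(Arrays.asList("d"));
        return queue;
    }

    /** Union of all requested keys in first-seen order, duplicates removed. */
    static Set<String> collapse(List<List<String>> requests) {
        Set<String> keys = new LinkedHashSet<>();
        for (List<String> request : requests) {
            keys.addAll(request);
        }
        return keys;
    }

    /** Gives each caller only the keys it asked for; keys missing from the response are skipped. */
    static List<Map<String, String>> fanOut(List<List<String>> requests,
                                            Map<String, String> response) {
        List<Map<String, String>> perCaller = new ArrayList<>();
        for (List<String> request : requests) {
            Map<String, String> result = new LinkedHashMap<>();
            for (String key : request) {
                if (response.containsKey(key)) {
                    result.put(key, response.get(key));
                }
            }
            perCaller.add(result);
        }
        return perCaller;
    }

    public static void main(String[] args) {
        List<List<String>> queue = exampleQueue();
        System.out.println(collapse(queue)); // prints [a, b, c, d]

        // Assumption: a made-up server response in which only a and b exist.
        Map<String, String> response = new LinkedHashMap<>();
        response.put("a", "1");
        response.put("b", "2");
        System.out.println(fanOut(queue, response).size()); // prints 5, one result per caller
    }
}
```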
Question about connection pooling. Currently each MemcachedClient creates one connection to each memcached server. To create multiple connections to one memcached server, multiple MemcachedClient instances have to be created, so an object pool has to be maintained to simulate a connection pool. Is that right? Do you think it is better to use only one connection?
One connection is enough because the client uses non-blocking IO.
Does spymemcached have a consistent hashing algorithm? I plan on using it anyway, though I'd like what I see more if it did (it's probably mentioned somewhere).
@ldc.drake
There's a KetamaNodeLocator which has pluggable hashes (the compatible ketama hash uses md5, which is terribly slow). If you're all Java, you can use the Java native hash with the KetamaNodeLocator and get pretty good performance, for example.
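For readers unfamiliar with the idea, here is a minimal sketch of a consistent-hash ring with a pluggable-in-spirit hash, using Java's native String.hashCode(). It is not the KetamaNodeLocator implementation and is not ketama-compatible: the class name, the "node-i" point naming and the points-per-node parameter are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical consistent-hash ring keyed by Java's native String hash. */
final class HashRingSketch {

    // Ring position -> server name. Several points per server smooth the distribution.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    /** Assumes a non-empty node list; each node is placed on the ring pointsPerNode times. */
    HashRingSketch(List<String> nodes, int pointsPerNode) {
        for (String node : nodes) {
            for (int i = 0; i < pointsPerNode; i++) {
                ring.put((node + "-" + i).hashCode(), node);
            }
        }
    }

    /** Walks clockwise from the key's hash to the next point, wrapping to the start if needed. */
    String nodeFor(String key) {
        Map.Entry<Integer, String> entry = ring.ceilingEntry(key.hashCode());
        if (entry == null) {
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("cache1:11211", "cache2:11211", "cache3:11211");
        HashRingSketch ring = new HashRingSketch(servers, 100);
        // The same key always maps to the same server for a given server list.
        System.out.println(ring.nodeFor("user:42"));
    }
}
```

Because only the points belonging to a removed server disappear from the ring, keys that lived on the remaining servers keep their placement, which is the property that makes this scheme attractive for a cache cluster.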
If we are using a memcached server that doesn't support multi-get, is there a way to disable the optimizations, or does the client handle this automatically?
If the server does not support multiget, it doesn't implement the protocols correctly. No compensation is made for this.
For maximum throughput, what's the best HashAlgorithm? FNV1(a)64 or NATIVE?
I'd suggest testing, but native hashes could be the fastest as they're memoized on strings. I doubt it really matters (as long as you're not using md5).
I'll probably need to test anyway to justify upgrading from 1.2.2 to 1.4.x and from danga client to spymemcached.
Hi, I couldn't find how spymemcached splits elements across different servers. Do you implement libketama's consistent hashing?
Hi Dustin: I would appreciate it if you could answer this query. I am using the sample code for the spymemcached client from the Google site (http://code.google.com/p/spymemcached/wiki/Examples). I am using something like mccClient = new MemcachedClient(new BinaryConnectionFactory(), AddrUtil.getAddresses(sb.toString())); Now, is the default constructor good for a production environment? It takes the read buffer size and queue length both to be the default values, which is 16384. Any help would be appreciated.
Well, the real question was: what exactly do these values mean, read buffer size and queue length? Could anyone answer by giving an example? The docs are pretty much hopeless on this.
Sorry, I was referring to the BinaryConnectionFactory.
Is spymemcached based on Java NIO? I am comparing spymemcached with other memcached client libraries such as Xmemcached, which claims to be the fastest for high-concurrency applications.
Hi all, I am new to spymemcached and I have a few questions about MemcachedClient: 1. Is MemcachedClient thread safe? 2. Can I just create and maintain one MemcachedClient instance to be used throughout an application that serves thousands of concurrent requests? 3. Does spymemcached provide a connection pooling feature for MemcachedClient instances that serve multiple requests?
Ken:
Yes it's NIO. Yes it's threadsafe. Your questions seem to imply you have an implementation for such a client in mind and think it might perform badly. Try this one and let me know how you would improve it. Contributions are welcome. :) For many people, it ends up being fast enough.
Hi all. I find this single-IO-thread architecture a bit strange. Let me explain. We use a farm of memcached servers (let's say 10), and our Java process does highly concurrent processing involving requests to an API, on which we use memcached to make it faster. What I can see when analyzing the performance of the application is that all the cache requests (gets and sets) are processed by only one thread. That doesn't look optimal, because I think gets can be done concurrently without any problem, so that they benefit from having multiple servers, and gets should not wait for sets to happen. At the moment the gets and the sets are in the same queue. Am I missing anything? I would run such an IO thread on a one-per-server basis, so that we take advantage of having multiple memcached servers. Any comment would be appreciated.
When I give a list of multiple servers to the memcached client using BinaryConnectionFactory, does it do consistent hashing? If not, then why does it take a list of servers and not a single server? If yes, then what is the purpose of KetamaConnectionFactory?
The reason I ask is that I did a test with 2 memcached servers, each with 512 MB of RAM. I stored 5000 small sequential keys (0-string, 1-string, etc.) using BinaryConnectionFactory, passing the list of servers to AddrUtil. Then, from another test program, I fetched the list. For some reason, when I stored and retrieved the values sequentially using BinaryConnectionFactory, I would get cache misses after about 3500. However, when I used KetamaConnectionFactory the lookup succeeded every time. Does this mean that the hashing of BinaryConnectionFactory is broken?
Is this a configuration option? Or do you mean using the async Get and Puts?
Is asyncGetBulk one call to memcached, or as many calls as there are keys?
Does the API send something like "GET this and this and this", or "GET this, GET this, GET this"?
Any tips on how to optimize multiple gets in one call besides using asyncGetBulk?
Thanks,
I'm doing some benchmarks between different Java memcached clients and noticed that when I do a 'Set' on 500,000 Strings (String.valueOf(0-499999)) it takes less than two seconds, but when I do a 'Get' on the same 500,000 String values it takes around 79 seconds. Do you know what might account for this? BTW, I wait until all the 'Set's are finished.
Thanks
-Pete
I did the same testing as Pete and the result is similar. With spymemcached, "gets" are much slower than "sets". Quite confused.
- Brian
Pete and Brian, are you using async gets, and firing them off/waiting for them to finish using the same logic as the sets?