FAQ
Frequently Asked Questions
General Questions

What is memcached?

memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. Danga Interactive developed memcached to enhance the speed of LiveJournal.com, a site which was already doing 20 million+ dynamic page views per day for 1 million users with a bunch of webservers and a bunch of database servers. memcached dropped the database load to almost nothing, yielding faster page load times for users, better resource utilization, and faster access to the databases on a memcache miss.

Where can I get memcached?

Follow the pointers from the download page.

How can I install memcached?

For a tutorial, go to: http://blog.ajohnstone.com/archives/installing-memcached

Also consider checking your distribution's package management system (apt, yum, etc). Generic install instructions are standard. memcached requires libevent to be installed first. This is most likely available via your distribution's package manager. If your distribution doesn't have memcached, or its version is not recent enough, installing from source is simple. Fetch the tarball from our download page.

    $ tar -zxvf memcached-1.x.x.tar.gz
    $ cd memcached-1.x.x
    $ ./configure
    $ make
    $ make test
    $ sudo make install

Use './configure --help' to see all of the options.

Where can I run memcached?

Anywhere you have spare RAM! Memcached runs on Linux, BSD, and Windows. It will usually use very little CPU, so fire it up wherever there's free RAM.

Why might I want to run memcached?

If you have a high-traffic, dynamically generated site with a high database load that consists mostly of reads, then memcached can help lighten the load on your database. It can be useful in many other situations as well. Read through the whole FAQ and related tutorials for ideas. If your DB load is low but CPU usage is very high, you could cache computed objects and rendered templates.
You can reduce writes related to session handling, temporarily stash data, cache small but frequently accessed files, cache results from web "services" or RSS feeds... Even if you're not totally out of resources, but want your pages to render with less latency, it can be helpful.

Why should I not use memcached?

Answered on this page.

How do I access memcached?

Typically, you use a client library from your application to access one or more memcached servers. The Clients page has a list of available API libraries, which are available for Perl, C, C#, PHP, Python, Java, Ruby, and Postgresql Stored Procedures and Triggers. You can also write your own client library using the memcached protocol documentation found here: http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt

How can I use memcached as a database?

If you want to use memcached as a data store instead of a cache, you should use a database instead. MySQL Cluster has some of the same properties as memcached (not the ease of install though!) and can be set up as a reliable HA datastore.

Can I iterate the items of the memcached server?

No. memcached doesn't support that and it's not a planned feature. It would be a relatively slow and blocking operation (compared to everything else memcached is doing). See above, it's a cache, not a database. Tugela and memcachedb are memcached derived systems that are slower but slightly more like a database.

Of course it's all software, so ultimately in a way the answer is "yes", but for anything but a development or test server it will be slow and block the server while processing, so for 99.9% of real deployments the answer is no.

What exactly do we mean by "block the server"? All of memcached's non-debug commands (add, set, get, flush) take near constant time to execute, no matter how much data is in the cache. Any command which needs to iterate cache items will get progressively slower as there is more data in the cache.
The blocking issue comes in when other commands are unable to execute because they are waiting on the slow iterative command to finish.

Say your "delete all my keys by (blah blah)" command takes half a second on average. You have plenty of CPU free on your memcached server, and you only need to run this command once every few seconds. So what's the problem? While that command runs, most of your other requests will be delayed by up to half a second, and they queue up behind it. It'll take as long as it takes the hardware to process through that queue in order to catch up. So all of your other requests end up taking too long. So we try real hard not to do that.

If you really need to iterate over your items, consider that using a MySQL store where primary accesses are all by primary key, and you only have a single secondary index for doing searches, is actually relatively fast.

Cluster Architecture Questions

How does memcached work?

Memcached's magic lies in its two-stage hash approach. It behaves as though it were a giant hash table, looking up key = value pairs. Give it a key, and set or get some arbitrary data.

When doing a memcached lookup, first the client hashes the key against the whole list of servers. Once it has chosen a server, the client then sends its request, and the server does an internal hash key lookup for the actual item data.

For example, if we have clients 1, 2, 3, and servers A, B, C:

Client 1 wants to set key "foo" with value "barbaz". Client 1 takes the full list of servers (A, B, C), hashes the key against them, then let's say ends up picking server B. Client 1 then directly connects to server B, and sets key "foo" with value "barbaz".

Next, client 2 wants to get key "foo". Client 2 runs the same client library as client 1, and has the same server list (A, B, C). It is able to use the same hashing process to figure out key "foo" is on server B. It then directly requests key "foo" and gets back "barbaz".
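As a rough illustration of the two-stage lookup, here is a minimal Python sketch. It is not how any particular client library is implemented: the CRC32-modulo server selection, the server names, and the `pick_server`, `cache_set`, and `cache_get` helpers are all assumptions made for the example, and each "server" is just a dictionary.

```python
import zlib

SERVERS = ["A", "B", "C"]  # hypothetical server list shared by every client


def pick_server(key, servers=SERVERS):
    """First hash stage (client side): map a key to one server in the list."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


# Second hash stage (server side): each server keeps its own key -> value table.
stores = {name: {} for name in SERVERS}


def cache_set(key, value):
    stores[pick_server(key)][key] = value


def cache_get(key):
    return stores[pick_server(key)].get(key)


cache_set("foo", "barbaz")   # "client 1" stores the value on the chosen server
result = cache_get("foo")    # "client 2" hashes the same key to the same server
```

Because both clients share the server list and the hash function, they agree on where "foo" lives without ever talking to each other.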
Different client implementations store data into memcached differently (Perl Storable, PHP serialize, Java hibernate, JSON, etc). Some clients also implement the hashing algorithm differently. The server is always the same, however.

Finally, memcached itself is implemented as a non-blocking event-based server. This is an architecture used to solve the C10K problem and scale like crazy.

For an even simpler, drawn out explanation of how memcached client/server interaction works, see A Story of Caching.

What's the big benefit for all this?

Carefully read the above entry (How does memcached work?). The big benefit, when dealing with giant systems, is memcached's ability to massively scale out. Since the client does one layer of hashing, it becomes entirely trivial to add dozens of nodes to the cluster. There's no interconnect to overload, or multicast protocol to implode. It Just Works. Run out of memory? Add a few more nodes. Run out of CPU? Add a few more nodes. Have some spare RAM here and there? Add nodes!

It's incredibly easy to build on memcached's basic principles to implement many different kinds of caching architectures. Hopefully detailed elsewhere in the FAQ.

See the following few FAQ entries for how this compares to a server's local cache or MySQL's query cache, which should help you grok the big picture!

How does it compare to MySQL's query cache?

Adding memcached support to your application can be a lot of work. MySQL has a handy query cache feature that will automatically cache the results of your SQL queries, making them way faster on repeat runs. How does memcached compare to this?

MySQL's query cache is centralized, so its benefits are seen by all servers connecting to it.
How does it compare to a server local cache? (PHP's APC, mmap memory, etc)

Note that a lot of the issues above apply here too. Memory usage is limited to what spare memory you have on your single box. A local cache has an advantage over both MySQL's query cache and memcached because you can store arbitrary data objects into it, and it does not have the latency associated with a fetch over a network.
What is memcached's cache?

The cache structure is an LRU (Least Recently Used), plus expiration timeouts. When you store items into memcached, you may state how long each should be valid in the cache: either forever, or until some time in the future. If the server is out of memory, expired items are replaced first, then the least recently used items go next.

How is memcached redundant?

It's not! Surprise! Memcached is a caching layer for your application. It is not designed to have any data redundancy. If a node loses all of its data, you should still be able to fetch it all again from the source. Be especially careful that your application can survive losing memcached instances. Don't write awful queries and expect memcached to be a fix-all! If you're worried about having too much of a spike in database usage during failure, you have some options. You can add more nodes (lessen impact of losing one), hotspares (take over IP address when down), etc.

How does memcached handle failover?

It doesn't! :) There is no central authority to do anything at all in the case of a memcached node failure. The behavior is entirely up to the user. There are many options on what you might want to do in the case of a node failure. You can:
How can you dump data from or load data into memcached?

You don't! Memcached is what we call a non-blocking server. Anything that could cause the server to pause and not respond to requests momentarily must be thought through very carefully. Loading your cache from a dump is often really not what you want anyway! Consider if any of your data changes between you dumping and then loading the cache. You now have out of date data to deal with. How do you also manage items that were due to expire from cache before the dump was loaded?

It's not as useful as you might think. There is a case where this can be useful. If you have huge amounts of data that never changes and you like your caches toasty warm, loading your cache from dump could help. While this is not the typical case at all, it happens often enough that such a feature might appear in the future.

Steven Grimm, as always, gives another good example on the mailing list here: http://lists.danga.com/pipermail/memcached/2007-July/004802.html

But I really need to dump and load data from memcached!

Okay okay. The common issue for dump/loading is often that you have cache items which take a very long time to regenerate, or you get significant database pain from losing a server.

If you get too much pain from having a single memcached disappear, you're in trouble for a lot of reasons. Your system is very fragile. Consider looking for some things to tune. Work on stampeding herd issues (if your DB gets clogged up with repeat queries after a cache disappears... noted elsewhere in the FAQ), or tuning bad queries. Memcached is not an excuse for not tuning your queries.

If you just take a very long time (15 seconds to 5+ minutes) to generate your cache entries, you might consider using a database again. Note some quick options:
Again, the above option is cacheable in memcached, and should offer good performance in the case of needing to restart memcached. It should improve overall performance, as you don't have to worry about memcached's LRU accidentally evicting a hot item, or users always needing to wait several minutes to regenerate a cache item if it suddenly disappears out of RAM.

Note this is the same approach detailed in this blog post about sessions: http://dormando.livejournal.com/495593.html

How do memcached's authentication mechanisms work?

They don't! Memcached is the soft, doughy underbelly of your application. Part of what makes the clients and server lightweight is the complete lack of authentication. New connections are fast, and server configuration is nonexistent.

If you wish to restrict access, you may use a firewall, or have memcached listen via unix domain sockets.

What are memcached threads?

Threads rule! Thanks to Steven Grimm and Facebook, memcached 1.2 and higher has a threaded operation mode. The threaded system allows memcached to utilize more than a single CPU and share the cache between all of them. It does this by having a very simple locking mechanism when certain values, items, etc need to be updated. This helps make multi gets more efficient, versus running multiple nodes on the same physical server to achieve performance.

If you don't have a heavily loaded setup, you probably don't need to configure threads. If you're running a gigantic website with gigantic hardware, you might see benefit here.

More info: http://code.sixapart.com/svn/memcached/trunk/server/doc/threads.txt

In short summary: command parsing (where memcached spends most of its time) is run under multiple threads. Operations on the cache internals run under global locks (and thus that time is not threaded). Future improvements in the threading system should remove more global locks and further improve performance under extreme loads.

What is the maximum key length?
(250 bytes)

The maximum size of a key is 250 bytes. Note this value will be less if you are using client "prefixes" or similar features, since the prefix is tacked onto the front of the original key. Shorter keys are generally better since they save memory and use less bandwidth.

What are the limits on setting expire time? (why is there a 30 day limit?)

You can set expire times, given in seconds, up to 30 days in the future. A larger value is interpreted by memcached as an absolute date (a Unix timestamp), and the item expires at that date. This is a simple (but obscure) mechanic.

What is the maximum data size you can store? (1 megabyte)

The maximum size of a value you can store in memcached is 1 megabyte. If your data is larger, consider client-side compression or splitting the value up into multiple keys.

Why are items limited to 1 megabyte in size?

Ahh, this is a popular question!

Short answer: Because of how the memory allocator's algorithm works.

Long answer: Memcached's memory storage engine (which will be pluggable/adjusted in the future...) uses a slabs approach to memory management. Memory is broken up into slab chunks of varying sizes, starting at a minimum size and ascending by a growth factor up to the largest possible value.

Say the minimum value is 400 bytes, the maximum value is 1 megabyte, and the growth factor is 1.20:

    slab 1 - 400 bytes
    slab 2 - 480 bytes
    slab 3 - 576 bytes
    ... etc.

The larger the slab, the more of a gap there is between it and the previous slab. So the larger the maximum value, the less efficient the memory storage is. Memcached also has to pre-allocate some memory for every slab that exists, so setting a smaller growth factor with a larger max value will require even more overhead.

There are other reasons why you wouldn't want to do that... If we're talking about a web page and you're attempting to store/load values that large, you're probably doing something wrong.
At that size it'll take a noticeable amount of time to load and unpack the data structure into memory, and your site will likely not perform very well.

If you really do want to store items larger than 1MB, you can recompile memcached with an edited slabs.c:POWER_BLOCK value, or use the inefficient malloc/free backend. Other suggestions include a database, MogileFS, etc.

Can I use different size caches across servers and will memcached use the servers with more memory efficiently?

The hashing algorithm that determines which server a key is cached on does not take into account memory sizes across servers. A workaround may be to run multiple memcached instances on the server with more memory, with each instance using the same size cache as all your other servers.

What is the binary protocol? Should I care?

The best information is in the binary protocol spec.

The binary protocol is an attempt to make a more efficient, reliable protocol that reduces the CPU time spent on the client/server protocol. According to Facebook's tests, parsing the ASCII protocol is one of the largest consumers of CPU time in memcached. So why not improve on it? :)

Older information in this thread on the mailing list: http://lists.danga.com/pipermail/memcached/2007-July/004636.html

How does memcached's memory allocation work? Why not use malloc/free!? Why the hell does it use slabs!?

Actually, it's a compile time option. The default is to use the internal slab allocator. You really really want to use the built-in slab allocator. At first memcached did just use malloc/free for everything. However, this does not play very well with OS memory managers. You get fragmentation, and your OS ends up spending more time trying to find contiguous blocks of memory to feed malloc() than it does running the memcached process. If you disagree, of course you're free to try malloc! Just don't complain on the lists ;)

The slab allocator was built to work around this.
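To make the slab size classes concrete, here is a small Python sketch that generates chunk sizes from a minimum size and a growth factor, then shows how much memory an item wastes in the chunk it lands in. The 400 byte minimum, 1 megabyte maximum, and 1.20 factor are the illustrative numbers from the item size answer; the function names are hypothetical, and real memcached also accounts for item headers and alignment, which this sketch ignores.

```python
def slab_sizes(min_size=400, max_size=1024 * 1024, factor_percent=120):
    """Chunk sizes from min_size up to max_size, each about 1.20x the last.

    The factor is given as an integer percentage so the arithmetic stays exact.
    """
    sizes = []
    size = min_size
    while size < max_size:
        sizes.append(size)
        size = size * factor_percent // 100
    sizes.append(max_size)  # the largest class is capped at the maximum item size
    return sizes


def chunk_for(item_size, sizes):
    """Smallest chunk size that can hold an item of item_size bytes."""
    for size in sizes:
        if item_size <= size:
            return size
    raise ValueError("item is larger than the largest slab chunk")


sizes = slab_sizes()
chunk = chunk_for(450, sizes)   # a 450 byte item goes into the 480 byte class
wasted = chunk - 450            # bytes left unused inside that chunk
```

The gap between neighbouring classes grows with the chunk size, which is why a larger maximum item size makes storage less efficient.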
Memory is allocated in chunks internally and constantly reused. Since memory is broken into different size slabs, you do waste memory if your items do not fit perfectly into the slab the server chooses to put them in. The slab allocator has received considerable efficiency improvements from Steven Grimm.

Some older posts about the slab changes (power of n vs power of 2), and some tradeoffs, are on the mailing list:
http://lists.danga.com/pipermail/memcached/2006-May/002163.html
http://lists.danga.com/pipermail/memcached/2007-March/003753.html

And if you'd like to attempt to use malloc/free and see how it works, you may define 'USE_SYSTEM_MALLOC' in the build process. It might not be tested very well, so getting developer support for it is unlikely.

More info: http://code.sixapart.com/svn/memcached/trunk/server/doc/memory_management.txt

Is memcached atomic?

Of course! Well, let's be specific:
In 1.2.5 and higher there are the "gets" and "cas" commands, which are used to deal with these situations. If you issue a "gets" command to fetch a key, you get a unique identifier back with that value. If you later wish to overwrite the original key, you send that identifier back with the "cas" command. If the identifier stored in memcached is identical to the one you supplied, you win and your write succeeds. If another process has modified that same key in the meantime, the identifier will have changed and your write will fail. In general, updating memcached based on data in memcached is tricky; you should only do it if you're confident in what you're doing.

Performance Questions

Memcached is not faster than my database. Why?

In a one to one comparison, memcached may not be faster than your SQL queries. However, this is not its goal. Memcached's goal is scalability. As connections and requests increase, memcached will perform better than most database only solutions. Please test your code under high load with simultaneous connections and requests before deciding memcached is not right for you.

Client Libraries

What client libraries are available for memcached?

See How do I access memcached above.

Can I access the same data in memcached with different client libraries?

Technically, yes, but the two issues you may run into are as follows:
What is a "consistent hashing" client?

Consistent hashing algorithms are a new approach to managing the first-layer hashing system for memcached clients. A good post (and library) explaining its usage has been posted here: http://www.last.fm/user/RJ/journal/2007/04/10/392555

Client FAQ

A few notes, for now:

Clients may implement a "prefix" in order to set the domain of a key. For instance you can take the customer name and use it to create a specific domain for their keys in a shared hosting environment. The "prefix" should be applied to the key when storing the value, but should not be used when calculating the hash.

While memcached itself implements no method for serializing structures, JSON is the most widely deployed object neutral serialization type.

Memcached Options

If you want to learn about memcached's options, just run memcached -h. It will give you a brief output of options. You can fiddle with the options to easily learn how they work. There is also a memcached(1) manpage which (should) come with the memcached distribution.

Item Expiration

When do expired cached items get deleted from the cache?

memcached uses lazy expiration, which means it uses no extra CPU expiring items. When an item is requested (a get request) it checks the expiration time to see if the item is still valid before returning it to the client.

Similarly, when adding a new item to the cache, if the cache is full, it will look for expired items to replace before replacing the least recently used items in the cache.

Namespaces

memcached does not support namespaces. However, there are some options to simulate them.

Simulating Namespaces with key prefixes

If you simply want to avoid key collisions between different types of data, prefix your key with a useful string. For example: "user_12345", "article_76890".
Deleting by Namespace

While memcached does not support any type of wildcard deleting or deletion by namespace (since there are no namespaces), there are some tricks that can be used to simulate this. They do require extra trips to the memcached servers, however.

Example, in PHP, for using a namespace called foo:

    $ns_key = $memcache->get("foo_namespace_key");
    // if not set, initialize it and remember the value we stored
    if ($ns_key === false) {
        $ns_key = rand(1, 10000);
        $memcache->set("foo_namespace_key", $ns_key);
    }
    // build the real key from the current namespace value
    $my_key = "foo_".$ns_key."_12345";
    $my_val = $memcache->get($my_key);
    // To clear the namespace, bump the counter so old keys are never asked for again:
    $memcache->increment("foo_namespace_key");

Application Design

What are some things I should consider with regard to caching when I design my application(s)?

Generic Design Approaches

Simple query result caching

Query caching is the storage of an entire result set from a given query. It is best used for queries that are called often but whose SQL does not change, such as loading content by a specific set of filters (e.g., get topics for a specific forum, get products for a category).

    $key = md5('SELECT * FROM rest_of_sql_statement_goes_here');
    // Fetch once; a second get() could miss if the item expires in between
    $result = $memcache->get($key);
    if ($result !== false) {
        return $result;
    }
    // Run the query and transform the result data into your final dataset form
    $result = $query_results_mangled_into_most_likely_an_array;
    // Store the (compressed) result of the query for a day
    $memcache->set($key, $result, MEMCACHE_COMPRESSED, 86400);
    return $result;

Remember, if the result of this query changes, the results will not show up for a day. This approach isn't always useful, but gets the job done good n' quick.

Simple row-based query result caching

Row-based caching is checking a list of known data identifiers for cached data. Those rows that have data already stored are retrieved. Rows that are not cached are pulled from the database and stored, each with their own key, in memcache and then added to the final dataset, which is returned. Over time, most data points will be cached, so more and more queries will pull all their rows from memcache instead of from the database. If the data is relatively static, a longer caching time can be used.

This pattern is extremely useful in searches where datasets will vary based on input parameters but will overlap from query to query, and the datasets are large or are pulled from multiple tables.

For example, say you have a dataset of users A, B, C, D, E.

You view a page with information on users A, B, E. First, you do a memcached get with three independent keys, one for each user. Say they all come up miss. Then you would do an SQL query to fetch row info for all three users, then store it into memcached.

Now, you view another page with users C, D, E on it. When you do that memcached get again, you miss on C, D, and hit on E. Select rows for C, D, and set them into memcached.

At this point, for the next few minutes maybe, any page referring to A, B, C, D, or E, in any mix or order, will be completely cached.

Action flood control

Flood control is the process of throttling user activity, usually for load management. We first try to add a memcache key that uniquely identifies a user and times out after a given interval. If that succeeds, there was no identical key, and thus the user should be allowed to do the action. If the add fails, the user is still in the flood control interval, so shouldn't be allowed to continue their action.
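The add-based throttle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FakeCache` is a hypothetical in-memory stand-in for a real client (its `add` mimics memcached's store-only-if-absent behaviour), and the `noflood:<user>:<thread>` key layout, the 60 second interval, and the `may_post` helper are example choices. The `now` parameter exists only to make the timing easy to follow.

```python
import time


class FakeCache:
    """In-memory stand-in for a memcached client (hypothetical, illustration only)."""

    def __init__(self):
        self._items = {}  # key -> (value, expires_at)

    def add(self, key, value, ttl, now=None):
        """Store only if the key is absent or expired, like memcached's 'add'."""
        now = time.time() if now is None else now
        item = self._items.get(key)
        if item is not None and item[1] > now:
            return False  # NOT_STORED: the key still exists
        self._items[key] = (value, now + ttl)
        return True


def may_post(cache, user, thread, interval=60, now=None):
    """True if the user is outside the flood-control interval for this thread."""
    key = "noflood:%s:%s" % (user, thread)
    return cache.add(key, 1, interval, now=now)


cache = FakeCache()
first = may_post(cache, "A", 7, now=0)    # no key yet: action allowed
second = may_post(cache, "A", 7, now=30)  # key still live: action refused
third = may_post(cache, "A", 7, now=61)   # key has timed out: allowed again
```

With a real client, a connection error should be handled separately from a refused add, since only the latter means the user is being throttled.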
If all else fails and the key cannot be added or retrieved, something's wonky with memcache and it's up to you to decide whether to allow the action or not (suggested yes, to prevent long term memcache issues from stopping all actions).

So, if user A makes a comment in thread 7, and you don't want them to be able to comment again for another 60 seconds: 'add' a key (eg) 'noflood:A:7' into memcached. If you get a SUCCESS, the user may post. If you get a NOT_STORED (but not an error!), the key still exists and the user should be warned.

Note you may also try fetching a key and doing incr/decr on it if a user should only be allowed to perform an action a certain number of times before being throttled.

Cache things other than SQL data!

When first plugging memcached into everything you can get your hands on, it may not be obvious that you can or should cache anything other than SQL resultsets. You can, and you should!

Say you were building a profile page for display. You might fetch a user's bio section (name, birthdate, hometown, blurb). Then you might format the blurb to replace custom XML tags with HTML, or do some nasty regexes. Instead of caching 'name, birthdate, hometown, blurb' independently, or as one item, cache the rendered output chunk! Then you may simply fetch the pre-processed HTML chunk ready for inclusion in the rest of the page, saving precious CPU cycles.

Use a cache hierarchy

In most cases you have the ability to use a localized cache or memcached. We know to use memcached so we may enjoy a massive volume of cached data in a high speed farm, but sometimes it makes sense to go back to your roots a little and maintain multiple levels of cache.

Peter Zaitsev has written about the speed comparisons of PHP's APC over localhost, vs memcached over localhost, and the benefits of using both:
Often you'll have a very small amount of data (product categories, connection information, server status variables, application config variables) which is accessed on nearly every page load. It makes a lot of sense to cache these as close to the process as possible (or even inside the process, if you can). It can help lower page render time, and increase reliability in case of memcached node failures.

Update memcache as your data updates

One of the most important improvements you can make to integrate your cache seamlessly with your application is to actually update the cache at the same time as updating the database.

So, user A edits their profile. While saving the profile to the database, you may either set the new profile data into memcached (preferred), or simply send a delete to remove the old profile data.

If you update the data immediately, you may prevent the database from ever having to do a read on that data. When the user habitually reloads their profile to see the latest changes, it will be pulled directly from cache, and they will have the latest information available.

This is fantastic, since no user wants to see outdated data, do they?

Race conditions and stale data

One thing to keep in mind as you design your application to cache data is how to deal with race conditions and occasional stale data.

Say you cache the latest five comments for display on a sidebar in your application. You decide that the data only needs to be refreshed once per minute. However, you neglect to remember that this sidebar display is rendered 50 times per second! Thus, once 60 seconds rolls around and the cache expires, suddenly 10+ processes are running the same SQL query to repopulate that cache. Every time the cache expires, a sudden burst of SQL traffic will result.

Worse yet, you have multiple processes updating the same data, and the wrong one ends up updating the cache. Then you have stale, outdated data floating about.
One should be mindful about possible issues in populating or repopulating the cache. Remember that the process of checking memcached, fetching SQL, and storing into memcached is not atomic at all!

How to prevent clobbering updates, stampeding requests

So how does one prevent clobbering your own updates or stampeding during a cache miss? Aka, "cache stampedes" or "database stampedes".

The easiest answer is to avoid the problem. Don't set caches to expire, and update them via cron, or as data is updated. This does not eliminate the possibility of a stampede, but keeps it from becoming the norm.

Some great ideas from the mailing list also outline another approach:

Suppose you want to avoid a stampede when key A expires in its common case (a timeout, for example). Since the stampede is caused by a race between the cache miss and the time it takes to re-fetch and update the cache, you can try shortening the window.

First, set the cache item expire time way out in the future. Then, you embed the "real" timeout serialized with the value. For example you would set the item to time out in 24 hours, but the embedded timeout might be five minutes in the future.

Then, when you get from the cache, examine the timeout, and find it expired, immediately edit the embedded timeout to a time in the future and re-store the data as is. Finally, fetch from the DB and update the cache with the latest value. This does not eliminate, but drastically reduces, the amount of time where a stampede can occur.

A decent python example can be found here: http://www.djangosnippets.org/snippets/155/

If you have a lot of data excelling at causing this problem, you might also consider using MySQL Cluster for it, or a tiered caching approach.

Another (pretty cool!) idea is to use Gearman, as noted on the mailing list: http://lists.danga.com/pipermail/memcached/2007-July/004858.html

Other threads from the mailing list:
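Returning to the embedded-timeout technique, the following Python sketch shows the order of operations. It is an illustration only, not the code from the linked snippet: a plain dictionary stands in for memcached, `load_from_db` is a hypothetical placeholder for the slow query, and the five minute soft timeout and 30 second grace period are example values. With a real client the item would also be stored with a long hard expire time (for example 24 hours).

```python
import time

cache = {}     # stand-in for memcached: key -> (value, soft_expiry)
db_calls = []  # records every simulated database fetch


def load_from_db(key):
    """Hypothetical slow query; here it only records that it was called."""
    db_calls.append(key)
    return "fresh value for %s" % key


def get_with_soft_timeout(key, soft_ttl=300, grace=30, now=None):
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is not None:
        value, soft_expiry = entry
        if soft_expiry > now:
            return value  # still fresh: no database work at all
        # Soft timeout passed: first push the embedded timeout forward and
        # re-store the old value as is, so other readers keep getting the
        # stale copy instead of all hitting the database at once.
        cache[key] = (value, now + grace)
    # Only the caller that saw the miss or the expired timeout refreshes.
    value = load_from_db(key)
    cache[key] = (value, now + soft_ttl)
    return value


get_with_soft_timeout("sidebar", now=0)    # miss: one database fetch
get_with_soft_timeout("sidebar", now=100)  # fresh hit: no fetch
get_with_soft_timeout("sidebar", now=301)  # soft timeout passed: one refresh
```

The window for a stampede shrinks to the short gap between reading the expired timeout and re-storing the item with the extended one.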
Emulating locking with the add command

If you really need a lock around a key, you can emulate it via the 'add' command. This is not so useful on cache misses, but more useful if you are using memcached as the canonical store for some piece of data (for example, some metadata about the app server pool, perhaps).

Say you want to update key "A".
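One way this can look in practice is sketched below in Python. Everything here is an assumption made for illustration: `FakeCache` is a hypothetical in-memory stand-in for a client library, the `lock:<key>` naming and the 10 second lock timeout are example choices, and `update_with_lock` is not part of any real API. The lock is just a key created with 'add', which succeeds for exactly one caller until the key is deleted or times out.

```python
import time


class FakeCache:
    """In-memory stand-in for a memcached client (hypothetical, illustration only)."""

    def __init__(self):
        self._items = {}  # key -> (value, expires_at); None means no expiry

    def add(self, key, value, ttl, now=None):
        """Store only if the key is absent or expired, like memcached's 'add'."""
        now = time.time() if now is None else now
        item = self._items.get(key)
        if item is not None and (item[1] is None or item[1] > now):
            return False
        self._items[key] = (value, now + ttl)
        return True

    def set(self, key, value):
        self._items[key] = (value, None)

    def get(self, key):
        item = self._items.get(key)
        return None if item is None else item[0]

    def delete(self, key):
        self._items.pop(key, None)


def update_with_lock(cache, key, new_value, lock_ttl=10, now=None):
    """Take 'lock:<key>' via add; update the key only if we got the lock."""
    lock_key = "lock:" + key
    if not cache.add(lock_key, 1, lock_ttl, now=now):
        return False  # someone else holds the lock; back off and retry later
    try:
        cache.set(key, new_value)
    finally:
        cache.delete(lock_key)  # release the lock
    return True


cache = FakeCache()
cache.add("lock:A", 1, 10, now=0)                      # another process holds the lock
blocked = update_with_lock(cache, "A", "new", now=5)   # lock still live: refused
updated = update_with_lock(cache, "A", "new", now=11)  # lock timed out: succeeds
```

The timeout on the lock key matters: if the holder crashes before deleting it, the lock frees itself once the timeout passes.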
This is analogous to using MySQL's GET_LOCK with a timeout value set to 0. There's no way to emulate GET_LOCK()'s timeout operations via a mutex within memcached. As of writing no one's gotten annoyed enough to try adding such a feature.

Here's an attempt to build semaphore locks in memcached. Also see the mailing list discussion.

Pre warm your cache

If you have a very highly used site, and you're bringing a feature back from the dead, or launching a brand new feature, you might end up having issues with an empty cache. Cache comes up empty, herd of humans click, and your database gets overwhelmed while trying to fill the cache.

In order to get around this you may try "warming" your cache with any method available. You could write a script to walk the website and cache common pages. You could write a commandline tool which runs through your list of users online at that moment, filling caches appropriately. Either way it could potentially help. You may also try to ensure you don't have empty caches during peak hours :)

Storing lists of data

Storing lists of data into memcached can mean either storing a single item with a serialized array, or trying to manipulate a huge "collection" of data by adding and removing items without operating on the whole set. Both should be possible.

One thing to keep in mind is memcached's 1 megabyte limit on item size, so storing the whole collection (ids, data) into memcached might not be the best idea.

Steven Grimm explains a better approach on the mailing list: http://lists.danga.com/pipermail/memcached/2007-July/004578.html

Chris Hondl and Paul Stacey detail alternative approaches to the same idea: http://lists.danga.com/pipermail/memcached/2007-July/004581.html

A combination of both would make for very scalable lists. IDs between a range are stored in separate keys, and data is strewn about using individual keys.

Batch your requests with get_multi
If you are just getting started with memcached, you might end up with code which looks similar to this:

    greet = get("Foo")
    person = get("Bar")
    place = get("Baz")

As you scale, you might come back to this code while trying to reduce render time. You'll notice that each of these get() calls does a full round-trip to memcached:

    get("Foo") - client - server - client
    get("Bar") - client - server - client
    etc.

Most clients support the ability to do multi-key gets, pipelining requests into single memcached instances. Others yet allow for parallel fetches. So if your 3 keys would resolve to 3 different memcached instances, the requests all happen in parallel and you end up waiting only for the slowest one to return, instead of waiting for all three one after another. If you have many keys to fetch, this can mean a huge difference in speed!

More good techniques on how to coalesce and parallelize requests are on the mailing list, with a recent one from Brad here: http://lists.danga.com/pipermail/memcached/2007-July/004528.html

Is there a guaranteed order to the results returned by get_multi?
No, you'll have to do your own processing (such as sorting) in your client/application if you depend on some ordering.

Creating good keys
It's a good idea to use sprintf(), or a similar function, when creating keys. Otherwise, it's easy for null values and boolean values to slip into your keys, and these may not work as you expect. e.g.

    memKey = sprintf ( 'cat:%u', categoryId );

WIP: Mulled this over, need someone with better examples to fill this in. Short keys tend to be good, using prefixes along with an MD5 or short SHA1 can be good, namespace prep is good. What else?

Using Memcached as a simple message queue
Perhaps you want to use memcached as a cheap queue or write-back cache. One technique is using incr/decr to generate unique keys for queue item management, as described here: http://broddlit.wordpress.com/2008/04/09/memcached-as-simple-message-queue/

Related, here is another writeup on using memcached as a queue. The client code is Ruby but should be applicable to any client language.
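As a rough illustration of the incr-based technique, here is a single-consumer sketch in Python. It is not the code from either writeup: FakeMemcached is a hypothetical in-memory stand-in for a real client, and SimpleQueue and its key names ("jobs:head", "jobs:tail", "jobs:item:N") are assumptions made for this example.

```python
class FakeMemcached:
    """Hypothetical in-memory stand-in that mimics add/set/get/incr/delete."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        # Store only if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)

    def incr(self, key, delta=1):
        # As in memcached, incr does not create a missing key.
        if key not in self._data:
            return None
        self._data[key] = int(self._data[key]) + delta
        return self._data[key]

    def delete(self, key):
        return self._data.pop(key, None) is not None


class SimpleQueue:
    """FIFO queue built from two counters plus one key per queued item."""

    def __init__(self, client, name):
        self.client = client
        self.head_key = name + ":head"   # number of items consumed so far
        self.tail_key = name + ":tail"   # number of items produced so far
        self.item_fmt = name + ":item:%d"
        # add() leaves existing counters untouched if the queue already exists.
        client.add(self.head_key, 0)
        client.add(self.tail_key, 0)

    def push(self, value):
        slot = self.client.incr(self.tail_key)  # unique slot number per push
        self.client.set(self.item_fmt % slot, value)
        return slot

    def pop(self):
        head = int(self.client.get(self.head_key))
        tail = int(self.client.get(self.tail_key))
        if head >= tail:
            return None  # nothing queued
        slot = self.client.incr(self.head_key)
        key = self.item_fmt % slot
        value = self.client.get(key)  # may be None if the item was evicted
        self.client.delete(key)
        return value


queue = SimpleQueue(FakeMemcached(), "jobs")
queue.push("first")
queue.push("second")
popped = [queue.pop(), queue.pop(), queue.pop()]
```

The check-then-incr in pop() is not atomic, so this sketch is only safe with one consumer, and a consumer has to call pop() repeatedly to notice new items.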
Be aware of hitting memory limits and cache expiry. Also, these solutions fundamentally do a lot of inefficient polling of the memcached server, continually asking the memcached server "are there any items in the queue for me?"

Improving multi_get efficiency for related data (fetch by master key)
Some clients support a special feature which allows you to cluster "related" data on the same memcached instance. Normally a multi_get command will issue one request for each memcached instance your keys map to. In using this feature, you're almost guaranteed to fetch all of your keys from a single instance in a single roundtrip, which is more efficient.

This is usually called fetch/store "by master key". libmemcached has by_master_key() and related commands.

When using this feature, you submit two keys with your data. The first key is used by the client to determine which server to access data from, the second key is what's actually sent to memcached for storage. For example, if you want to load a user's profile page, they might have a lot of little settings/twiddles/values to load at the same time:
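A minimal sketch of the routing idea in Python, assuming a simple "hash modulo server count" scheme. The server list, the key names, and the server_for and route helpers are all hypothetical; real clients use their own hashing and their own by-master-key API.

```python
import zlib

# Hypothetical server list; every client must use the same list in the same order.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def server_for(hash_key, servers=SERVERS):
    """Pick a server from the key that is used for hashing."""
    return servers[zlib.crc32(hash_key.encode("utf-8")) % len(servers)]


def route(key, master_key=None):
    """Return (server, storage_key).

    With a master key, the master key alone decides the server, while the
    item is still stored under its own key.
    """
    hash_key = master_key if master_key is not None else key
    return server_for(hash_key), key


profile_keys = ["user:1234:name", "user:1234:theme", "user:1234:timezone"]

# Without a master key, each key hashes on its own and may land anywhere.
plain_servers = {route(key)[0] for key in profile_keys}

# With a shared master key, every key maps to the same instance.
master_servers = {route(key, master_key="user:1234")[0] for key in profile_keys}
```

Here master_servers always contains exactly one server, so a multi-key fetch for the profile needs a single round trip, whereas plain_servers may contain up to three.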
... then fetch away from a single instance. Whoo fun! If you use the namespaces trick, you're often able to use this trick as well, since that data is often related by whatever specified the namespace.

Referencing multiple keys to one value
In short, you can't do this. One key goes to one value. What you can do is have multiple items point to a central item.

    id_main005: "data goes here"
    id_by_foo: "id_main005"
    id_by_bar: "id_main005"

... then you do multiple fetches to find the indirect item. Weird, but not too bad. Would be simplified if memcached ever gets proper tag support.

Troubleshooting common problems

Help! My keys are all wanged up / disappear / etc
Do you have a cluster of memcached servers and webservers? Are you using localhost ('127.0.0.1') as server IP in any of your server lists? If so, don't do that. All of your webservers need to have the same list of memcached IPs in the same order, even if the instance is local to that server. This is so the server hashing algorithm works correctly. Also see below.

My cache items always expire early! My cache never fills all the way!
Help help! My cache is buggy! I set an expire time of one day, and five minutes later the cache item is gone! Memcached is a buggy pile of crap!

I've tracked this down a few times ... you've likely written something which is executing "flush_all" against memcached often. The 1.2.7 and higher stable release should have a counter, so you can run the 'stats' command and see if flush_all is being run often. You can also confirm this by running memcached in -vv mode, logging to file, then grepping for "flush_all". It should pop up a few times.

If you don't see this, it's also possible you've set your memory limit very low. It is not advisable to start memcached with a memory limit less than 64 megabytes. It will not be able to pre-allocate enough slabs for your data, and some slabs will only be able to store a handful of items before being forced to evict them.
Use as much memory as you can with the -m option.

Sporadic disconnections
If you are seeing occasional dropped or refused connections to your memcached servers, there are a lot of potential explanations. Often it'll be one of:

Help! Sometimes clients hang connecting to memcached!
See the two sections below on connections. The most likely culprit is that you are running into the maximum connection limit. When that happens, new connections are queued into the TCP listen backlog until old connections die off.

The issue with client hangs being due to the TCP listen backlog was first noted by Chris Goffinet. Dormando later figured out the rest and added tools to track down this condition.

Other possibilities include buggy OSes, dropped packets, and stateful firewalls falling over.

How do I tune the max connections setting (-c)?
If you are running memcached 1.2.7 or newer, there is a stat called "listen_disabled_num". The short answer here is that if this number is higher than 0, you need to increase your max connections limit.

The longer answer is to calculate how many clients you expect to be connecting at once. It's also okay to set this number higher than you absolutely need it to be.

If you have 3 webservers running apache, and each apache process can potentially create one connection to memcached, you calculate this from your MaxClients setting, i.e. the maximum number of apache processes that can be running at a time. If that's set to 30, then you have 30 * 3 == 90 possible connections. Might as well set -c to 4096 to be safe ;)

As noted in the sections below, be very careful of bugs in your application which can cause extra connections to memcached. Improper usage of client objects can spawn many connections from a single page request, which will break your site.

Too many connections to memcached
Constantly maxing out connections to a particular (or all) memcached server(s)? First, keep in mind that it's okay to have as many connections as you want connected to memcached. Increase the max size to be what you need. On the other hand, if you have 4,000 connections and three servers, something's out of whack.
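As a quick sanity check, the arithmetic above can be written down as a small Python sketch. The function names and the conns_per_process parameter are assumptions for illustration; the model is simply "webservers times MaxClients times connections per process", per memcached instance.

```python
def expected_max_connections(webservers, max_clients, conns_per_process=1):
    """Upper bound on connections one memcached instance should see."""
    return webservers * max_clients * conns_per_process


def looks_out_of_whack(observed, webservers, max_clients, conns_per_process=1):
    """True when the observed count exceeds what the setup can explain."""
    return observed > expected_max_connections(
        webservers, max_clients, conns_per_process
    )


# 3 webservers with MaxClients set to 30, one connection per apache process.
expected = expected_max_connections(3, 30)

# 4,000 connections from the same three webservers suggests an application bug.
suspicious = looks_out_of_whack(4000, 3, 30)
```

If the observed count is far above the estimate, look for code that opens more than one connection per request before raising -c.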
The number of ESTABLISHED connections to your memcached instances can be at least one per webserver (apache) slot you have. If you have an application running under apache2, and you set MaxClients to 10, you can have up to 10 memcached connections under normal circumstances. On the other hand, if you choose to run apache in threaded mode and set MaxClients to 1024, obviously your count will be much higher. The world will also question just how many CPUs you have in there.

The connection rate also depends a little on your preferred client. Some perform connection pooling, which will alter the number of connections per server.

Remember nothing is stopping you from accidentally connecting many times. If you instantiate a memcached client object as part of the object you're trying to store, don't be surprised when 1,000 objects in one request create 1,000 parallel connections. Look carefully for bugs like this before hopping on the list.

Ideas for FAQ entries
How do I terminate an instance of memcached?
We are using memcached for caching PHP objects for the news release site www.pressreleasepoint.com. We want to increase the memory limit without shutting down the memcached daemon. How can we increase the memory (RAM) allocation limit of memcached dynamically? Do you have any tools or an API to change the memory limit of a live daemon?
You cannot alter the memory limit of a running instance; set it to as much free ram as you have once, then start it up. If you need more help, please ask questions on the mailing list.
Re: creating keys for caching queries -- I work with code that does that, albeit not with memcached. Since similar SQL queries usually differ near the end of the string (select foo, bar, baz from sometable where condition OR othercondition) and Java's String.hashCode() function starts at the beginning of the string, we wound up using MD5 to generate the keys for queries. We also generate a key based off an in-code constant and the bind variables for a shorter and more readable key.
What will happen if the cache memory is full? Will it overflow, i.e. will the application using it get stuck? Or will it wipe the oldest objects even if they are not expired yet?
What is memcached's cache?
The cache structure is an LRU (Least Recently Used), plus expiration timeouts. When you store items into memcached, you may state how long each should stay valid in the cache: either forever, or until some time in the future. If the server is out of memory, expired items are replaced first, then the least recently used items go next.
On initial compile on Solaris 10 / SPARC v9, I was getting the result that an item would report as "STORED" on set, yet would not be found on any get. After debugging this, I found differences in the hash of the key, which in turn were caused by the configure script having detected the host as little endian. Editing config.h (#define ENDIAN_BIG 1 and /* #undef ENDIAN_LITTLE */) to force big endian appears to have resolved the issue.
I have random and buggy behaviour on a large Drupal site with a distributed memcached array. It's difficult to trace the problem, but I suspect the problem is in memcache. In the preproduction environment, with non-distributed memcache and just one front server, the performance is OK and not buggy.
Ideas?
Please report issues to the mailing list
What's the deal with the Zombie Bunnies Logo on the main site?
Rabbits are (fast, scalable, stupid).