What version of the product are you using? On what operating system?
All versions up to and including 2.5rc1
Tell me more...
If a large number of sets are committed to the cache (e.g. 20,000+) in a loop, then one thread maxes out in the background putting data into the memcached client. The queue is emptied VERY slowly (a few tens of items per second). A 100,000-item set loop means many minutes to empty the queue. I have profiled the code and the issue is in the class TCPMemcachedNodeImpl.
In the method fillWriteBuffer(), the problem is that the readQ is checked to see whether it already contains the item about to be written -
if(!readQ.contains(o)) { readQ.add(o); }
The implementation of LinkedBlockingQueue uses a linear search with the expected consequences. There is a comment immediately above this which says -
// This isn't the most optimal way to do this, but it hints
// at a larger design problem that may need to be taken care
// if in the bowels of the client.
// In practice, readQ should be small, however.
In practice the readQ isn't small, as every item that is set() is put into the queue. This is a major problem: it chews up processor time and means that the queue overflows in situations where it should not. This is creating a major issue for the project I am working on.
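To see the cost of that linear scan in isolation, a minimal standalone sketch like the following (not code from the client; the ContainsCost class and sizes are just illustrative) compares contains() on a LinkedBlockingQueue with a HashSet holding the same items -
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.LinkedBlockingQueue;

public class ContainsCost {
    public static void main(String[] args) {
        int n = 20000;
        LinkedBlockingQueue<Integer> queue = new LinkedBlockingQueue<Integer>();
        Set<Integer> shadow = new HashSet<Integer>();
        for (int i = 0; i < n; i++) {
            queue.offer(i);
            shadow.add(i);
        }

        // contains() on the queue walks the linked list: O(n) per call.
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            queue.contains(i);
        }
        long t2 = System.nanoTime();

        // contains() on the set is a hash lookup: O(1) per call.
        for (int i = 0; i < n; i++) {
            shadow.contains(i);
        }
        long t3 = System.nanoTime();

        System.out.println("queue contains: " + ((t2 - t1) / 1000000.0) + "ms");
        System.out.println("set contains:   " + ((t3 - t2) / 1000000.0) + "ms");
    }
}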
I used this code to test this issue -
import java.io.IOException;
import java.util.concurrent.TimeUnit;

import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;

public class TestPerformance {

    private MemcachedClient memcache;

    public static void main(String[] args) throws Exception {
        new TestPerformance().run();
    }

    private void run() throws IOException {
        memcache = new MemcachedClient(AddrUtil.getAddresses("localhost:11211"));
        long time1 = System.nanoTime();
        addLotsOfData();
        long time2 = System.nanoTime();
        System.out.println("Time to add data = " + ((time2 - time1) / 1000000.0) + "ms");
        memcache.waitForQueues(86400, TimeUnit.SECONDS);
        long time3 = System.nanoTime();
        System.out.println("Time for queues to drain = " + ((time3 - time2) / 1000000.0) + "ms");
        memcache.shutdown(2000, TimeUnit.MILLISECONDS);
    }

    private void addLotsOfData() {
        for (int i = 0; i < 25000; i++) {
            memcache.set(Integer.toString(i), 86400, "Hello this is a test " + i);
        }
    }
}
On my MacBook Pro I see the following results -
With 10,000 items:
Time to add data = 820.142ms
Time for queues to drain = 2721.893ms
With 25,000 items:
Time to add data = 1189.688ms
Time for queues to drain = 24770.045ms
With 50,000 items:
Time to add data = 1473.822ms
Time for queues to drain = 129161.422ms
I wrote a custom version of the LinkedBlockingQueue that uses a Set to track the contents of the queue and plugged it in via the ConnectionFactory (overriding createReadOperationQueue()); this solved the problem. However, I need to productionize this code as it is not strictly thread safe currently. This really needs to be fixed in the main line code.
Comment #1
Posted on Mar 10, 2010 by Massive Monkey
(No comment was entered for this change.)
Comment #2
Posted on Mar 10, 2010 by Grumpy Bear
I looked at the threading access today around the offending queue and, if I have my analysis right, the readQ is only accessed from the spy thread. As such, a simple temporary workaround of building a wrapper around LinkedBlockingQueue with an embedded Set seems to work for me at present - not ideal, but a massive improvement on the performance. I can now add 500,000 items to the cache and see that it takes 41,300ms to add the data and 10,800ms to drain the queues. If you're interested in the code I can post it (but it's pretty trivial). I also finally get to see memcached getting exercised (30-70% processor utilization rather than 0.1%).
Comment #3
Posted on Apr 9, 2010 by Happy Hippo
Nick - can you post your bug fix or integrate it into GitHub?
I believe I'm running into this issue in production.
Comment #4
Posted on Apr 9, 2010 by Grumpy Bear
Hi spicysquid,
The way I solved this was to create a wrapper around the LinkedBlockingQueue that is causing the problem. It contains a HashSet containing all the items in the queue - the contains() method can then use this efficiently. This works for two reasons -
1. There is very limited threading going on (only one spy thread + your application threads) - and the communications between the threads are quite controlled.
2. Not all of the LinkedBlockingQueue's methods are used - and in particular I don't have to implement the semantics of drainTo() (thankfully).
It's a monumental bodge, but it works neatly because you can slot it in via the existing API by creating your own derivative of the ConnectionFactory and overriding createReadOperationQueue() to return the custom LinkedBlockingQueue.
Here's the wrapped Queue -
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * This is a temporary solution to the performance issues of the contains()
 * method being inefficient with long queues.
 */
public class CustomLinkedBlockingQueue<T> extends LinkedBlockingQueue<T> {

    // Shadow set mirroring the queue contents so contains() is a hash lookup
    // rather than a linear scan of the linked list.
    private Set<T> objectsInQueue = new HashSet<T>();

    public CustomLinkedBlockingQueue() {
        super();
    }

    public CustomLinkedBlockingQueue(int capacity) {
        super(capacity);
    }

    public CustomLinkedBlockingQueue(Collection<T> c) {
        super(c);
        // Mirror the initial contents so contains() stays consistent.
        objectsInQueue.addAll(c);
    }

    @Override
    public void put(T t) throws InterruptedException {
        super.put(t);
        objectsInQueue.add(t);
    }

    @Override
    public boolean offer(T t, long timeout, TimeUnit unit) throws InterruptedException {
        boolean b = super.offer(t, timeout, unit);
        if (b) {
            objectsInQueue.add(t);
        }
        return b;
    }

    @Override
    public boolean offer(T t) {
        boolean b = super.offer(t);
        if (b) {
            objectsInQueue.add(t);
        }
        return b;
    }

    @Override
    public T take() throws InterruptedException {
        final T o = super.take();
        objectsInQueue.remove(o);
        return o;
    }

    @Override
    public T poll(long timeout, TimeUnit unit) throws InterruptedException {
        final T o = super.poll(timeout, unit);
        if (o != null) {
            objectsInQueue.remove(o);
        }
        return o;
    }

    @Override
    public T poll() {
        final T o = super.poll();
        if (o != null) {
            objectsInQueue.remove(o);
        }
        return o;
    }

    @Override
    public boolean remove(Object o) {
        boolean b = super.remove(o);
        if (b) {
            objectsInQueue.remove(o);
        }
        return b;
    }

    @Override
    public int drainTo(Collection<? super T> c) {
        int count = super.drainTo(c);
        if (count > 0) {
            objectsInQueue.removeAll(c);
        }
        return count;
    }

    @Override
    public int drainTo(Collection<? super T> c, int maxElements) {
        if (c.size() > 0) {
            throw new UnsupportedOperationException("doesn't support this at present");
        }
        int count = super.drainTo(c, maxElements);
        if (count > 0) {
            objectsInQueue.removeAll(c);
        }
        return count;
    }

    @Override
    public T remove() {
        final T o = super.remove();
        objectsInQueue.remove(o);
        return o;
    }

    @Override
    public boolean contains(Object o) {
        return objectsInQueue.contains(o);
    }

    @Override
    public boolean containsAll(Collection<?> c) {
        return objectsInQueue.containsAll(c);
    }
}
Then create a new ConnectionFactory like this -
import java.util.concurrent.BlockingQueue;

import net.spy.memcached.DefaultConnectionFactory;
import net.spy.memcached.ops.Operation;

public class MyConnectionFactory extends DefaultConnectionFactory {

    // ... constructors omitted for clarity ...

    @Override
    public BlockingQueue<Operation> createReadOperationQueue() {
        return new CustomLinkedBlockingQueue<Operation>();
    }
}
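To use it, hand the factory to the client constructor; something along these lines (same localhost address as in the test above) -
MemcachedClient memcache = new MemcachedClient(
        new MyConnectionFactory(),
        AddrUtil.getAddresses("localhost:11211"));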
I hope that helps.
- Nick
Comment #5
Posted on Oct 15, 2010 by Happy Cat
I am using this patch with one small change:
private Set objectsInQueue = Collections.newSetFromMap(new ConcurrentHashMap());
Not sure if it's necessary, but I wanted to make sure the set was thread-safe.
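For reference, once the wrapper class carries a type parameter as above, that same field would look roughly like this (Boolean is just the value type of the backing map; requires java.util.Collections and java.util.concurrent.ConcurrentHashMap) -
private final Set<T> objectsInQueue =
        Collections.newSetFromMap(new ConcurrentHashMap<T, Boolean>());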
-Jonathan
Comment #6
Posted on Aug 24, 2011 by Quick Ox
I have just begun working on a fix for this issue. Thanks to everyone for their feedback, as well as for the test to reproduce this issue. I want to provide some feedback on my findings from playing around with the code, and later I will post another update when I have completed my fix.
First off, I want to note that while this portion of code can definitely be improved and should perform better, anyone doing sets at a very high rate should still handle timeout issues and check that their set operations are succeeding. I say this because even when I deleted the !readQ.contains(o) check and just added the operation to the read queue, I was still able to run into situations where I got large numbers of timeouts from my operations. No matter how much better we make the code, developers will still be I/O bound, and if you write to the queues faster than they can be sent out on the network, then operation timeouts are inevitable.
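In practice that means checking the Future returned by set() rather than firing and forgetting; a rough fragment along these lines (key and value are placeholders, the one-second timeout is arbitrary, and the imports are the usual java.util.concurrent ones) -
Future<Boolean> result = memcache.set(key, 86400, value);
try {
    if (!result.get(1, TimeUnit.SECONDS)) {
        // the server did not accept the set
    }
} catch (TimeoutException e) {
    // the operation did not complete in time; cancel it and treat it as a failure
    result.cancel(false);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
} catch (ExecutionException e) {
    // the operation itself failed
}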
Comment #7
Posted on Aug 24, 2011 by Quick Ox
I just committed a fix for this. It's in review right now. I'll post a jar with the patch on top of Spy 2.7.1 when it gets through review.
Comment #8
Posted on Sep 5, 2011 by Quick Ox
The fix has made it through review and will be in the Spy 2.8 beta and 2.7.2 releases of Spymemcached.
Comment #9
Posted on Sep 6, 2011 by Quick Ox
Here's the 2.7.2 jar that contains a fix for this issue. It contains the first commits on top of 2.7.1, so this is just 2.7.1 + Bug 125.
- spymemcached-2.7-56-g4e99186.jar 412.27KB
Comment #10
Posted on Sep 7, 2011 by Quick Elephant
Thanks
Comment #11
Posted on Jan 22, 2015 by Happy Rabbit
Related to this: http://code.google.com/p/spymemcached/issues/detail?id=316
Status: Fixed
Labels:
Type-Defect
Priority-Medium
Milestone-Release2.7.2