spymemcached - issue #10

CancelledKeyException while handling IO with a downed server


Posted on Mar 13, 2008 by Happy Hippo

I configured two memcached servers and shut one of them down during a loop of get calls. I can see some INFO logs saying the client is attempting to reconnect. But when I bring the failed memcached back up, the app hangs and stops printing any logs. This happens with both sync and async get.

Environment: Windows XP / Java 1.5 / memcached-2.0.jar

Comment #1

Posted on Mar 14, 2008 by Helpful Elephant

I've reproduced this. Thanks.

Comment #2

Posted on Mar 14, 2008 by Helpful Elephant

Actually, scratch that. I got a cancellation exception on the client, but once I caught that, the client continued to do the right thing.

Can you provide a small test case? This is what I did:

http://github.com/dustin/java-memcached-client/tree/master/src/test/manual/net/spy/memcached/test/MultiNodeFailureTest.java

Comment #3

Posted on Mar 14, 2008 by Happy Hippo

I've found the problem is exception related. The following is my test case; if I put the get() call in a try/catch, it works fine.

    String m_sMemcachedHosts = "localhost:11211 localhost:11212";
    MemcachedClient mc3 = new MemcachedClient(AddrUtil.getAddresses(m_sMemcachedHosts));

    for (int i = 0; i < 50; i++)
    {
        long t1 = System.currentTimeMillis();
        String val = (String) mc3.get("test");
        long t2 = System.currentTimeMillis();

        System.out.println("[" + i + "] mc3 test=" + val + " (" + (t2 - t1) + ")");

        try
        {
            Thread.sleep(1000);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
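
For reference, a minimal sketch of the try/catch workaround mentioned above. It does not use spymemcached: the `lookup` supplier stands in for a call like `mc3.get("test")`, the failure is simulated, and the class and method names (`TolerantLoop`, `getOrFallback`, `countFailures`) are hypothetical. The assumption is that the client call fails with an unchecked exception while a server is down.

```java
import java.util.function.Supplier;

public class TolerantLoop {

    /**
     * Runs one lookup and returns the fallback if it throws an unchecked
     * exception, so a single failure does not end the calling thread.
     */
    static <T> T getOrFallback(Supplier<T> lookup, T fallback) {
        try {
            return lookup.get();
        } catch (RuntimeException e) {
            System.err.println("lookup failed, continuing: " + e);
            return fallback;
        }
    }

    /** Simulated loop: odd-numbered attempts fail, even-numbered ones succeed. */
    static int countFailures(int iterations) {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            final int attempt = i;
            // Stand-in for the real cache call; not a spymemcached API.
            String val = getOrFallback(() -> {
                if (attempt % 2 == 1) {
                    throw new IllegalStateException("simulated failure");
                }
                return "value";
            }, null);
            if (val == null) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + countFailures(4));
    }
}
```

Without the catch, the first failed lookup would propagate out of the loop and end the calling thread, which from the outside can look like the app has stopped.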

Comment #4

Posted on Mar 14, 2008 by Helpful Elephant

OK, that's similar enough to what I've done.

I'm going to change this to a documentation bug and attempt to make it clearer what this behavior is.

Comment #5

Posted on Mar 14, 2008 by Helpful Elephant

Sorry for thrashing this bug so much, but I just read the detailed report on the list, and there's definitely something going wrong in my client.

    Exception in thread "Memcached IO over {MemcachedConnection to /127.0.0.1:11211}" java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
        at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:69)
        at java.nio.channels.SelectionKey.isReadable(SelectionKey.java:271)
        at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:262)
        at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:180)
        at net.spy.memcached.MemcachedClient.run(MemcachedClient.java:730)

I'm not certain I can reproduce this myself, but I should be able to anticipate it and make the client deal with it a bit better.

Comment #6

Posted on Mar 16, 2008 by Helpful Elephant

Status update:

I was looking into this tonight, and I don't see how that exception should be able to escape. The code in question looks like this:

    try {
        [...]
        if(sk.isReadable()) { // CancelledKeyException thrown here
            handleReads(sk, qa);
        }
        [...]
    } catch(Exception e) {
        getLogger().info("Reconnecting due to exception on %s", qa, e);
        queueReconnect(qa);
    }

For my reference, the full report is here:

http://www.nabble.com/Re%3A-MemcachedClient-and-timeout-p16046771.html
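
As a standalone illustration of the exception in the traces (not the client's code, and not a reproduction of this bug), here is a minimal sketch using only `java.nio`. It registers the read end of a `Pipe` with a `Selector`, cancels the key, and then compares an unguarded `isReadable()` call with one guarded by `isValid()`. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.function.Predicate;

public class CancelledKeyDemo {

    /** Registers a pipe's read end, cancels the key, then applies the check to it. */
    static boolean checkCancelledKey(Predicate<SelectionKey> check) {
        try {
            Pipe pipe = Pipe.open();
            try (Selector selector = Selector.open();
                 Pipe.SourceChannel source = pipe.source();
                 Pipe.SinkChannel sink = pipe.sink()) {
                source.configureBlocking(false);
                SelectionKey key = source.register(selector, SelectionKey.OP_READ);
                key.cancel(); // the key is invalid from this point on
                return check.test(key);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Calls isReadable() directly on the cancelled key and reports whether it threw. */
    static boolean unguardedThrows() {
        try {
            checkCancelledKey(SelectionKey::isReadable);
            return false;
        } catch (CancelledKeyException e) {
            return true;
        }
    }

    /** Checks isValid() first, so isReadable() is never reached for a cancelled key. */
    static boolean guardedIsReadable() {
        return checkCancelledKey(key -> key.isValid() && key.isReadable());
    }

    public static void main(String[] args) {
        System.out.println("unguarded check threw: " + unguardedThrows());
        System.out.println("guarded check readable: " + guardedIsReadable());
    }
}
```

The `isValid()` guard only narrows the window: if another thread cancels the key between the two calls, `isReadable()` can still throw, so catching the exception around the check remains necessary.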

Comment #7

Posted on Apr 28, 2008 by Massive Camel

I am seeing this, but I'm still using 1.4.

    Exception in thread "Memcached IO over {MemcachedConnection to xxx/xx.xx.xx.xx:11211 xxx/xx.xx.xx.xx:11211}" java.nio.channels.CancelledKeyException
        at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
        at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:69)
        at java.nio.channels.SelectionKey.isReadable(SelectionKey.java:271)
        at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:263)
        at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:181)
        at net.spy.memcached.MemcachedClient.run(MemcachedClient.java:715)

Comment #8

Posted on Apr 28, 2008 by Helpful Elephant

1.4 is quite old. I haven't done any work on that branch since July 2007. I've fixed numerous bugs since then (which I believe includes this one).

Comment #9

Posted on Oct 2, 2008 by Helpful Elephant

I can't reproduce this, and the code path I'm aware of suggests it shouldn't be possible, so I'm going to close this as invalid unless someone can provide a test case.

Status: Invalid

Labels:
Type-Defect Priority-High Component-Logic