
memcached-session-manager - issue #228

Memcached failover in non-sticky mode


Posted on Jun 24, 2015 by Happy Horse

A bit about the setup I'm using: 1 haproxy as a load balancer; 2 tomcat6 nodes; 2 memcached nodes; non-sticky mode; Kryo serialization strategy; operation and sessionBackup timeouts at their defaults; locking strategy: auto.

What steps will reproduce the problem?
1. Start all the nodes.
2. Log in to the application.
3. Check that the session backup is saved on the secondary memcached node.
4. Shut down the primary memcached node.
5. Navigate to some other page in the application. (Sometimes I get dropped to the login screen with a new session identifier; I'm not sure why this happens, but it's possibly caused by timeouts.)
6. Restore the memcached node. (It takes a while for tomcat to detect that the node is back up and to store the session backup in it. I'm looking for options to change this timeout.)
7. As the session backup process is triggered by user requests, in this step I interact with the application until the session is stored as a backup.
8. Kill the other node (which is now primary).
9. The next interaction with the application takes me to the login screen (session information lost), but if I change the session id back to the session that should have been restored, I can use the application with that session identifier.

Basically it's quite an interesting situation, and currently I'm not sure what is causing this behaviour, as I can't reliably reproduce the issue. Any suggestions will be appreciated.

Comment #1

Posted on Jul 1, 2015 by Happy Horse

I've investigated this issue a bit more. The session is lost when there are concurrent requests and one of them is matched by requestIgnorePattern. As far as I understand, there's a race condition over which request gets served first. If it is the ignored request, the session is lost because backup retrieval is not triggered. When this parameter is omitted from context.xml, failover works as expected, but in my case we have a lot of heavy js pages, and each request to such a page generates 30-50 requests to each memcached node to update the metadata of the session stored there. So disabling it is not an option.
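To make the effect of the pattern concrete, here is a minimal sketch of how an ignore pattern splits request URIs into "ignored" and "goes through session handling". The regular expression and the URIs are assumptions chosen for illustration; the pattern actually configured in context.xml is not shown in this thread, and this is not msm's own code.

```java
import java.util.regex.Pattern;

class IgnorePatternDemo {
    // Hypothetical pattern for static resources (an assumption, not the reporter's config).
    static final Pattern IGNORE = Pattern.compile(".*\\.(png|gif|jpg|css|js|ico)$");

    // True when the whole request URI matches the ignore pattern.
    static boolean isIgnored(String requestUri) {
        return IGNORE.matcher(requestUri).matches();
    }

    public static void main(String[] args) {
        String[] uris = {"/static/menu-bg.png", "/app/orders/list", "/js/app.js"};
        for (String uri : uris) {
            System.out.println(uri + " -> " + (isIgnored(uri) ? "ignored" : "session handling"));
        }
        // /static/menu-bg.png -> ignored
        // /app/orders/list -> session handling
        // /js/app.js -> ignored
    }
}
```

With a pattern like this, the 30-50 static js requests per page never touch memcached, which is why dropping the parameter is costly here.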

Comment #2

Posted on Jul 2, 2015 by Grumpy Bear

Great that you investigated this more!

So the session is lost when there are concurrent requests and one of them is matched by requestIgnorePattern.

Are you referring to a request that should not match the requestIgnorePattern? Is the pattern incorrect / too broad then?

As far as I understand there's a race condition over which request gets served first

If the browser sends parallel requests (e.g. via ajax), then there's indeed no guarantee which one hits the server first. This would have to be handled on the client side; the server cannot do anything about it.

In case it will be the ignored request the session will be lost as backup retrieval will not be triggered.

But a request after the ignored one should then trigger backup retrieval. Doesn't this happen?

Are the "heavy js pages" somehow related to the session, or are these just "stateless" resources?

Comment #3

Posted on Jul 2, 2015 by Happy Horse

About the requestIgnorePattern: the pattern matches the png file in my case. Basically I'm trying the following scenario:
1. Log in; both memcached nodes are up and the session is backed up correctly.
2. Kill the primary node.
3. When I select the menus, a png request is sent to the backend (css background). Right after that I click the link and call the controller.

If the png request is served first, the request tracking host valve does not check the status of the primary node, so the session is not recovered from the backup. After that I get a new session id which is not stored on any of the memcached nodes, and the following request (controller) is served with this new session id, so the application redirects to the login screen. Currently I'm not sure how this happens, but disabling requestIgnorePattern fixes the issue. This may have something to do with the spring security session fixation protection or other similar stuff.

If the controller request gets served first, failover works as expected.
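The ordering dependence described above can be illustrated with a toy model. Everything in it is an assumption made for illustration: the `Node` class, the role swap on failover, and in particular the step where a request that finds no session ends up with a fresh anonymous one (the thread does not establish what creates the new session). It models the reported symptom, not msm's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

class FailoverRaceModel {
    // Hypothetical stand-in for a memcached node.
    static class Node {
        final Map<String, String> data = new HashMap<>();
        boolean up = true;
    }

    static final Pattern IGNORE = Pattern.compile(".*\\.png$"); // assumed ignore pattern
    Node primary = new Node();
    Node backup = new Node();
    private int created = 0;

    void login(String sessionId) {
        primary.data.put(sessionId, "user=alice");
        backup.data.put(sessionId, "user=alice");
    }

    // Returns the session id the client holds after this request.
    String handle(String uri, String sessionId) {
        boolean ignored = IGNORE.matcher(uri).matches();
        boolean found = primary.up && primary.data.containsKey(sessionId);
        // Only tracked (non-ignored) requests look at the backup copy.
        if (!found && !ignored && backup.up && backup.data.containsKey(sessionId)) {
            Node old = primary; // failover: the surviving node takes the primary role
            primary = backup;
            backup = old;
            found = true;
        }
        if (found) {
            return sessionId;
        }
        // Assumption: something touches the missing session, so a fresh one is issued.
        String fresh = "anon-" + (++created);
        (primary.up ? primary : backup).data.put(fresh, "anonymous");
        return fresh;
    }

    public static void main(String[] args) {
        FailoverRaceModel pngFirst = new FailoverRaceModel();
        pngFirst.login("s1");
        pngFirst.primary.up = false;
        String id = pngFirst.handle("/img/menu-bg.png", "s1");
        System.out.println("png first: " + pngFirst.handle("/app/menu", id)); // png first: anon-1

        FailoverRaceModel controllerFirst = new FailoverRaceModel();
        controllerFirst.login("s1");
        controllerFirst.primary.up = false;
        id = controllerFirst.handle("/app/menu", "s1");
        System.out.println("controller first: " + controllerFirst.handle("/img/menu-bg.png", id)); // controller first: s1
    }
}
```

In the model, the only difference between the two runs is which request reaches the server first after the primary node goes down, which matches the intermittent nature of the problem.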

By "heavy js pages" I mean pages that request a lot of js files while they are loading. These requests don't change the session information in any way.

Comment #4

Posted on Jul 3, 2015 by Happy Horse

Comment deleted

Comment #5

Posted on Jul 3, 2015 by Happy Horse

I've tried to reproduce this issue on the msm sample app that is hosted on github. Failover works as expected there with the same configuration and the same tomcat instance. As there were no resources like png, ico etc., I added one, but it still worked as expected.

I've also tried to fix this behavior by adding a primary memcached node availability check in RequestTrackingHostValve, where ignorePattern is evaluated. As far as I can tell this fix works, and failover works as expected in my application.
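The idea behind the change can be sketched roughly as follows. This is not the code from the valve: the `NodeAvailability` interface, the method names, and the pattern are hypothetical and only illustrate the decision "ignore the request only if its URI matches the pattern and the primary node is reachable".

```java
import java.util.regex.Pattern;

class IgnoreDecisionSketch {
    // Hypothetical stand-in for whatever reports the health of a session's primary node.
    interface NodeAvailability {
        boolean isPrimaryAvailable(String sessionId);
    }

    static final Pattern IGNORE = Pattern.compile(".*\\.(png|ico)$"); // assumed pattern

    // A matching request bypasses session handling only while the primary node is up;
    // otherwise it goes through normal session handling so that failover can run.
    static boolean bypassSessionHandling(String uri, String sessionId, NodeAvailability nodes) {
        boolean matchesPattern = IGNORE.matcher(uri).matches();
        return matchesPattern && nodes.isPrimaryAvailable(sessionId);
    }

    public static void main(String[] args) {
        System.out.println(bypassSessionHandling("/img/bg.png", "s1", id -> true));  // true
        System.out.println(bypassSessionHandling("/img/bg.png", "s1", id -> false)); // false
        System.out.println(bypassSessionHandling("/app/menu", "s1", id -> true));    // false
    }
}
```

The cost of this approach is that, while the primary node is down, every static resource request goes through full session handling again.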

Comment #6

Posted on Jul 3, 2015 by Grumpy Bear

Can you submit a pull request with your change?

Comment #7

Posted on Jul 6, 2015 by Happy Horse

Submitted the pull request with possible fix: https://github.com/magro/memcached-session-manager/pull/44

Comment #8

Posted on Jul 14, 2015 by Happy Horse

Did you have time to look into it by chance?

Comment #9

Posted on Jul 17, 2015 by Grumpy Bear

Sorry for the late response, business work took all the time... I had a look at your PR. AFAICS, when the primary node is unavailable, requests that otherwise should be ignored are then NOT ignored but go through standard session handling.

I tend to think that while this may solve the specific issue, it's still just a workaround and there is a different root cause.

I'd say that requests that should be ignored should completely bypass session handling, so they should not depend on memcached availability at all. If such requests cause issues this is probably not the case. I'd prefer to find and fix this issue.

What do you think?

Comment #10

Posted on Jul 19, 2015 by Massive Camel

This "fix" was made just to show what I mean, and it is more a treatment of the symptom than of the cause. It's definitely not a solution to the problem. Also, I was not able to reproduce this issue with the test app (wicket). So I guess I'll invest a bit more time in investigating this issue until it is clear what is causing it. I just had a little hope that you'd "magically" find the problem =).

Comment #11

Posted on Jul 21, 2015 by Grumpy Bear

Yeah, ok :-) Great that you're investigating this!

Comment #12

Posted on Aug 24, 2015 by Grumpy Bear

Issues have been moved to github, this one is now https://github.com/magro/memcached-session-manager/issues/267

Status: New

Labels:
Type-Defect Priority-Medium