A bit about the setup I'm using: 1 haproxy as a load balancer; 2 tomcat6 nodes; 2 memcached nodes; non-sticky mode; Kryo serialization strategy; operation and sessionBackup timeouts are default; locking strategy: auto.
What steps will reproduce the problem?
1. Start all the nodes.
2. Log in to the application.
3. Check that the session backup is saved in the secondary memcached node.
4. Shut down the primary memcached node.
5. Navigate to some other page in the application (sometimes I get dropped to the login screen with a new session identifier; I'm not sure why this happens, but it's possibly caused by timeouts).
6. Restore the memcached node (it takes a while for tomcat to detect that the node is back up and to store the session backup in it; I'm looking for an option to change this timeout).
7. As the session backup process is triggered by user requests, in this step I interact with the application until the session is stored as a backup.
8. Kill the other node (which is now primary).
9. The next interaction with the application takes me to the login screen (session information lost), but if I change the session id to that of the session that should have been restored, I'm able to use the application with that session identifier.
Basically it's quite an interesting situation, and currently I'm not sure what is causing this behaviour, as I can't reliably reproduce the issue. Any suggestions will be appreciated.
Comment #1
Posted on Jul 1, 2015 by Happy Horse
I've investigated this issue a bit more. So the session is lost when there are concurrent requests and one of them is matched by requestIgnorePattern. As far as I understand, there's a race condition over which request will get served first. If it is the ignored request, the session will be lost, as backup retrieval will not be triggered. When this parameter is omitted in context.xml, failover works as expected, but in my case we have a lot of heavy js pages, and each request to such a page would generate 30-50 requests to each memcached node to update the metadata of the session stored there. So disabling it is not an option.
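To make the role of the ignore pattern concrete, here is a minimal sketch of how a regular expression can split requests into "ignored" and "session-handled" ones. The class name, the pattern, and the URIs are assumptions made for illustration; this is not code from memcached-session-manager, and the pattern actually configured in context.xml may differ.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not code from memcached-session-manager.
class IgnorePatternDemo {
    // Assumed pattern: static resources that never read or change session data.
    static final Pattern IGNORE = Pattern.compile(".*\\.(png|ico|gif|jpg|css|js)$");

    /** Returns true when the request URI would bypass session backup handling. */
    static boolean isIgnored(String uri) {
        return IGNORE.matcher(uri).matches();
    }

    public static void main(String[] args) {
        String[] uris = {"/static/menu-bg.png", "/app/orders/list", "/static/app.js"};
        for (String uri : uris) {
            System.out.println(uri + " -> " + (isIgnored(uri) ? "ignored" : "session handling"));
        }
    }
}
```

With a pattern like this, the many js and image requests of a heavy page skip the memcached round trips, which is why dropping the parameter is costly here.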
Comment #2
Posted on Jul 2, 2015 by Grumpy Bear
Great that you investigated this more!
> So the session is lost when there are concurrent requests and one of them is matched by requestIgnorePattern.
Are you referring to a request that should not match the requestIgnorePattern? Is the pattern incorrect / too broad then?
> As far as I understand, there's a race condition over which request will get served first
If the browser sends parallel requests (e.g. via ajax), then there's indeed no guarantee which one hits the server first. This would have to be handled on the client side; the server can do nothing about this.
> If it is the ignored request, the session will be lost, as backup retrieval will not be triggered.
But a request after the ignored one should then trigger backup retrieval. Doesn't this happen?
Are the "heavy js pages" somehow related to the session, or are these just "stateless" resources?
Comment #3
Posted on Jul 2, 2015 by Happy Horse
About the requestIgnorePattern: the pattern matches the png file in my case. Basically I'm trying the following scenario:
1. Log in; both memcached nodes are up and the session is backed up correctly.
2. Kill the primary node.
3. When I select a menu, a png request is sent to the backend (css background). Right after that I click a link, which calls the controller.
If the png request is served first, the request tracking host valve does not check the status of the primary node, and the session is not recovered from the backup. After that I get a new session id which is not contained in any of the memcached nodes, and the following request (controller) is served with this new session id, so the application redirects to the login screen. Currently I'm not sure how this happens, but disabling requestIgnorePattern fixes the issue. It may have something to do with the Spring Security session fixation protection or other similar stuff.
If the controller request gets served first, failover works as expected.
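The two orderings can be written down as a toy model. This is only a sketch of the behaviour as observed above, under the assumption that an ignored request skips the backup lookup and ends with a new session id being issued; it is not msm's real logic, the root cause is still unknown, and every name in it is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the observed behaviour; NOT memcached-session-manager's real logic.
class FailoverOrderDemo {
    // The surviving (secondary) memcached node, holding the session backup.
    final Map<String, String> backupNode = new HashMap<>();
    // Session id the browser currently sends.
    String cookie;
    int counter = 0;

    FailoverOrderDemo(String sessionId, String user) {
        backupNode.put(sessionId, user);
        cookie = sessionId;
    }

    /** Serves one request; returns the logged-in user, or null if a login is required. */
    String serve(boolean ignored) {
        if (!ignored) {
            // Non-ignored request: the primary is down, so the backup copy is consulted.
            return backupNode.get(cookie);
        }
        // Ignored request: no lookup happens and (as observed) a new session id is issued.
        cookie = "new-" + (++counter);
        return null;
    }

    public static void main(String[] args) {
        FailoverOrderDemo pngFirst = new FailoverOrderDemo("s1", "alice");
        pngFirst.serve(true); // png request wins the race
        System.out.println("png first, then controller: " + pngFirst.serve(false));

        FailoverOrderDemo controllerFirst = new FailoverOrderDemo("s1", "alice");
        System.out.println("controller first: " + controllerFirst.serve(false));
    }
}
```

In this model the first scenario ends with no user (login screen), while the second still finds the session in the backup, which matches the two cases described above.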
By heavy js pages I mean that they request a lot of js files while loading. These requests don't change session information in any way.
Comment #4
Posted on Jul 3, 2015 by Happy Horse
Comment deleted
Comment #5
Posted on Jul 3, 2015 by Happy Horse
I've tried to reproduce this issue on the msm sample app that is hosted on github. Failover works as expected there with the same configuration and the same tomcat instance. As there were no resources like png, ico etc., I added one, but it still worked as expected.
I've also tried to fix this behavior by adding a primary memcached node availability check in RequestTrackingHostValve, where ignorePattern is evaluated. As far as I can tell this fix works, and failover works as expected in my application.
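The idea of that check can be sketched as follows. This is an assumption-laden illustration, not the code of the actual change: the class, the method, the pattern, and the `primaryNodeAvailable` flag are all hypothetical, and how node availability is really determined is left out.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the workaround idea; not the actual change to RequestTrackingHostValve.
class IgnoreGuardDemo {
    // Assumed ignore pattern for static resources.
    static final Pattern IGNORE = Pattern.compile(".*\\.(png|ico|css|js)$");

    /**
     * A request bypasses session handling only if it matches the ignore pattern
     * AND the session's primary memcached node is reachable. When the primary is
     * down, even a static resource goes through normal session handling, so the
     * backup can be loaded.
     */
    static boolean bypassSessionHandling(String uri, boolean primaryNodeAvailable) {
        return primaryNodeAvailable && IGNORE.matcher(uri).matches();
    }

    public static void main(String[] args) {
        System.out.println(bypassSessionHandling("/static/menu-bg.png", true));
        System.out.println(bypassSessionHandling("/static/menu-bg.png", false));
    }
}
```

The trade-off is that ignored requests now depend on memcached availability, so this treats the symptom rather than explaining why an ignored request loses the session in the first place.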
Comment #6
Posted on Jul 3, 2015 by Grumpy Bear
Can you submit a pull request with your change?
Comment #7
Posted on Jul 6, 2015 by Happy Horse
Submitted a pull request with a possible fix: https://github.com/magro/memcached-session-manager/pull/44
Comment #8
Posted on Jul 14, 2015 by Happy Horse
Did you have time to look into it by chance?
Comment #9
Posted on Jul 17, 2015 by Grumpy Bear
Sorry for the late response, business work took all the time... I had a look at your PR. AFAICS, when the primary node is unavailable, requests that otherwise should be ignored are then NOT ignored but go through standard session handling.
I tend to think that while this may solve the specific issue, it's still just a workaround and there is a different root cause.
I'd say that requests that should be ignored should completely bypass session handling, so they should not depend on memcached availability at all. If such requests cause issues this is probably not the case. I'd prefer to find and fix this issue.
What do you think?
Comment #10
Posted on Jul 19, 2015 by Massive Camel
This "fix" was made just to show what I mean; it's more a treatment of the symptom than of the cause. It's definitely not a solution to the problem. Also, I was not able to reproduce this issue with the test app (wicket). So I guess I'll invest a bit more time in investigating this issue until it's clear what is causing it. I just had a little hope that you'd "magically" find the problem =).
Comment #11
Posted on Jul 21, 2015 by Grumpy Bear
Yeah, ok :-) Great that you're investigating this!
Comment #12
Posted on Aug 24, 2015 by Grumpy Bear
Issues have been moved to github; this one is now https://github.com/magro/memcached-session-manager/issues/267
Status: New
Labels:
Type-Defect
Priority-Medium