
google-app-engine-django - issue #169
Datastore timeout / random exception on first request can cause double imports and raise errors like Resolver404
What steps will reproduce the problem?
1. Run your app until a random exception occurs on the first request to a new instance of your app.
2. Continue to make requests.
What is the expected output? What do you see instead?
It's very likely that you will see an exception that doesn't make any sense, like a Resolver404 within a try/except block that would otherwise get caught.
What version of the product are you using? On what operating system?
r201 Django 1.1.1
Please provide any additional information below.
The problem is that Google App Engine Django will install the Django module twice into memory when an exception like this occurs. There is a patch attached here that ensures Django is only imported once.
Here is a more elaborate explanation:
When an exception is raised on the first request of an application on Google App Engine, the main.py script is aborted and is not loaded into memory. On the second request, main.py is reloaded even though previously imported modules are still held in memory. If the exception was random, like a Datastore timeout, then it might go away on the second request, but at that point your app is left in an unstable state because Django was removed from sys.modules and re-imported. Because code imported earlier still holds references to the old module objects, there can be two instances of the Django module in memory.
For an example of a real traceback, see this ticket from a live app: http://code.google.com/p/google-app-engine-django/issues/detail?id=155 Each exception is a little different, but here is another traceback I saw that pointed to code in Django like this:
    try:
        urlresolvers.resolve(path)
        return True
    except urlresolvers.Resolver404:
        return False
In this block of code the Resolver404 exception was propagating even though it appeared to be caught. This is because the Django module had been reloaded and two distinct classes named Resolver404 existed in the running program.
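The effect described above can be reproduced without Django or App Engine. The following is a minimal sketch, not code from the helper: the module name `fake_urlresolvers` and the function `load_fake_module` are made up for illustration. It builds a module, removes it from sys.modules, builds it again, and then shows that an exception raised by the old copy is not caught by an except clause naming the new copy.

```python
import sys
import types


def load_fake_module(name):
    """Create a fresh module object that defines its own Resolver404 class."""
    module = types.ModuleType(name)
    source = (
        "class Resolver404(Exception):\n"
        "    pass\n"
        "\n"
        "def resolve(path):\n"
        "    raise Resolver404(path)\n"
    )
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


# First "boot": some code keeps a reference to this copy of the module.
old = load_fake_module("fake_urlresolvers")

# Simulate the framework being dropped from sys.modules and imported again.
del sys.modules["fake_urlresolvers"]
new = load_fake_module("fake_urlresolvers")


def is_valid_path(path):
    # resolve() comes from the old copy; the except clause names the new copy.
    try:
        old.resolve(path)
        return True
    except new.Resolver404:
        return False


try:
    is_valid_path("/missing/")
    outcome = "caught"
except Exception as exc:
    # The old Resolver404 is unrelated to the new class, so it escapes.
    outcome = "escaped: " + type(exc).__name__

print(outcome)
print(old.Resolver404 is new.Resolver404)
```

Both classes print as Resolver404 in a traceback, which is why the failure looks impossible when reading the source.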
In addition to the patch for the fix, I have attached gae-django-reproduce-double-bug.zip, a complete app that reproduces the bug by simulating a chain of similar exceptions. If you run this app, you will see:
1st request:
- a log message that App Engine Django is booting up
- a Resolver404 exception that is properly caught and ignored
- a ValueError to simulate a random exception

2nd request:
- a log message that App Engine Django is booting up
- a Resolver404 exception that is mysteriously not caught
If App Engine Django was not booted up the second time then the error would be fixed.
The attached patch might not be the most elegant approach but this should give you some ideas for how to fix the problem. My app at http://code.google.com/p/chirpradio/ would see a random Resolver404 error at least once a day and after running with the patch, the error has not appeared in a full week.
See also this bug report in Google App Engine which has more details about the problem: http://code.google.com/p/googleappengine/issues/detail?id=1409
Comment #1
Posted on Jun 4, 2010 by Swift Kangaroo
Possibly issue 1409 is also relevant -- the problem there is caused by app-engine-patch which attempts to delete the imported django modules. This is not safe.
Comment #2
Posted on Jun 4, 2010 by Swift Kangaroo
Sorry, that would be http://code.google.com/p/googleappengine/issues/detail?id=1409 -- I forgot this is a different tracker. (And the relevance is likely that attempts to delete django from sys.modules are always unsafe.)
Comment #3
Posted on Jun 4, 2010 by Happy Bear
Thanks for pointing 1409 out, Guido.
For the record, I'd rather see this patch use getattr instead of hasattr to check the value, like in this changeset: http://code.google.com/p/dherbst-app-engine-django/source/detail?r=cad9d77a103622c77d894a6b7b52ee2b2ff70fa7 if it is committed, so there is no ambiguity about whether the imports have successfully completed.
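To illustrate the getattr versus hasattr point, here is a small sketch. It is not the code from the linked changeset; the module `fake_helper`, the flag `have_installed` and the function `install` are hypothetical names. It assumes a flag that is created before the imports run and set to True only after they finish.

```python
import types

# Hypothetical stand-in for the module that records initialisation state.
helper = types.ModuleType("fake_helper")


def install(fail=False):
    """Pretend to run the imports, setting the flag to True only on success."""
    helper.have_installed = False  # the attribute exists from this point on
    if fail:
        raise ValueError("simulated random failure during import")
    helper.have_installed = True


try:
    install(fail=True)
except ValueError:
    pass

# hasattr only reports that the attribute exists; getattr reports its value.
exists = hasattr(helper, "have_installed")
completed = getattr(helper, "have_installed", False)
print(exists, completed)
```

After the failed run the attribute exists but is still False, so a hasattr check would wrongly conclude that initialisation had completed.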
Comment #4
Posted on Jun 4, 2010 by Swift Ox
dbherbst: yep, I like your patch. Is there anyone who can review and apply this?
Comment #5
Posted on Jun 8, 2010 by Grumpy Panda
I can review and apply, please ensure you've signed the CLA as described in the readme file.
Would this patch and issue still be relevant if the helpers import logic for Django was changed to only support use_library ?
If we can address the underlying cause of the failures by switching to use_library, rather than simply papering over the symptoms then that would be preferable.
If you can demonstrate that we still need this logic even in the presence of use_library then I'm happy to accept the patch, although I'd prefer if all the logic to determine whether or not the helper has already been initialised would be contained within the InstallAppengineHelperForDjango method - with task queues and cron jobs main.py is often not the only entry point into the application, so the helper is invoked from multiple places. We don't want to have to duplicate this logic in all those files.
Comment #6
Posted on Jun 8, 2010 by Grumpy Panda
Issue 155 has been merged into this issue.
Comment #7
Posted on Jun 8, 2010 by Happy Bear
I signed the CLA in 2009. I haven't seen this issue with use_library and I do not use a django zip file so I can't comment. Perhaps Kumar can comment on this.
Comment #8
Posted on Jun 8, 2010 by Swift Ox
The attached zip file, gae-django-reproduce-double-bug.zip, is a complete app that reproduces the bug when you upload it to an App Engine server. When I get some free time I'll try applying use_library to that code, but if someone beats me to it then they should be able to prove the theory and get an answer sooner. I don't know how use_library works, but if it does not involve deleting django from sys.modules then it should solve the issue just fine. Thanks for taking a look m...@google!
Comment #9
Posted on Jul 30, 2010 by Swift Ox
Hi all. Sorry for the delayed response. I'm proposing a new patch, attached here and explained below.
The first thing I did was fire up the example app that was attached to this ticket to reproduce the error. The example app still reproduced the error, but note that it only occurs on App Engine itself, not when running the app from the local SDK.
In response to m...@google.com's comment about converting to use_library, I am now confused. I see that use_library() is already the method employed by the code, as you can see in the LoadDjango() method of appengine_django/__init__.py.
So instead this new patch takes the approach that m...@google.com suggested which is to memoize InstallAppengineHelperForDjango() itself (not main.py) so it only runs once per process. This will apply the fix for other entry points (cron, task queue, etc).
The patch fixes the repeated exception that can be seen in the example app and I also verified that all tests pass. Can someone review this and perhaps get it applied to trunk? I have already signed the CLA.
thanks! -Kumar
Comment #10
Posted on Aug 21, 2010 by Swift Ox
Does any maintainer have a minute to review the latest patch? Thanks, Kumar
Comment #11
Posted on Oct 25, 2010 by Swift Ox
Hi. Once again, could someone take a look and perhaps apply this patch? It might help some people out. I read this article about someone who ran into this and gave up, switching to Rackspace: http://www.agmweb.ca/blog/andy/2286/ I think my fix here should solve these odd deadline error states.
Comment #12
Posted on Oct 25, 2010 by Grumpy Hippo
+1 to kumar's request. I've applied the patch he provided in comment 9 and it completely solved a whole class of errors that had dumbfounded me in a large, commercial AppEngine app. Including this patch in the official release would be a huge help, especially because it's very hard from the error messages to infer that this error is related to google-app-engine-django.
Comment #13
Posted on Oct 26, 2010 by Grumpy Panda
This issue was closed by revision r107.
Comment #14
Posted on Oct 26, 2010 by Grumpy Panda
Apologies for the delay merging the patch. Thanks very much.
Comment #15
Posted on Oct 26, 2010 by Swift Ox
thanks m...!
Status: Fixed
Labels:
Type-Defect
Priority-Medium