Issue 363: urlfetch.fetch defective behavior on HEAD
Besides the fetch-related bugs already posted, I think I have found a new one related to HEAD requests. The following is the output of the curl command:

curl -I http://tinyurl.com/5lkxz8
HTTP/1.1 301 Moved Permanently
X-Powered-By: PHP/5.2.5
Location: http://scobleizer.com/2008/05/17/celebrity-tipping-point-on-seesmic/
Content-type: text/html
Date: Sat, 17 May 2008 19:27:02 GMT
Server: TinyURL/1.6

and here is the output of the fetch() function:

resp = urlfetch.fetch(url, method=urlfetch.HEAD)
Checking URL: http://tinyurl.com/5lkxz8
Request to http://tinyurl.com/5lkxz8 return code 200
Headers {'content-length': '23352', 'vary': 'Cookie', 'server': 'nginx', 'connection': 'close', 'date': 'Sat, 17 May 2008 19:41:14 GMT', 'x-hacker': "If you're reading this, you should visit automattic.com/jobs and apply to join the fun, mention this header.", 'content-type': 'text/html; charset=UTF-8', 'x-pingback': 'http://scobleizer.com/xmlrpc.php'}

These are completely different from what curl returns (and I have verified that the curl results are the correct ones). I find this quite a serious bug, so hopefully most of the fetch() bugs will be fixed in the next release.
May 17, 2008
I did some more investigation on this one, and what actually seems to be happening is that the fetch() function is in fact following redirects, which is in TOTAL contradiction with the 'spec': [quote] fetch() does not follow HTTP redirects. Instead, it returns the response object directly, with the status_code and headers set accordingly. [/quote]
May 18, 2008
I initially suspected that the problem originated in httplib, but the following code shows
that it works as expected:
> import httplib
> connection = httplib.HTTPConnection('tinyurl.com')
> connection.request('HEAD', '/5lkxz8')
> resp = connection.getresponse()
> data = resp.read()
> h = resp.getheaders()
> h
[('date', 'Sun, 18 May 2008 18:59:07 GMT'), ('server', 'TinyURL/1.6'), ('content-type', 'text/html'), ('location',
'http://scobleizer.com/2008/05/17/celebrity-tipping-point-on-seesmic/'), ('x-powered-by', 'PHP/5.2.6')]
So the problem is in GAE. I am using the 1.0.2 version on Mac with Python 2.5.2
May 19, 2008
The bug is in urlfetch_stub.py, lines 159-167:
[code]
if http_response.status in REDIRECT_STATUSES:
  url = http_response.getheader('Location', None)
  if url is None:
    error_msg = 'Redirecting response was missing "Location" header'
    logging.error(error_msg)
    raise apiproxy_errors.ApplicationError(
        urlfetch_service_pb.URLFetchServiceError.FETCH_ERROR, error_msg)
  else:
    method = 'GET'
[/code]
As you can see, in the case of a redirect the stub goes on to request the new URL (found in the Location
header), but it uses GET instead of the original method.
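To make the effect concrete, here is a small self-contained simulation. It is not the SDK code: `follow_redirects_buggy`, the `FAKE_SERVER` table, the example URLs, and the exact contents of `REDIRECT_STATUSES` are all assumptions made for illustration. It only mimics the shape of the loop described above, where the method is switched to GET after any redirect.

```python
# Hypothetical stand-in for a remote server: maps URL -> (status, headers).
FAKE_SERVER = {
    "http://short.example/a": (301, {"location": "http://final.example/post"}),
    "http://final.example/post": (200, {"content-type": "text/html"}),
}

# Assumed set of redirect codes; the real stub's values may differ.
REDIRECT_STATUSES = frozenset([301, 302, 303, 307])
MAX_REDIRECTS = 5


def follow_redirects_buggy(url, method):
    """Mimic the described loop: after any redirect the method becomes GET."""
    requests_made = []
    for _ in range(MAX_REDIRECTS + 1):
        requests_made.append((method, url))
        status, headers = FAKE_SERVER[url]
        if status in REDIRECT_STATUSES:
            url = headers["location"]
            method = "GET"  # the original HEAD is lost here
        else:
            return status, headers, requests_made
    raise RuntimeError("too many redirects")


status, headers, log = follow_redirects_buggy("http://short.example/a", "HEAD")
print(status, log)
```

The caller asked for HEAD and expected to see the 301 with its Location header, but the log shows a second request made with GET and a final status of 200, which matches the symptom in the original report.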
Jun 06, 2008
In response to comment 1 that fetch is not supposed to follow redirects: I found this in the latest fetch doc: "fetch() follows HTTP redirects up to 5 times, and returns the final resource."
Jun 08, 2008
Now the question is: how can you make a request that doesn't automatically follow redirects? I think this bug report should become a feature request then.
Jun 29, 2008
I have found that this defect was introduced when auto-redirect was added to
urlfetch. As noted above, the documentation was updated to reflect the limit of
5 redirects.
The defect can be found in the SDK in urlfetch_stub.py (not sure if this is the code
actually used on the GAE servers; most likely it is only used in the dev server).
164 if http_response.status in REDIRECT_STATUSES:
165   url = http_response.getheader('Location', None)
166   if url is None:
167     error_msg = 'Redirecting response was missing "Location" header'
168     logging.error(error_msg)
169     raise apiproxy_errors.ApplicationError(
170         urlfetch_service_pb.URLFetchServiceError.FETCH_ERROR, error_msg)
171   else:
172     method = 'GET'
(I also have the old version of urlfetch_stub.py, and see that the auto-redirect and
HEAD>>GET defect is not present there.)
Needless to say, I have starred this issue as well, because the side effect seems to be
that the HEAD request is overridden and a GET request is used when not desired.
I believe this should be a simple patch to urlfetch_stub.py anyway: on line 164,
simply add a condition to exclude HEAD requests, something like the following:
164 if http_response.status in REDIRECT_STATUSES and request.method() != urlfetch_service_pb.URLFetchRequest.HEAD:
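Isolated from the SDK, the proposed guard amounts to the predicate below. This is only a sketch of the idea: `should_follow_redirect` is a hypothetical helper, methods are plain strings rather than `urlfetch_service_pb` constants, and the values in `REDIRECT_STATUSES` are assumed.

```python
# Assumed set of redirect codes; the real stub's values may differ.
REDIRECT_STATUSES = frozenset([301, 302, 303, 307])


def should_follow_redirect(status, method):
    """Follow a redirect only when the original request was not HEAD."""
    return status in REDIRECT_STATUSES and method != "HEAD"


print(should_follow_redirect(301, "HEAD"))
print(should_follow_redirect(301, "GET"))
```

With this guard a HEAD request that hits a 301 returns the 301 itself, so the Location header stays visible to the caller, while GET requests keep being redirected as before.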
Jul 16, 2008
Another reference to issue #363 and issue #404: http://ajaxian.com/archives/endpoint-resolver-javascript-library-to-hunt-for-location-redirects
Jul 24, 2008
I have posted a possible/temp solution for this issue: Issue 592. Please let me know if it works for you.
Jul 29, 2008
I'd like to be able to tell urlfetch NOT to follow redirects automatically. Suggested API: urlfetch.fetch(url="http://example.com/", follow_redirects=False)
Aug 17, 2008
The current API follows redirects up to 5 times. However, the SDK and App Engine environments behave differently when it comes to the HTTP method: the SDK follows redirects with GET, but the App Engine production environment should preserve the original HTTP method. The SDK and App Engine environments should match. Not following redirects at all is a separate feature request.
Status: Accepted
Aug 20, 2008
ma... - I appreciate you all accepting issue #363 and issue #404, and from the comments I know a few of us are eagerly awaiting their release. The main thing I think others and I need is to be able to get the Location header from a HEAD request that results in status 302/303. I agree from my tests that this issue is only in the SDK, but it highlights the importance of implementing the enhancement requested in issue #404 or reverting the automatic redirect behavior. Put it this way: if redirects were not automatic, I could still follow them myself with a simple loop. But with the current behavior of urlfetch's HEAD method, I cannot get the Location that a 302 status was pointing me to. Thanks much!
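The "simple loop" mentioned above could look roughly like this sketch. It assumes a `fetch` callable that accepts the `follow_redirects=False` argument suggested earlier in this thread and returns an object with `status_code` and `headers`; `resolve_location`, `FakeResponse`, `fake_fetch`, and the example URLs are hypothetical names used so the snippet runs without App Engine. Lower-case header keys are also an assumption.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2


def resolve_location(url, fetch, max_redirects=5):
    """Follow redirects by hand with HEAD, returning every URL visited."""
    visited = [url]
    for _ in range(max_redirects + 1):
        resp = fetch(url, method="HEAD", follow_redirects=False)
        if resp.status_code not in (301, 302, 303, 307):
            return visited
        location = resp.headers.get("location")
        if location is None:
            raise ValueError('redirect without a "Location" header: %s' % url)
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
        visited.append(url)
    raise ValueError("more than %d redirects" % max_redirects)


class FakeResponse(object):
    """Minimal stand-in for a fetch response."""

    def __init__(self, status_code, headers):
        self.status_code = status_code
        self.headers = headers


def fake_fetch(url, method, follow_redirects):
    """Hypothetical fetch that serves canned responses instead of the network."""
    table = {
        "http://short.example/a": FakeResponse(301, {"location": "/b"}),
        "http://short.example/b": FakeResponse(200, {}),
    }
    return table[url]


chain = resolve_location("http://short.example/a", fake_fetch)
print(chain)
```

Because the caller drives the loop, every intermediate Location is available, which is exactly what the automatic redirect hides.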
Sep 15, 2008
We've added a follow_redirects option to the latest release of the SDK (1.1.3). Hopefully this resolves all of the cases discussed in this issue.
Status: Fixed
Sep 25, 2008
(No comment was entered for this change.)
Labels: log-1244937
Nov 20, 2008
Fixed in 1.1.6