
httplib2 - issue #229

python3 httplib2 clobbers multiple headers of same key


Posted on Oct 3, 2012 by Massive Rhino

Using httplib2 v 0.7.6.

What steps will reproduce the problem?

1. Have a server produce a response with multiple headers of the same name, for example:

Cache-control: max-age=3000
Cache-control: no-transform

2. Request the url that produces that response with httplib2 and python3

What is the expected output? What do you see instead?

Expected output is response['cache-control'] == "max-age=3000, no-transform"

Output received is response['cache-control'] == "no-transform"

This happens because http.client (python3) parses headers differently from httplib (python2). httplib does its own header parsing and appends the values of duplicate headers (method addheader in class HTTPMessage, httplib.py line 220 in python 2.7.2). Thus the handling in httplib2's Response class when calling info.getheaders() works out okay.

In python3 info.getheaders() returns one entry per occurrence of a repeated key, and httplib2 stores each one in turn under the same key, so the last one wins.
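To make the clobbering concrete, here is a minimal sketch using only the standard library. It assumes the python3 message object behaves like email.message.Message (http.client.HTTPMessage is a subclass of it); naive_header_dict is a hypothetical stand-in for the dict-filling loop and is not httplib2's actual code.

```python
from email.message import Message


def naive_header_dict(pairs):
    """Hypothetical stand-in for a loop that stores each header under its
    lowercased name; a repeated name overwrites the earlier value."""
    result = {}
    for key, value in pairs:
        result[key.lower()] = value
    return result


# Message.__setitem__ appends a header rather than replacing an existing one,
# so both Cache-control lines are retained by the message object.
msg = Message()
msg["Cache-control"] = "max-age=3000"
msg["Cache-control"] = "no-transform"

pairs = msg.items()          # one (name, value) tuple per header line
clobbered = naive_header_dict(pairs)
```

Both values are still present in pairs; only the dict built from them has lost the first one.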

Presumably the fix to ameliorate this problem is to check the Response dict for the key already being there, and append the new data, as done in httplib.
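A rough sketch of that idea, modelled on the append behaviour described for python2's httplib. The function name merge_headers is made up for illustration, and this is not the patch that was eventually committed.

```python
def merge_headers(pairs):
    """Build a dict from (name, value) pairs, joining repeated names
    with ', ' instead of letting the last value win."""
    merged = {}
    for key, value in pairs:
        key = key.lower()
        if key in merged:
            # Key already seen: append the new value after ', '.
            merged[key] = merged[key] + ", " + value
        else:
            merged[key] = value
    return merged


example = merge_headers([
    ("Cache-control", "max-age=3000"),
    ("Cache-control", "no-transform"),
    ("Content-Type", "text/plain"),
])
```

With the two Cache-control lines from the report, example['cache-control'] comes out as the expected "max-age=3000, no-transform".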

I expect I can cook up a reasonable patch for such things, but I first wanted to confirm that this is considered a bug in httplib2 and not in http.client and to farm for this important piece of information:

httplib blithely assumes that every header can be merged by appending after ', ', but I'm relatively certain that only a subset of response headers allow that, notably Set-Cookie and Cache-Control. Is it best to guard for just those that are allowed? If so, which ones are? Building in a whitelist or blacklist seems fragile for the future.
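If guarding does turn out to be wanted, one possible shape is to make the exception list a parameter rather than hard-coding it. The sketch below is only an illustration: collect_headers and no_merge are invented names, and which headers belong in the set is exactly the open question. The default of Set-Cookie is an assumption, picked because cookie attributes such as Expires dates can themselves contain commas, which makes a ', ' join ambiguous.

```python
def collect_headers(pairs, no_merge=frozenset({"set-cookie"})):
    """Join repeated headers with ', ' unless the name is in no_merge,
    in which case each value is kept as a separate list entry.

    The contents of no_merge are an assumption, not an authoritative list.
    """
    merged = {}
    separate = {}
    for key, value in pairs:
        key = key.lower()
        if key in no_merge:
            separate.setdefault(key, []).append(value)
        elif key in merged:
            merged[key] += ", " + value
        else:
            merged[key] = value
    return merged, separate


merged, separate = collect_headers([
    ("Cache-control", "max-age=3000"),
    ("Set-Cookie", "a=1; Expires=Wed, 09 Jun 2021 10:18:14 GMT"),
    ("Cache-control", "no-transform"),
    ("Set-Cookie", "b=2"),
])
```

Passing no_merge=frozenset() gives the plain httplib-style behaviour, so the guard stays optional.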

Opinions?

Comment #1

Posted on Oct 8, 2012 by Massive Rhino

The reason this problem isn't showing up via test/duplicate-headers/multilink.asis is that, as far as I can tell, nginx is concatenating the Link headers before sending a response. Fetching that file over a raw telnet session to port 80 returns:

Link: http://bitworking.org; rel="home"; title="BitWorking", http://bitworking.org/index.rss; rel="feed"; title="BitWorking"

So an additional test will be required, which I'm poking at now.
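For what it's worth, a test that does not depend on the front-end server could run a throwaway local server that really does emit two separate header lines. This is only a sketch with the python3 standard library; DuplicateHeaderHandler and fetch_raw_headers are hypothetical names, and nothing here is part of httplib2's test suite.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DuplicateHeaderHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that sends Cache-control twice, unmerged."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Cache-control", "max-age=3000")
        self.send_header("Cache-control", "no-transform")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        # Keep test output quiet.
        pass


def fetch_raw_headers():
    """Start the server on a free local port, issue one GET with
    http.client, and return the (name, value) header list."""
    server = HTTPServer(("127.0.0.1", 0), DuplicateHeaderHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        conn.request("GET", "/")
        response = conn.getresponse()
        response.read()
        headers = response.getheaders()
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return headers


cache_values = [v for k, v in fetch_raw_headers() if k.lower() == "cache-control"]
```

Pointing httplib2 at the same server instead of http.client would then show whether response['cache-control'] keeps both values.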

Comment #2

Posted on Oct 8, 2012 by Massive Rhino

Meh, tests need something like wsgi-intercept, see http://code.google.com/p/httplib2/issues/detail?id=84

In any case a diff with a simple fix without tests is attached. It is based on the code found in the python2.7 httplib.


Comment #3

Posted on Nov 12, 2012 by Massive Hippo

Fixed in http://code.google.com/p/httplib2/source/detail?r=4dd0d6cc00c16caa00dbc5af29d1600aaf523c94

Status: Fixed

Labels:
Type-Defect Priority-Medium