Examples  
HTTPConnectionPool code examples
Updated Dec 10, 2009 by shazow

The Basics

from urllib3 import HTTPConnectionPool

# Create a connection pool for a specific host
http_pool = HTTPConnectionPool('google.com')

# GET 
r = http_pool.urlopen('GET', '/', redirect=False)

print r.status, r.headers.get('location')
# -> 301 http://www.google.com/

r = http_pool.urlopen('GET', '/', redirect=True) # redirect=True is default

print r.status, len(r.data)
# -> 200 5814
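
To make the idea of a connection pool more concrete, here is a minimal sketch of queue-based connection reuse, written for Python 3 with only the standard library. `TinyPool` and `make_conn` are hypothetical names chosen for illustration. This is not urllib3's implementation, although a traceback in the comments further down does show the real pool keeping its connections in a `Queue(maxsize)`.

```python
import queue


class TinyPool:
    """Hypothetical pool: hands out idle connections, creating new ones on demand."""

    def __init__(self, make_conn, maxsize=10):
        self._make_conn = make_conn          # factory for new connections (assumption)
        self._idle = queue.Queue(maxsize)    # idle connections waiting for reuse
        self.num_connections = 0             # how many connections were ever created

    def get(self):
        try:
            return self._idle.get(block=False)   # reuse an idle connection
        except queue.Empty:
            self.num_connections += 1
            return self._make_conn()             # nothing idle: open a new one

    def put(self, conn):
        try:
            self._idle.put(conn, block=False)    # hand it back for later reuse
        except queue.Full:
            pass                                 # pool is full: discard the extra


pool = TinyPool(make_conn=object)   # object() stands in for a real connection
first = pool.get()
pool.put(first)
second = pool.get()                 # the same object comes back, nothing new is created
```

The point of the sketch is only that a second request to the same host can skip the cost of opening a new socket.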

Convenient shortcuts

import urllib3

# Create a connection pool from a url (instead of host)
http_pool = urllib3.connection_from_url('http://ajax.googleapis.com:80/ajax/services/search/web')

print http_pool.host, http_pool.port
# -> ajax.googleapis.com 80
 
# GET
r = http_pool.get_url('/ajax/services/search/web')

print r.status, len(r.data)
# -> 200 83

# GET with url parameters
fields = {'v': '1.0', 'q': 'urllib3'}
r = http_pool.get_url('/ajax/services/search/web', fields)

print r.status, len(r.data)
# -> 200 3251

# POST with file upload
fields = {'v': '1.0', 'file_field': ('filename.txt', 'contents of the file')}
r = http_pool.post_url('/ajax/services/search/web', fields)

# Obviously the Google search API is not expecting a file upload, so nothing useful
# will happen, but this is how to POST a file upload.

print r.status, len(r.data)
# -> 200 81
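
For readers wondering what the `fields` argument turns into on the wire, the following is a rough sketch in Python 3 using only the standard library. `encode_query` and `encode_multipart` are hypothetical helpers, not urllib3 functions, and the sketch assumes string values only, with a `(filename, contents)` tuple marking a file upload as in the example above.

```python
from urllib.parse import urlencode


def encode_query(path, fields):
    """Append URL-encoded fields to a path, as a GET with parameters would."""
    return path + "?" + urlencode(fields)


def encode_multipart(fields, boundary):
    """Encode fields as multipart/form-data; tuple values are (filename, contents)."""
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)
        if isinstance(value, tuple):
            filename, data = value
            lines.append(
                'Content-Disposition: form-data; name="%s"; filename="%s"'
                % (name, filename)
            )
            lines.append("Content-Type: application/octet-stream")
        else:
            data = value
            lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append("")              # blank line separates part headers from the body
        lines.append(data)
    lines.append("--" + boundary + "--")   # closing boundary
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    content_type = "multipart/form-data; boundary=" + boundary
    return body, content_type


url = encode_query("/ajax/services/search/web", {"v": "1.0", "q": "urllib3"})
body, content_type = encode_multipart(
    {"v": "1.0", "file_field": ("filename.txt", "contents of the file")},
    boundary="XBOUNDARYX",   # fixed here for readability; real code uses a random one
)
```

The returned `content_type` is what would go in the request's `Content-Type` header so the server can find the boundary.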

Writing safe code

There are a few exceptions that can be raised:

  • TimeoutError when the socket times out (the time is set using the timeout parameter in the constructor).
  • MaxRetryError when the number of retries (and redirects) exceeds the number defined by the retries parameter of a call.
  • HostChangedError when you create a connection on one host (e.g. google.com) and make a request on it to another host (e.g. http://yahoo.com/foo). This means you'll need a fresh socket, so you can't re-use a connection to do that.

All exceptions inherit from HTTPError.

from urllib3 import connection_from_url, HTTPError, TimeoutError, MaxRetryError

http_pool = connection_from_url('http://somelaggyhost.com/', timeout=1.0)
try:
    r = http_pool.get_url('/', retries=2)
except TimeoutError, e:
    pass  # The socket timed out; handle it here...
except MaxRetryError, e:
    pass  # Ran out of retries (or redirects); handle it here...

Or you can catch all possible errors at once:

# ...

try:
    r = http_pool.get_url('/', retries=2)
except HTTPError, e:
    pass  # Something bad happened, handle it...
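
The reason both styles work is the shared base class. Below is a small self-contained sketch of that pattern in Python 3 syntax (`except X as e` rather than the Python 2 `except X, e` used above). The classes are stand-ins defined locally, not imported from urllib3, and `SocketTimeoutError` is renamed from `TimeoutError` only to avoid shadowing the Python 3 builtin of that name.

```python
class HTTPError(Exception):
    """Stand-in base class for this sketch."""


class SocketTimeoutError(HTTPError):
    """Stand-in for TimeoutError (renamed, see the note above)."""


class MaxRetryError(HTTPError):
    """Stand-in for the retries-exhausted error."""


def handle(error=None):
    """Simulate a request that either succeeds or fails with `error`."""
    try:
        if error is not None:
            raise error                  # pretend the request failed
        return "ok"
    except SocketTimeoutError:           # most specific handler first
        return "timed out"
    except HTTPError as e:               # catches every remaining subclass
        return "failed: " + type(e).__name__
```

Ordering matters: if the `HTTPError` clause came first, the timeout-specific clause would never run.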
Comment by fengmk2, Dec 13, 2008

def __init__(self, host, port=None, timeout=None, maxsize=10):
    self.pool = Queue(maxsize)
    self.host = host
    self.port = int(port)  # here if port is None, will raise an exception.
    self.timeout = timeout
    self.num_connections = count()
    self.num_requests = count()
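
One possible way to avoid calling `int()` on `None` is sketched below in Python 3. `resolve_port` is a hypothetical helper, and the default of 80 is an assumption for plain HTTP; it is not necessarily how the library ended up fixing this.

```python
def resolve_port(port=None, default=80):
    """Return an integer port, falling back to `default` when none is given."""
    if port is None:
        return default
    return int(port)   # still accepts numeric strings such as '8080'
```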

Comment by kan.swat...@gmail.com, Jul 2, 2009
In [6]: http_pool = HTTPConnectionPool('google.com')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

C:\Users\Kan\<ipython console> in <module>()

C:\Python25\lib\site-packages\urllib3-0.2-py2.5.egg\urllib3\connectionpool.pyc in __init__(self, host, port, timeout, maxsize)
     81         self.pool = Queue(maxsize)
     82         self.host = host
---> 83         self.port = int(port)
     84         self.timeout = timeout
     85         self.num_connections = count()

TypeError: int() argument must be a string or a number, not 'NoneType'

I changed it to:

In [7]: http_pool = HTTPConnectionPool('google.com', port='80')
In [8]:
Comment by kan.swat...@gmail.com, Jul 2, 2009

In the example:

In [25]: print r.status, r.length, len(r.data)
200---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)

C:\Users\Kan\<ipython console> in <module>()

AttributeError: 'HTTPResponse' object has no attribute 'length'

I had to change it to:

In [26]: print r.status, len(r.data)
200 83
Comment by david.as...@gmail.com, Nov 30, 2009

There should probably be a default logger registered:

No handlers could be found for logger "urllib3.connectionpool"

isn't a great error message =)
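
For reference, the usual way for a library to silence that warning without imposing any output on the host application is a `NullHandler` on the library's own logger. The sketch below is Python 3 and uses a hypothetical package name, `mylib`. Note that `logging.NullHandler` only entered the standard library in Python 2.7 and 3.1, so on older interpreters an equivalent no-op handler class would have to be defined by hand.

```python
import io
import logging

# Library side (hypothetical package name "mylib"): a NullHandler suppresses the
# "No handlers could be found" warning and produces no output on its own.
lib_log = logging.getLogger("mylib.connectionpool")
lib_log.addHandler(logging.NullHandler())

# Application side: opt in to the library's messages only if they are wanted.
stream = io.StringIO()
app_handler = logging.StreamHandler(stream)
app_handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
logging.getLogger("mylib").addHandler(app_handler)
logging.getLogger("mylib").setLevel(logging.INFO)

lib_log.info("Redirecting %s -> %s", "/", "http://www.google.com/")
captured = stream.getvalue()
```

The library never decides where messages go; the application does, which fits code meant to be embedded in other codebases.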

Comment by project member shazow, Dec 4, 2009

Good idea, I should do another release soon. Anyone interested in joining and helping maintain?

Comment by project member shazow, Dec 10, 2009

Most problems were fixed in the new release (not including the logger warning, not sure if that's a good idea considering it's supposed to be used in conjunction with other codebases... hrm.)

Also, now with HTTPS support, wee!

Comment by wolf5...@gmail.com, Dec 15, 2009

How do you integrate this with httplib2?

Comment by rapsys_e...@yahoo.com, Dec 15, 2009

How do you get the last URL accessed using urllib3? For example, with urllib2:

encoded_data = urllib.urlencode(post_data)
request = urllib2.Request(url, encoded_data)
# Make the request and capture the response
try:
   response = urllib2.urlopen(request)
except urllib2.URLError, e:
   print "Error: " + e.strerror
   return False

result = response.geturl()
Comment by mackst...@gmail.com, Dec 15, 2009

So for crawling over multiple hosts, I'd want a connection pool for each host. I would then probably want to have a "pool of pools" that is a mapping between hostnames and pool objects, with some sort of automatic expiring of pools that haven't been used in a while. Are there any plans to roll this functionality into urllib3?

Also, while urllib3 has the advantage of thread safety (and maybe pipelining, not sure) over httplib2, httplib2 has the advantage of caching. Are there any plans to implement caching, or somehow merge these two projects together in some fashion?
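
A "pool of pools" along the lines described here could look roughly like the sketch below (Python 3, standard library only). `PoolOfPools`, `make_pool`, and `max_idle` are hypothetical names; nothing here is part of urllib3, and the sketch is not thread-safe as written.

```python
import time


class PoolOfPools:
    """Hypothetical host -> pool mapping that forgets pools left idle too long."""

    def __init__(self, make_pool, max_idle=60.0, clock=time.monotonic):
        self._make_pool = make_pool   # callable(host) -> pool object (assumption)
        self._max_idle = max_idle     # seconds a pool may sit unused
        self._clock = clock           # injectable so expiry can be tested
        self._pools = {}              # host -> (pool, time of last use)

    def pool_for(self, host):
        now = self._clock()
        self._expire(now)
        pool, _ = self._pools.get(host, (None, None))
        if pool is None:
            pool = self._make_pool(host)      # first use, or the old pool expired
        self._pools[host] = (pool, now)       # record this use
        return pool

    def _expire(self, now):
        stale = [
            host
            for host, (_, last_used) in self._pools.items()
            if now - last_used > self._max_idle
        ]
        for host in stale:
            del self._pools[host]


# Usage with a fake clock and plain dicts standing in for real pools.
fake_now = [0.0]
ocean = PoolOfPools(
    make_pool=lambda host: {"host": host},
    max_idle=10.0,
    clock=lambda: fake_now[0],
)
google = ocean.pool_for("google.com")
```

A real version would also need a lock around the dictionary and would want to close the sockets of expired pools.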

Comment by mackst...@gmail.com, Dec 15, 2009

If you implement the pool of pools, I nominate that it be called HTTPOcean

Comment by rapsys_e...@yahoo.com, Dec 15, 2009

In my code I tried modifying the value of redirect (e.g. no value, True, False):

# Make the request and capture the response
try:
   request = self.vt_http.post_url('url_here', post_data, redirect=False)
except urllib3.HTTPError, e:
   print "Error: " + e.strerror

I also added these print lines inside connectionpool:

print "URL1 %s\n" % response.headers.get('location')
# Handle redirection
if redirect and response.status in [301, 302, 303, 307] and 'location' in response.headers: # Redirect, retry
   print "URL2 %s\n" % response.headers.get('location')            
   log.info("Redirecting %s -> %s" % (url, response.headers.get('location')))
   return self.urlopen(method, response.headers.get('location'), body, headers, retries-1, redirect)       
return response

In my code, if redirect is False, neither URL1 nor URL2 gets the correct URL. If redirect is True, both URL1 and URL2 get the correct URL, but since redirect is True, another self.urlopen(...) is called and the redirect is followed.

I'm a newbie to this, but what I need is redirect=False: get the URL from the location header and don't follow the redirection, so that in my code I can use request.headers.get('location') to read it.

hope this makes sense...
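
The redirect condition quoted above can be isolated as a small pure function, which makes the `redirect=False` behaviour easier to reason about. This is a Python 3 sketch; `redirect_target` is a hypothetical name, and it assumes `headers` is a plain dict with lowercase keys.

```python
REDIRECT_STATUSES = (301, 302, 303, 307)


def redirect_target(status, headers, redirect=True):
    """Return the URL to follow, or None when the response is returned as-is."""
    if redirect and status in REDIRECT_STATUSES and "location" in headers:
        return headers["location"]
    return None


headers = {"location": "http://www.google.com/"}
followed = redirect_target(301, headers, redirect=True)
not_followed = redirect_target(301, headers, redirect=False)
# With redirect=False nothing is followed, but the header is still there to read:
location = headers.get("location")
```

Under this logic, turning `redirect` off changes only whether the pool follows the header, not whether the header is present on the response.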

Comment by freew...@gmail.com, Jan 19, 2010

It seems it can not deal with utf-8 web sites.

Comment by freew...@gmail.com, Jan 19, 2010

Sorry, I made a mistake, it CAN deal with utf-8 web sites. It just cannot deal with gzipped web pages. Here is my test code:

#coding=UTF-8

import urllib3
import httplib
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

httplib.debuglevel = 1  # by the way, this line does not work as I expected

API_URL = 'http://www.baidu.com/'
headers = {
  'User-Agent':'Baiduspider+(+http://www.baidu.com/search/spider.htm)',
  'Referer':API_URL,
  'Accept-Encoding':'gzip,deflate'  #<-- This line makes it display wrong output
}

http_pool = urllib3.connection_from_url(API_URL)
r = http_pool.get_url(API_URL, headers=headers)
print r.status, r.data

With Python's gzip module, we can solve this problem easily.
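
A sketch of that manual decompression, in Python 3 with the standard `gzip` and `zlib` modules, might look like this. `decode_body` is a hypothetical helper; it assumes the caller passes in the raw response bytes and the value of the `Content-Encoding` response header.

```python
import gzip
import zlib


def decode_body(data, content_encoding):
    """Undo gzip or deflate compression based on the Content-Encoding header."""
    encoding = (content_encoding or "").lower()
    if encoding == "gzip":
        return gzip.decompress(data)
    if encoding == "deflate":
        try:
            return zlib.decompress(data)                    # zlib-wrapped deflate
        except zlib.error:
            return zlib.decompress(data, -zlib.MAX_WBITS)   # raw deflate stream
    return data                                             # not compressed


page = b"<html>hello</html>"
assert decode_body(gzip.compress(page), "gzip") == page
```

The fallback for `deflate` is there because some servers send a raw deflate stream without the zlib header.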

Comment by project member shazow, Feb 18, 2010

Hi all, sorry for lack of replies, wish Google would email me with notifications somehow.

@wolf550e There shouldn't be a need to integrate with httplib2, they provide a lot of overlapping functionality and don't really complement each other well.

@mackstann Agreed, I welcome outside contributions if you want to give it a go. :-) I was thinking of just calling it HTTPPool instead of HTTPConnectionPool, or maybe HTTPBucket, or ConnectionBucket?

Also, I am not opposed to collaborating with httplib2, though I think some of the approaches are philosophically different. Adding support for caching to urllib3 should be pretty easy.

@rapsys_eacit Hmm, you should get what you're describing by doing redirect=False, there's no reason why the urls should be different (unless the url you're trying to open simply doesn't have a location header). Can you provide a specific test case?

@freewind Ah good point. Any interest in making a patch for gzip support? :-)

Comment by sstein...@gmail.com, Mar 26, 2010

gzip support doesn't seem to be in the standard Python url-ish modules either. Are you suggesting to automatically decompress gzip'd data?

Comment by gob...@gmail.com, Jul 13, 2011

Hi, I want to POST a .sff (structured fax file) to our fax server, but the code is not working right. Instead of sending the file to the server, it sends the text "contents of file". So I always get a fax containing "contents of file", not the document I want to send.

def onSend(self):
    http_pool = urllib3.connection_from_url('http://www.someserver.at:80/fs_bvoip')
    fields = {
        'Username' : '123456789', 'Password' : 'test123', 'Number' : '0123456',
        'Email' : 'schwendi@someserver.org', 'Headline' : 'Test -Linux',
        'StationID' : 'test - Test', 'Retries' : '3',
        'Filename' : ('fax.sff', 'contents of file'), 'Mode' : '1',
    }
    r = http_pool.post_url('/fs_bvoip', fields)
    print r.status, r.data
Comment by project member shazow, Sep 26, 2011

Sorry for the lack of replies, I don't get notifications when somebody posts here. Please email me or open a ticket or post on StackOverflow or somesuch. :)

@sstein... gzip support should work natively in the latest version.

@gob... 'contents of file' should be the file body, that is something like: 'Filename': ('fax.sff', open('fax.sff').read()), ...
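
Spelled out a little further, building such a field from a file on disk might look like the sketch below (Python 3). `file_field` is a hypothetical helper, and opening the file in binary mode is an assumption that seems prudent for a binary format such as .sff. The temporary file exists only to keep the example self-contained.

```python
import os
import tempfile


def file_field(path):
    """Build the (filename, contents) tuple used for an upload field."""
    with open(path, "rb") as f:     # binary mode: bytes are read untranslated
        return (os.path.basename(path), f.read())


# Create a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fax.sff")
    with open(path, "wb") as f:
        f.write(b"\x00\x01fake fax data")
    fields = {"Number": "0123456", "Filename": file_field(path)}
```

The resulting `fields` dict has the same shape as the one in the question, with the literal placeholder string replaced by the real file body.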

