Examples
HTTPConnectionPool code examples
The Basics

from urllib3 import HTTPConnectionPool
# Create a connection pool for a specific host
http_pool = HTTPConnectionPool('google.com')
# GET
r = http_pool.urlopen('GET', '/', redirect=False)
print r.status, r.headers.get('location')
# -> 301 http://www.google.com/
r = http_pool.urlopen('GET', '/', redirect=True) # redirect=True is default
print r.status, len(r.data)
# -> 200 5814

Convenient shortcuts

import urllib3
# Create a connection pool from a url (instead of host)
http_pool = urllib3.connection_from_url('http://ajax.googleapis.com:80/ajax/services/search/web')
print http_pool.host, http_pool.port
# -> ajax.googleapis.com 80
# GET
r = http_pool.get_url('/ajax/services/search/web')
print r.status, r.length, len(r.data)
# -> 200 83
# GET with url parameters
fields = {'v': '1.0', 'q': 'urllib3'}
r = http_pool.get_url('/ajax/services/search/web', fields)
print r.status, len(r.data)
# -> 200 3251
# POST with file upload
fields = {'v': '1.0', 'file_field': ('filename.txt', 'contents of the file')}
r = http_pool.post_url('/ajax/services/search/web', fields)
# Obviously the Google search API is not expecting a file upload so nothing useful
# will happen, but this is how to upload files with POST.
print r.status, len(r.data)
# -> 200 81

Writing safe code

There are a few exceptions that can be raised. All exceptions inherit from HTTPError.

from urllib3 import connection_from_url, HTTPError, TimeoutError, MaxRetryError
http_pool = connection_from_url('http://somelaggyhost.com/', timeout=1.0)
try:
    r = http_pool.get_url('/', retries=2)
except TimeoutError, e:
    pass # ...
except MaxRetryError, e:
    pass # ...

Or you can catch all possible errors at once:

# ...
try:
    r = http_pool.get_url('/', retries=2)
except HTTPError, e:
    pass # Something bad happened, handle it...
def __init__(self, host, port=None, timeout=None, maxsize=10):

In [6]: http_pool = HTTPConnectionPool('google.com')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

C:\Users\Kan\<ipython console> in <module>()

C:\Python25\lib\site-packages\urllib3-0.2-py2.5.egg\urllib3\connectionpool.pyc in __init__(self, host, port, timeout, maxsize)
     81         self.pool = Queue(maxsize)
     82         self.host = host
---> 83         self.port = int(port)
     84         self.timeout = timeout
     85         self.num_connections = count()

TypeError: int() argument must be a string or a number, not 'NoneType'

I changed it to:

In [7]: http_pool = HTTPConnectionPool('google.com', port='80')

In [8]:

In the example:
has to change to:
There should probably be a default logger registered:
isn't a great error message =)
Good idea, I should do another release soon. Anyone interested in joining and helping maintain?
Most problems were fixed in the new release (not including the logger warning, not sure if that's a good idea considering it's supposed to be used in conjunction with other codebases... hrm.)
Also, now with HTTPS support, wee!
How do you integrate this with httplib2?
how do you get the last url accessed using urllib3? e.g. using urllib2
So for crawling over multiple hosts, I'd want a connection pool for each host. I would then probably want to have a "pool of pools" that is a mapping between hostnames and pool objects, with some sort of automatic expiring of pools that haven't been used in a while. Are there any plans to roll this functionality into urllib3?
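The "pool of pools" idea described above could be sketched roughly as follows, in Python 3. This is only an illustration under stated assumptions, not something urllib3 provides on this page: the class name `PoolOfPools`, the `factory` callable, and the `max_idle` and `clock` parameters are all hypothetical. The factory is whatever creates a pool for a hostname, so the sketch runs without any network access.

```python
import threading
import time


class PoolOfPools:
    """Hypothetical host -> pool mapping that forgets pools idle for too long."""

    def __init__(self, factory, max_idle=60.0, clock=time.monotonic):
        self._factory = factory      # callable: host -> pool object (assumption)
        self._max_idle = max_idle    # seconds a pool may stay unused
        self._clock = clock          # injectable so expiry can be tested
        self._lock = threading.Lock()
        self._pools = {}             # host -> (pool, last_used_timestamp)

    def get(self, host):
        """Return the pool for host, creating it if missing or expired."""
        now = self._clock()
        with self._lock:
            # Drop every pool that has not been used within max_idle seconds.
            expired = [h for h, (_, used) in self._pools.items()
                       if now - used > self._max_idle]
            for h in expired:
                del self._pools[h]
            entry = self._pools.get(host)
            pool = entry[0] if entry is not None else self._factory(host)
            # Record the access time so active pools stay alive.
            self._pools[host] = (pool, now)
            return pool

    def __len__(self):
        with self._lock:
            return len(self._pools)
```

In real use the factory would presumably be something like the `HTTPConnectionPool` constructor from the examples above. Note that this sketch only forgets idle pools; it does not close their connections.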
Also, while urllib3 has the advantage of thread safety (and maybe pipelining, not sure) over httplib2, httplib2 has the advantage of caching. Are there any plans to implement caching, or somehow merge these two projects together in some fashion?
If you implement the pool of pools, I nominate that it be called HTTPOcean
In my code I tried modifying the value of redirect (e.g. no value, True, False):

# Make the request and capture the response
try:
    request = self.vt_http.post_url('url_here', post_data, redirect=False)
except urllib3.HTTPError, e:
    print "Error: " + e.strerror

I also added print lines inside connectionpool:

print "URL1 %s\n" % response.headers.get('location')
# Handle redirection
if redirect and response.status in [301, 302, 303, 307] and 'location' in response.headers: # Redirect, retry
    print "URL2 %s\n" % response.headers.get('location')
    log.info("Redirecting %s -> %s" % (url, response.headers.get('location')))
    return self.urlopen(method, response.headers.get('location'), body, headers, retries-1, redirect)
return response

In my code, if redirect is False, neither URL1 nor URL2 gets the correct URL. If redirect is True, both URL1 and URL2 get the correct URL, but since redirect is True, another self.urlopen(...) is called and the redirect is followed.

I'm a newbie to this, but what I need is redirect=False: get the URL referrer and don't follow the redirection, so that in my code I can use request.headers.get('location') and get the referrer.

Hope this makes sense.
It seems it can not deal with utf-8 web sites.
Sorry, I made a mistake: it CAN deal with UTF-8 web sites. It just cannot deal with gzipped web pages. Here is my test code:

#coding=UTF-8
import urllib3
import httplib
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
httplib.debuglevel = 1 # by the way, this line does not work as I expected
API_URL = 'http://www.baidu.com/'
headers = {
    'User-Agent':'Baiduspider+(+http://www.baidu.com/search/spider.htm)',
    'Referer':API_URL,
    'Accept-Encoding':'gzip,deflate' #<-- This line makes it display wrong output
}
http_pool = urllib3.connection_from_url(API_URL)
r = http_pool.get_url(API_URL, headers=headers)
print r.status, r.data

With the Python gzip module, we can solve this problem easily.
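The gzip approach mentioned above could look roughly like this. It is a minimal sketch in Python 3 (the test code above is Python 2), using only the standard `gzip` and `zlib` modules. The helper name `decode_body` is hypothetical, and it assumes the caller passes in the raw response bytes together with the value of the `Content-Encoding` response header.

```python
import gzip
import zlib


def decode_body(data, content_encoding):
    """Return the response body, decompressed according to Content-Encoding.

    Hypothetical helper: `data` is the raw body as bytes and
    `content_encoding` is the header value (or None if the header is absent).
    """
    encoding = (content_encoding or '').strip().lower()
    if encoding == 'gzip':
        return gzip.decompress(data)
    if encoding == 'deflate':
        try:
            # "deflate" is normally zlib-wrapped data...
            return zlib.decompress(data)
        except zlib.error:
            # ...but some servers send a raw deflate stream instead.
            return zlib.decompress(data, -zlib.MAX_WBITS)
    # Unknown or missing encoding: return the body unchanged.
    return data


# Stand-in for a gzipped response body; no network access is needed here.
page = '<html>你好</html>'.encode('utf-8')
body = decode_body(gzip.compress(page), 'gzip')
text = body.decode('utf-8')
```

Decompression and character decoding are separate steps: the body is first decompressed to bytes and only then decoded as UTF-8.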
Hi all, sorry for lack of replies, wish Google would email me with notifications somehow.
@wolf550e There shouldn't be a need to integrate with httplib2, they provide a lot of overlapping functionality and don't really complement each other well.
@mackstann Agreed, I welcome outside contributions if you want to give it a go. :-) I was thinking of just calling it HTTPPool instead of HTTPConnectionPool, or maybe HTTPBucket, or ConnectionBucket?
Also, I am not opposed to collaborating with httplib2, though I think some of the approaches are philosophically different. Adding support for caching to urllib3 should be pretty easy.
@rapsys_eacit Hmm, you should get what you're describing by doing redirect=False, there's no reason why the urls should be different (unless the url you're trying to open simply doesn't have a location header). Can you provide a specific test case?
@freewind Ah good point. Any interest in making a patch for gzip support? :-)
gzip support doesn't seem to be in the standard Python url-ish modules either. Are you suggesting to automatically decompress gzip'd data?
Hi, I want to POST a .sff (structured fax file) to our fax server, but the code is not working right. Instead of sending the file to the server, it sends the text "contents of file". So I always get a fax with "contents of file", not the document I want to send.
Sorry for the lack of replies, I don't get notifications when somebody posts here. Please email me or open a ticket or post on StackOverflow or somesuch. :)
@sstein... gzip support should work natively in the latest version.
@gob... 'contents of file' should be the file body, that is something like: 'Filename': ('fax.sff', open('fax.sff').read()), ...
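To make the reply above concrete, here is a small sketch in Python 3 of building such a field from a file on disk. The `(filename, contents)` tuple shape comes from the upload example on this page. The helper name `file_field` is hypothetical, and opening the file in binary mode is an assumption on my part, since a fax file is unlikely to be plain text. A temporary file stands in for fax.sff so the sketch runs anywhere.

```python
import os
import tempfile


def file_field(path):
    """Build a (filename, contents) tuple for an upload field.

    Hypothetical helper: reads the whole file as bytes so that the actual
    file body, not a placeholder string, ends up in the field.
    """
    with open(path, 'rb') as f:
        return (os.path.basename(path), f.read())


# Demo with a temporary file standing in for the real fax.sff.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'fax.sff')
    with open(path, 'wb') as f:
        f.write(b'\x00\x01binary fax data')
    fields = {'Filename': file_field(path)}
```

The resulting `fields` dictionary would then be passed to `post_url` as in the POST example above. That call is taken from this page's examples and may differ in other releases.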