Examples
Examples of how to use httplib2
Simple Retrieval

import httplib2
h = httplib2.Http(".cache")
resp, content = h.request("http://example.org/", "GET")

The 'content' is the content retrieved from the URL. The content is already decompressed or unzipped if necessary. The 'resp' contains all the response headers.

Authentication

To PUT some content to a server that uses SSL and Basic authentication:

import httplib2
h = httplib2.Http(".cache")
h.add_credentials('name', 'password')
resp, content = h.request("https://example.org/chap/2",
    "PUT", body="This is text",
    headers={'content-type': 'text/plain'})

Cache-Control

Use the Cache-Control: header to control how the caching operates.

import httplib2
h = httplib2.Http(".cache")
resp, content = h.request("http://bitworking.org/")
...
resp, content = h.request("http://bitworking.org/",
    headers={'cache-control': 'no-cache'})

The first request will be cached, and since this is a request to bitworking.org it will be cached for two hours, because that is how I have my server configured. Any subsequent GET to that URI within that period will return the value from the on-disk cache, and no request will be made to the server. You can use the Cache-Control: header to change the cache's behavior. In this example, the second request adds the Cache-Control: header with a value of 'no-cache', which tells the library that the cached copy must not be used when handling this request.

Forms

Below is an example of using httplib2 to submit a form. Note that we have to use the urlencode() function from urllib to encode the data before using it as the POST body.

>>> from httplib2 import Http
>>> from urllib import urlencode
>>> h = Http()
>>> data = dict(name="Joe", comment="A test comment")
>>> resp, content = h.request("http://bitworking.org/news/223/Meet-Ares", "POST", urlencode(data))
>>> resp
{'status': '200', 'transfer-encoding': 'chunked', 'vary': 'Accept-Encoding,User-Agent',
'server': 'Apache', 'connection': 'close', 'date': 'Tue, 31 Jul 2007 15:29:52 GMT',
'content-type': 'text/html'}

Cookies

When automating something, you often need to "log in" to maintain some sort of session/state with the server. Sometimes this is achieved with form-based authentication and cookies. You post a form to the server, and it responds with a cookie in the incoming HTTP header. You need to pass this cookie back to the server in subsequent requests to maintain state or to keep a session alive. Here is an example of how to deal with cookies when doing your HTTP POST.

First, let's import the modules we will use:

import urllib
import httplib2

Now, let's define the data we will need. In this case, we are doing a form post with two fields representing a username and a password.

url = 'http://www.example.com/login'
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}

Now we can send the HTTP request:

http = httplib2.Http()
response, content = http.request(url, 'POST', headers=headers, body=urllib.urlencode(body))

At this point, our "response" variable contains a dictionary of HTTP header fields that were returned by the server. If a cookie was returned, you would see a "set-cookie" field containing the cookie value. We want to take this value and put it into the outgoing HTTP header for our subsequent requests:

headers['Cookie'] = response['set-cookie']

Now we can send a request using this header and it will contain the cookie, so the server can recognize us.

So... here is the whole thing in a script. We log in to a site and then make another request using the cookie we received:

#!/usr/bin/env python
import urllib
import httplib2
http = httplib2.Http()
url = 'http://www.example.com/login'
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}
response, content = http.request(url, 'POST', headers=headers, body=urllib.urlencode(body))
headers = {'Cookie': response['set-cookie']}
url = 'http://www.example.com/home'
response, content = http.request(url, 'GET', headers=headers)

Proxies

httplib2 can use a SOCKS proxy if the third-party socks module is installed. Here is an example of how to use the proxy support:

import httplib2
import socks
httplib2.debuglevel=4
h = httplib2.Http(proxy_info = httplib2.ProxyInfo(socks.PROXY_TYPE_HTTP, 'localhost', 8000))
r,c = h.request("http://bitworking.org/news/")
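
If the proxy address arrives as a single URL string (from a config file, say), it has to be split into the three values that ProxyInfo takes. The following is a minimal sketch written for Python 3, not part of httplib2: the helper name proxy_settings and the PROXY_TYPES table are hypothetical, and the numeric values are hard-coded on the assumption that they match the socks module's PROXY_TYPE_* constants (the HTTP value of 3 is the one reported in the comments on this page).

```python
from urllib.parse import urlsplit

# Assumed numeric values of the socks module's PROXY_TYPE_* constants.
# They are hard-coded only so this sketch runs without socks installed;
# real code should prefer socks.PROXY_TYPE_HTTP and friends.
PROXY_TYPES = {"socks4": 1, "socks5": 2, "http": 3}


def proxy_settings(proxy_url):
    """Return (proxy_type, host, port) for a URL such as http://localhost:8000."""
    parts = urlsplit(proxy_url)
    if parts.scheme not in PROXY_TYPES:
        raise ValueError("unsupported proxy scheme: %r" % parts.scheme)
    if parts.hostname is None or parts.port is None:
        raise ValueError("proxy URL needs both a host and a port")
    return PROXY_TYPES[parts.scheme], parts.hostname, parts.port
```

With httplib2 and socks installed, the tuple can be passed straight through, as in httplib2.Http(proxy_info=httplib2.ProxyInfo(*proxy_settings("http://localhost:8000"))).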
Thank you for a great library.
Nice :) I'm loving it :P
Manually handling cookies seems a bit backward; it would be nice if there was a flag to tell an Http() instance to automatically store and repeat any cookies during the session.
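
Until such a flag exists, a thin wrapper can approximate it. This is only a sketch for Python 3 under stated assumptions: CookieSession is a hypothetical class, not part of httplib2; it expects a single, simple Set-Cookie value, and it ignores expiry, domain, and path rules entirely.

```python
from http.cookies import SimpleCookie


class CookieSession:
    """Hypothetical wrapper that stores cookies and repeats them later."""

    def __init__(self, client):
        # client is any object with an httplib2-style request() method,
        # for example httplib2.Http().
        self.client = client
        self.cookies = {}

    def request(self, uri, method="GET", body=None, headers=None):
        headers = dict(headers or {})
        if self.cookies:
            # Send back every cookie seen so far as one Cookie header.
            headers["Cookie"] = "; ".join(
                "%s=%s" % (name, value) for name, value in self.cookies.items())
        response, content = self.client.request(
            uri, method, body=body, headers=headers)
        if "set-cookie" in response:
            # Keep only name=value pairs; attributes such as Path are dropped.
            parsed = SimpleCookie()
            parsed.load(response["set-cookie"])
            for name, morsel in parsed.items():
                self.cookies[name] = morsel.value
        return response, content
```

Usage would look like session = CookieSession(httplib2.Http()), followed by session.request(...) calls in place of http.request(...). A server that sets several cookies with Expires dates may need more careful parsing than this.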
The proxy example uses a module called "socks". I eventually tracked that down to here:
http://code.creativecommons.org/svnroot/stats/socks.py
The magic socks.PROXY_TYPE_HTTP constant is "3".
It looks like the home of socks.py is here: http://socksipy.sourceforge.net/
Note about timeouts, in case this helps anyone...
Each Http object you create maintains an open connection for every server you connect to (this is sensible as it reduces overhead for multiple requests).
However, it means that, when testing, it's no good pausing between requests - the open HTTP connections will time out (the remote server will close them). httplib2 makes no effort to create a new connection in this case - you'll just get an error message.
This is pretty confusing if you weren't expecting that behaviour, and it isn't obvious from the examples. So, create a new Http object whenever your old one might have timed out; don't re-use an old one that's been left for more than a few seconds.
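
The advice above can be sketched as a small wrapper that throws away its Http object and makes a new one when a request fails at the connection level. This is an illustration for Python 3 only: FreshHttp is a hypothetical name, and the choice of OSError and http.client.HTTPException as the errors that signal a dead connection is an assumption, not something the library documents.

```python
import http.client


class FreshHttp:
    """Hypothetical wrapper that replaces a stale client and retries once."""

    def __init__(self, make_http):
        # make_http is a zero-argument callable returning an object with an
        # httplib2-style request() method, e.g. lambda: httplib2.Http(".cache").
        self._make_http = make_http
        self._client = make_http()

    def request(self, uri, method="GET", **kwargs):
        try:
            return self._client.request(uri, method, **kwargs)
        except (OSError, http.client.HTTPException):
            # Assumed to mean the server closed the kept-alive connection:
            # discard the old object and retry once with a new one.
            self._client = self._make_http()
            return self._client.request(uri, method, **kwargs)
```

Retrying blindly is only safe for requests that can be repeated without side effects, such as GET; a POST that failed midway might be applied twice.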
great piece of code. i love the simplicity and the power. keep it up!
Compatibility problem with Python 2.6: httplib.HTTPConnection and httplib.HTTPSConnection now support a timeout parameter to their constructors, which they store in self.timeout. This conflicts with the httplib2.HTTPConnectionWithTimeout and httplib2.HTTPSConnectionWithTimeout use of self.timeout, resulting in the following error:
  File "...httplib2\__init__.py", line 736, in connect
    sock.settimeout(self.timeout)
  File "<string>", line 1, in settimeout
TypeError: a float is required

To rectify this, add the following code near the top of httplib2\__init__.py:

Then modify HTTPConnectionWithTimeout.__init__():

# Modified from original httplib2 source: HTTPConnection supports timeouts in Python 2.6.
if python26OrHigher:
    httplib.HTTPConnection.__init__(self, host, port, strict, timeout=timeout)
else:
    httplib.HTTPConnection.__init__(self, host, port, strict)
self.timeout = timeout

HTTPConnectionWithTimeout.connect() change:

# Modified from original httplib2 source: HTTPConnection supports timeouts in Python 2.6.
if not python26OrHigher:
    if self.timeout is not None:
        self.sock.settimeout(self.timeout)

Similar for HTTPSConnectionWithTimeout.__init__():

# Modified from original httplib2 source: HTTPSConnection supports timeouts in Python 2.6.
if python26OrHigher:
    httplib.HTTPSConnection.__init__(self, host, port=port, key_file=key_file, cert_file=cert_file, strict=strict, timeout=timeout)
else:
    httplib.HTTPSConnection.__init__(self, host, port=port, key_file=key_file, cert_file=cert_file, strict=strict)
self.timeout = timeout

And connect():

# Modified from original httplib2 source: HTTPSConnection supports timeouts in Python 2.6.
if not python26OrHigher:
    if self.timeout is not None:
        sock.settimeout(self.timeout)

Has the above change made it into the repo?
Any estimate of when the changes above will be made? It's broken in 2.6 without them.
I'm starting work on a 0.5 release now and fixes for this will be incorporated.
I would really like to see an example of an HTTPS connection with certificate supplied over a SOCKS5 proxy.
h = httplib2.Http(proxy_info = httplib2.ProxyInfo(socks.PROXY_TYPE_SOCKS5, self.px_url, self.proxy_port))
h.add_certificate(self.certificate.ikeyfile, self.certificate.certfile, self.url)
resp, content = h.request("https://"+self.url+":"+str(self.remote_port)+self.path+query)

This gives me a (8, 'EOF occurred in violation of protocol') error.
I have all the imports, but am I going about this wrong? Or is it impossible?