
Python - Example Of Urllib2 Asynchronous / Threaded Request Using Https

I'm having a heck of a time getting asynchronous / threaded HTTPS requests to work using Python's urllib2. Does anyone out there have a basic example that implements urllib2.Request?

Solution 1:

The code below makes 7 HTTP requests asynchronously, all at the same time. It does not use threads; instead it uses asynchronous networking with the Twisted library.

from twisted.web import client
from twisted.internet import reactor, defer

urls = [
    'http://www.python.org',
    'http://stackoverflow.com',
    'http://www.twistedmatrix.com',
    'http://www.google.com',
    'http://launchpad.net',
    'http://github.com',
    'http://bitbucket.org',
]

def finish(results):
    for result in results:
        print 'GOT PAGE', len(result), 'bytes'
    reactor.stop()

waiting = [client.getPage(url) for url in urls]
defer.gatherResults(waiting).addCallback(finish)

reactor.run()
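On Python 3, the same gather-and-finish pattern is available in the standard library via asyncio, again without threads. The sketch below is a translation under stated assumptions, not Twisted code: a hypothetical local HTTP server stands in for the real URLs so it can run offline, and the fetcher speaks minimal HTTP/1.0 over asyncio.open_connection.

```python
import asyncio
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

# Hypothetical local server standing in for the real URLs, so the
# sketch runs without network access.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello from " + self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

server = HTTPServer(("127.0.0.1", 0), Handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

async def fetch(path):
    # Open a raw TCP connection and speak minimal HTTP/1.0; like
    # Twisted, everything runs on one event loop, no threads.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(("GET %s HTTP/1.0\r\nHost: localhost\r\n\r\n" % path).encode())
    await writer.drain()
    raw = await reader.read()  # HTTP/1.0: server closes, read to EOF
    writer.close()
    header, _, body = raw.partition(b"\r\n\r\n")
    return body

async def main():
    # asyncio.gather is the analogue of defer.gatherResults
    results = await asyncio.gather(*(fetch(p) for p in ["/a", "/b", "/c"]))
    for body in results:
        print("GOT PAGE", len(body), "bytes")
    return results

results = asyncio.run(main())
server.shutdown()
```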

Solution 2:

There's a really simple way, involving a handler for urllib2, which you can find here: http://pythonquirks.blogspot.co.uk/2009/12/asynchronous-http-request.html

#!/usr/bin/env python
import urllib2
import threading

class MyHandler(urllib2.HTTPHandler):
    def http_response(self, req, response):
        print "url: %s" % (response.geturl(),)
        print "info: %s" % (response.info(),)
        for l in response:
            print l
        return response

o = urllib2.build_opener(MyHandler())
t = threading.Thread(target=o.open, args=('http://www.google.com/',))
t.start()
print "I'm asynchronous!"

t.join()

print "I've ended!"
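For reference, urllib2 was merged into urllib.request in Python 3. Here is a hedged Python 3 sketch of the same fire-and-forget threading pattern; the local test server is an assumption standing in for google.com so the example is self-contained.

```python
import threading
import urllib.request
from http.server import HTTPServer, BaseHTTPRequestHandler

# Hypothetical local server replacing google.com, so the sketch
# runs without network access.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence request logging

server = HTTPServer(("127.0.0.1", 0), Handler)
url = "http://127.0.0.1:%d/" % server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

results = []

def open_url(u):
    # urllib.request.urlopen is the Python 3 spelling of urllib2.urlopen
    with urllib.request.urlopen(u) as resp:
        results.append(resp.read())

t = threading.Thread(target=open_url, args=(url,))
t.start()
print("I'm asynchronous!")

t.join()

print("I've ended!")
server.shutdown()
```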

Solution 3:

Here is an example using urllib2 (with HTTPS) and threads. Each thread cycles through a list of URLs and retrieves the resource.

import itertools
import urllib2
from threading import Thread


THREADS = 2
URLS = (
    'https://foo/bar',
    'https://foo/baz',
    )


def main():
    for _ in range(THREADS):
        t = Agent(URLS)
        t.start()


class Agent(Thread):
    def __init__(self, urls):
        Thread.__init__(self)
        self.urls = urls

    def run(self):
        urls = itertools.cycle(self.urls)
        while True:
            data = urllib2.urlopen(urls.next()).read()


if __name__ == '__main__':
    main()
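A possible Python 3 translation of the same Agent pattern, with a bounded loop so it terminates (the `while True` version above runs forever) and a hypothetical local server standing in for the HTTPS endpoints. Note that `urls.next()` became `next(urls)` in Python 3.

```python
import itertools
import threading
import urllib.request
from http.server import HTTPServer, BaseHTTPRequestHandler

# Hypothetical local server replacing the https://foo/... endpoints,
# so the sketch runs without network access.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = self.path.encode()  # echo the requested path
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
base = "http://127.0.0.1:%d" % server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

THREADS = 2
URLS = (base + "/bar", base + "/baz")
fetched = []
lock = threading.Lock()

class Agent(threading.Thread):
    def __init__(self, urls, rounds=3):
        threading.Thread.__init__(self)
        self.urls = urls
        self.rounds = rounds  # bounded so the demo terminates

    def run(self):
        cycle = itertools.cycle(self.urls)
        for _ in range(self.rounds):
            data = urllib.request.urlopen(next(cycle)).read()
            with lock:
                fetched.append(data)

agents = [Agent(URLS) for _ in range(THREADS)]
for a in agents:
    a.start()
for a in agents:
    a.join()
server.shutdown()
```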

Solution 4:

You can use asynchronous IO to do this.

requests + gevent = grequests

GRequests allows you to use Requests with Gevent to make asynchronous HTTP Requests easily.

import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com',
]

rs = (grequests.get(u) for u in urls)
grequests.map(rs)

Solution 5:

Here is the example code using eventlet:

urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
     "https://wiki.secondlife.com/w/images/secondlife.jpg",
     "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]

import eventlet
from eventlet.green import urllib2

def fetch(url):
    return urllib2.urlopen(url).read()

pool = eventlet.GreenPool()

for body in pool.imap(fetch, urls):
    print "got body", len(body)
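The GreenPool.imap pattern has a close standard-library analogue in Python 3: concurrent.futures.ThreadPoolExecutor.map, which also yields results in input order. A sketch under assumed conditions (a hypothetical local test server stands in for the image URLs so it runs offline):

```python
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import HTTPServer, BaseHTTPRequestHandler

# Hypothetical local server replacing the image URLs, so the
# sketch runs without network access.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"data for " + self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
base = "http://127.0.0.1:%d" % server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

urls = [base + "/logo.gif", base + "/secondlife.jpg", base + "/y3.gif"]

def fetch(url):
    return urllib.request.urlopen(url).read()

# Executor.map, like GreenPool.imap, yields results in input order
with ThreadPoolExecutor(max_workers=3) as pool:
    sizes = [len(body) for body in pool.map(fetch, urls)]

for n in sizes:
    print("got body", n)
server.shutdown()
```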
