
Bypass Rate Limit For Requests.get

I want to scrape a website continuously, once every 3-5 seconds, with requests.get('http://www.example.com', headers=headers2, timeout=35).json(). But the example website has a rate limit.
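Before trying to bypass anything, it is worth handling the rate limit politely. Below is a minimal sketch of the polling loop from the question, extended to honor an HTTP 429 response and its Retry-After header; the URL and header names are placeholders taken from the question, and the helper retry_delay is a name introduced here for illustration.

```python
import time
import requests

def retry_delay(status_code, retry_after_header, interval):
    # On 429, honor the server's Retry-After hint if present;
    # otherwise keep the normal polling interval.
    if status_code == 429 and retry_after_header is not None:
        return float(retry_after_header)
    return interval

def poll_json(url, headers=None, interval=4.0):
    # Generator: yields one parsed JSON response per successful request,
    # sleeping between requests and backing off when rate limited.
    while True:
        resp = requests.get(url, headers=headers, timeout=35)
        if resp.status_code != 429:
            yield resp.json()
        time.sleep(retry_delay(resp.status_code,
                               resp.headers.get("Retry-After"),
                               interval))
```

Usage would be `for data in poll_json('http://www.example.com'): ...`, replacing the bare requests.get call in a loop.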

Solution 1:

You would have to do some fairly low-level work, likely with socket and urllib (or requests itself). First, do your research: how are they limiting your query rate? By IP, by session (a server-side cookie), or by local cookies? I suggest visiting the site manually as a first step and using your browser's developer tools to inspect all the headers exchanged.

Once you figure this out, create a plan to work around it. Say the limit is session based: you could use multiple threads, each driving its own scraper instance with a unique session.
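The multi-session idea above can be sketched with requests.Session, which keeps a separate cookie jar (and thus a separate server-side session) per instance. This is only an illustration of the answer's suggestion; the function names are invented here, and the URL is a placeholder.

```python
import threading
import requests

def independent_sessions(n):
    # Each requests.Session has its own cookie jar, so a session-based
    # rate limiter sees n separate "visitors" instead of one.
    return [requests.Session() for _ in range(n)]

def scrape_in_threads(url, n_workers=3, headers=None):
    # One thread per session; each worker fetches with its own identity.
    sessions = independent_sessions(n_workers)
    results = [None] * n_workers

    def worker(i):
        results[i] = sessions[i].get(url, headers=headers, timeout=35).json()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Note this only helps when the limit is tied to the session cookie; a per-IP limit sees all of these threads as the same client.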

Now, if the limit is IP based, you must make your requests appear to come from different IP addresses, which is more complex; in practice this means routing traffic through proxies or a VPN.

Solution 2:

Buy a reasonably large pool of proxies, and configure the script to switch to the next proxy whenever the server's rate limit kicks in.
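A simple way to implement this rotation is to cycle through a proxy pool and pass each entry via the proxies parameter of requests.get. This is a sketch only: the proxy URLs below are hypothetical placeholders, and real proxies usually require credentials.

```python
import itertools
import requests

# Hypothetical proxy endpoints; replace with your purchased pool.
PROXIES = [
    "http://proxy1.example:8080",
    "http://proxy2.example:8080",
    "http://proxy3.example:8080",
]

def rotating_proxies(proxies):
    # Endless iterator over the pool: each call to next() returns
    # the next proxy, wrapping around at the end.
    return itertools.cycle(proxies)

def get_via_proxy(url, proxy_iter, headers=None):
    # Fetch through the next proxy in the rotation.
    proxy = next(proxy_iter)
    return requests.get(url, headers=headers, timeout=35,
                        proxies={"http": proxy, "https": proxy}).json()
```

Calling get_via_proxy repeatedly with the same iterator spreads requests across the pool, so no single IP hits the server's limit as quickly.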
