How to Rotate Proxies in Python Using Requests (Easy Guide)


Rotating Proxies with Python (the Traditional, More Complex Way)

Now, let’s look at a more traditional approach to proxy rotation in Python Requests. As you will soon see, this method is way more involved and requires a lot of time and care to keep it running smoothly.

Step 1. Setting up the Prerequisites of Python Request Proxies

Make sure you have Python installed on your system; Python 3.7 or higher will work for this tutorial. Create a new directory to hold all the code for this project and add an app.py file inside it:

$ mkdir proxy_rotator
$ cd proxy_rotator
$ touch app.py

You also need to have requests installed. You can easily do that via pip:
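$ pip install requests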

Step 2. Sourcing a Proxy List

Before you can rotate proxies, you need a list of proxies. There are different lists available online. Some of them are paid, and some are free. Each has its own pros and cons.

A well-known source of free proxies is Free Proxy List. The biggest issue with proxies from such free lists is that most of them might already be blocked by your target website, so you will have to do some testing to make sure the proxy you are using is unblocked.

You can download the proxy list from Free Proxy List into a txt file.

Note: If you choose the easy, ScraperAPI method outlined earlier in the article, you will be happy to learn that ScraperAPI automatically monitors all of its proxies to ensure they are not blocked by the target website!

Step 3. Making a Request Without a Proxy

Let’s start by looking at how to make a request with requests without any proxy. You can do this in two ways: either call the requests.get (or similar) method directly, or create a Session and use it to make requests.

The direct requests using requests.get can be made like this:

import requests
html = requests.get("https://yasoob.me")
print(html.status_code)
# Output: 200

The same request using Session can be made like this:

import requests
s = requests.Session()
html = s.get("https://yasoob.me")
print(html.status_code)
# Output: 200

It is important to discuss both of these methods as the process of using a proxy is slightly different for each of them.

Step 4. Using a Proxy with Requests

It is very straightforward to use a proxy with requests. You just need to pass requests a dictionary that maps the http and https keys to their corresponding proxy URLs. You can use the same proxy URL for both protocols.

Note: As this article uses free proxies, the proxy URLs in the code blocks might stop working by the time you are reading them. You can follow along by replacing the proxy URLs in the code samples with working proxies from Free Proxy List.

Here is some sample code for using a proxy in requests without creating a Session object:

import requests

proxies = {
   'http': 'http://47.245.97.176:9000',
   'https': 'http://47.245.97.176:9000',
}

response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(response.text)
# Output: {
#  "origin": "47.245.97.176"
# }

And here is the same example with the Session object:

import requests

proxies = {
   'http': 'http://47.245.97.176:9000',
   'https': 'http://47.245.97.176:9000',
}

s = requests.Session()
s.proxies = proxies
response = s.get('https://httpbin.org/ip')
print(response.text)
# Output: {
#  "origin": "47.245.97.176"
# }

It is common to get the CERTIFICATE_VERIFY_FAILED SSL error while using free proxies. Here is what the error looks like:

requests.exceptions.SSLError: HTTPSConnectionPool(host='httpbin.org', port=443): Max retries exceeded with url: /ip (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))

You can get around this error by passing in verify=False to the get method like so:

requests.get('https://httpbin.org/ip', proxies=proxies, verify=False)

# or

s.get('https://httpbin.org/ip', verify=False)
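Note that verify=False disables SSL certificate verification entirely, and urllib3 will print an InsecureRequestWarning for every such request. If those warnings clutter your output while testing, you can optionally silence them; here is a minimal sketch:

import urllib3

# Suppress the warning that urllib3 emits whenever verify=False is used.
# Only do this while testing; the connection is still unverified.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)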

Step 5. Using an Authenticated Proxy with Requests

It is equally straightforward to use authenticated proxies with requests. You just need to amend the proxies dictionary and provide the username and password for each proxy URL:

proxies = {
   'http': 'http://username:password@proxy.com:8080',
   'https': 'http://username:password@proxy.com:8081',
}

Replace username and password with working credentials and you are good to go. The rest of the code for making requests will stay as it is in the previous code samples.
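One caveat: if the username or password contains special characters such as @, :, or /, they must be percent-encoded before being placed into the proxy URL. Here is a small sketch using urllib.parse.quote (the credentials and proxy host below are placeholders):

from urllib.parse import quote

# Placeholder credentials; replace with your own
username = quote('my_user', safe='')
password = quote('p@ssw:rd', safe='')

proxies = {
    'http': f'http://{username}:{password}@proxy.com:8080',
    'https': f'http://{username}:{password}@proxy.com:8081',
}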

Step 6. Setting a Proxy Via Environment Variables

You can also use proxies without adding any proxy-specific code to Python. This is possible by setting appropriate environment variables. requests honors the HTTP_PROXY and HTTPS_PROXY environment variables. If these are set, requests will use their corresponding value as the appropriate proxy URL.

You can set these environment variables on a Unix-like system by opening up the terminal and running these commands:

export HTTP_PROXY='http://47.245.97.176:9000'
export HTTPS_PROXY='http://47.245.97.176:9000'

Now you can remove any proxy-specific code from your Python program, and it will automatically use the proxy endpoint set via these environment variables!

Try it out by running this code and make sure the output matches the proxy endpoint set via the environment variables:

import requests

response = requests.get('https://httpbin.org/ip')
print(response.text)
# Output: {
#  "origin": "47.245.97.176"
# }
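If you would rather keep everything inside Python, you can also set these variables through os.environ before making any requests, since requests reads them at request time. A minimal sketch:

import os
import requests

# Set the proxy endpoints for this process only
os.environ['HTTP_PROXY'] = 'http://47.245.97.176:9000'
os.environ['HTTPS_PROXY'] = 'http://47.245.97.176:9000'

response = requests.get('https://httpbin.org/ip')
print(response.text)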

Step 7. Rotating Proxies with Each Request

As mentioned in the introduction, proxies can also get blocked. Therefore, it is important to rotate proxies and to try not to use a single proxy for multiple requests. Let’s take a look at how you can rotate proxies in Python using requests.

Step 7.1. Loading proxies from a proxy list

To get started, save the proxies from Free Proxy List into a proxy_list.txt file in the proxy_rotator directory. Here is what the file will look like:

196.20.125.157:8083
47.245.97.176:9000
54.39.132.131:80
183.91.3.22:11022
154.236.179.226:1981
41.65.46.178:1981
89.175.26.210:80
61.216.156.222:60808
115.144.99.220:11116
...
167.99.184.232:3128

Now open up the app.py file and write the following code to load these proxies into a list:

def load_proxy_list():
    with open("proxy_list.txt", "r") as f:
        proxy_list = f.read().strip().split()
    return proxy_list

Step 7.2. Verify the proxy works

Now that you have a list of proxies, it is important to test that all the proxies in the list are working and to get rid of the ones that are not working. You can test this by sending a request to httpbin via the proxy and making sure the response contains the proxy IP. If the request fails for some reason, you can discard the proxy.

You can make the discarding process more fine-grained by making sure the request failed due to an issue with the proxy and not because of an unrelated network issue. For now, let’s keep things simple and discard a proxy whenever there is any error (exception). Here is some code that does this:

import requests

def check_proxy(proxy_string):
    proxies = {
        'http': f'http://{proxy_string}',
        'https': f'http://{proxy_string}',
    }

    try:
        response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=30)
        if response.json()['origin'] == proxy_string.split(":")[0]:
            # Proxy works
            return True
        # Proxy doesn't work
        return False
    except Exception:
        return False

The code is fairly straightforward: you pass a proxy string (e.g., 0.0.0.0:8080) to check_proxy as an argument, and check_proxy sends a request to httpbin.org/ip through that proxy. If the response contains the proxy IP, the function returns True; if it does not (or if the request fails), it returns False. The code also defines a timeout for each request: if no response is received within the timeout, an exception is raised, which ensures you do not end up with slow proxies.
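Before wiring check_proxy() into the rotator, you can optionally run it over the whole list to see how many proxies are currently usable (this can take a while with free proxies):

proxy_list = load_proxy_list()
working_proxies = [p for p in proxy_list if check_proxy(p)]
print(f"{len(working_proxies)} of {len(proxy_list)} proxies passed the check")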

Step 7.3. Rotating the proxy with each request

You can now couple the functions in the previous two code listings and use them to rotate the proxy with each request. Here is one potential way of doing it:

from random import choice

def get_working_proxy():
    random_proxy = choice(proxy_list)
    while not check_proxy(random_proxy):
        proxy_list.remove(random_proxy)
        random_proxy = choice(proxy_list)
    return random_proxy

def load_url(url):
    proxy = get_working_proxy()
    proxies = {
        'http': f'http://{proxy}',
        'https': f'http://{proxy}',
    }
    response = requests.get(url, proxies=proxies)
    
    # parse the response
    # ...

    return response.status_code

urls_to_scrape = [
    "https://news.ycombinator.com/item?id=36580417",
    "https://news.ycombinator.com/item?id=36575784",
    "https://news.ycombinator.com/item?id=36577536",
    # ...
]
proxy_list = load_proxy_list()

for url in urls_to_scrape:
    print(load_url(url))

Let’s dissect this code a little. The get_working_proxy() function picks a random proxy from the proxy list and verifies that it works before returning it; any proxy that fails the check is removed from the list and another one is picked. The load_url() function gets a working proxy by calling get_working_proxy() and uses it to route the request to the target URL. Finally, there is some code to kick off the scraping process. The important thing to note here is that a random proxy is used for each request, which helps spread the scraping load over multiple proxies.

How to Improve the Proxy Rotator

There are many ways to improve the naive proxy rotator you have created so far. The first is to revise the exception handling code and ensure that the proxy is discarded only if it is faulty.
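For example, requests raises proxy- and timeout-specific exceptions that you can catch separately from everything else, so a hiccup on your own network doesn't cost you a perfectly good proxy. Here is a rough sketch of what check_proxy() could look like with that change (how you treat the inconclusive case is up to you):

import requests

def check_proxy(proxy_string):
    proxies = {
        'http': f'http://{proxy_string}',
        'https': f'http://{proxy_string}',
    }
    try:
        response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=30)
        return response.json()['origin'] == proxy_string.split(":")[0]
    except (requests.exceptions.ProxyError, requests.exceptions.Timeout):
        # The proxy refused the connection or was too slow: discard it
        return False
    except requests.exceptions.RequestException:
        # Something else went wrong (DNS failure, local network issue, etc.);
        # treat the check as inconclusive and keep the proxy for now
        return True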

Another way is to recheck discarded proxies after a while. Generally, free proxies cycle between working and not working states far too often. You can also add logic to load the proxies directly from the Free Proxy List website rather than saving them manually to a txt file first.
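As a sketch of the second idea, instead of removing a failed proxy permanently you could park it with a timestamp and move it back into rotation after a cool-down period (the 10-minute cool-down below is an arbitrary choice). The hypothetical discard_proxy() helper would replace the plain proxy_list.remove() call in get_working_proxy(), and restore_cooled_down_proxies() could run at the start of each loop iteration:

import time

COOLDOWN_SECONDS = 600  # recheck discarded proxies after 10 minutes (arbitrary)
discarded = {}  # maps a proxy string to the time it was discarded

def discard_proxy(proxy):
    proxy_list.remove(proxy)
    discarded[proxy] = time.time()

def restore_cooled_down_proxies():
    now = time.time()
    for proxy, discarded_at in list(discarded.items()):
        if now - discarded_at > COOLDOWN_SECONDS:
            proxy_list.append(proxy)
            del discarded[proxy]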

Maximize Web Scraping Success with Effective Proxy Use and Rotation

You’ve learned how to use proxies with Requests in Python and how to source, verify, and rotate them. Now you are probably wondering which approach is best.

You can choose the more traditional way, but you should be prepared to tweak the code often and keep a constant eye on your proxy list. That can become time-consuming and interrupt your flow of data collection. The best way is to use a tool that handles proxy rotation for you, so you can get the data you need quickly and at scale.

Try ScraperAPI and get 5,000 free credits when you sign up!