How to Customize Your User-Agent with Python Requests

By Jérémy
Updated October 7, 2024

In this guide, we will see how to set a custom User-Agent with the Python Requests library.

When web scraping in Python with Requests, you may face limits due to anti-bot protections.

Anti-bot software, whether basic or advanced, usually checks the User-Agent first.

Fortunately, it's not a big deal: customizing the User-Agent is simple with the right tools and knowledge.

How to update the user agent with Requests

The functions in the Python Requests module don't have a user_agent keyword argument for setting a specific user agent.

Instead, you just set the User-Agent key in the headers dictionary to the user agent you want.

Here is a straightforward example:

Python
    import requests

user_agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0"

response = requests.get("http://httpbin.org/headers", headers={"User-Agent": user_agent})

print(response.text)
  

Here is the response:

Bash
    {
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0", 
    "X-Amzn-Trace-Id": "Root=1-66f56a85-5ba5a77d3b72863913e449e0"
  }
}
    

Pretty simple, isn’t it?

What is a User Agent?

A User Agent (UA) is a string sent in HTTP requests. It helps the target website identify the requesting device.

The user agent is a standard HTTP header among other HTTP headers. It provides details about the client software, which can be a browser, a mobile app, or other web client.

For example, the user agent of my browser looks like this:

Bash
    Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36
    
  1. Mozilla/5.0: is used as a compatibility flag (many browsers use this to avoid being blocked by websites designed for Mozilla browsers).
  2. (X11; Linux x86_64): indicates the client is running on a 64-bit Linux operating system.
  3. AppleWebKit/537.36 (KHTML, like Gecko): indicates that this browser uses the WebKit rendering engine (often associated with Chrome or Safari browsers).
  4. Chrome/129.0.0.0: identifies the browser as Chrome, version 129.
  5. Safari/537.36: indicates that the browser is compatible with Safari's web-rendering engine, which is used for cross-browser compatibility.

Here are other examples of different user agents:

  • iPhone user agent example: Mozilla/5.0 (iPhone; CPU iPhone OS 17_7 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.0 Mobile/15E148 Safari/604.1
  • Android user agent example: Mozilla/5.0 (Linux; Android 15; SM-G960U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.6668.69 Mobile Safari/537.36
  • Postman: PostmanRuntime/7.42.0
  • Googlebot user agent example: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

As you can see, user agents don’t always follow a specific structure.
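
If you ever need to reason about these tokens in code, a few plain string checks go a long way. Here is a rough sketch on the Chrome example above (dedicated user-agent parsing libraries exist, but they aren't needed here):

Python
    example_ua = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36"

# Very rough checks using plain string matching
print("Linux" in example_ua)    # True  -> the platform token mentions Linux
print("Chrome/" in example_ua)  # True  -> the product token identifies Chrome
print("Mobile" in example_ua)   # False -> this looks like a desktop user agent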

Python Requests’ default User-Agent

Let’s make a simple HTTP request to see what headers we have by default in Python Requests:

Python
    import requests

response = requests.get("http://httpbin.org/headers")
print(response.text)
  

And here is the output:

Bash
    {
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.32.3", 
    "X-Amzn-Trace-Id": "Root=1-66f6e8a3-14197897145d587e12382c0c"
  }
}
    

We can see that the default User-Agent is python-requests/2.32.3, where 2.32.3 is the installed version of the library.
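
If you're curious, Requests exposes this default string through the requests.utils.default_user_agent() helper, so you can inspect it without making a request:

Python
    import requests

# Prints something like "python-requests/2.32.3", matching your installed version
print(requests.utils.default_user_agent())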

Why you shouldn’t use the default user-agent

As we discussed, requests without customized headers are easy to spot. Checking the User-Agent is the first thing a website or anti-bot protection does.

Of course, this is not enough for the website to protect itself against bots, but it’s the easiest first step.

As a web scraper, I always customize my User-Agent, even for small, unprotected sites.

Sometimes I use one of the latest Chrome user agents, and sometimes my own browser's user agent.
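
For instance, here is a minimal sketch that reuses a recent desktop Chrome user agent instead of the default one (the exact version below is only illustrative):

Python
    import requests

# A recent desktop Chrome user agent (the version number is just an example)
chrome_user_agent = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36"
)

response = requests.get(
    "http://httpbin.org/headers",
    headers={"User-Agent": chrome_user_agent},
)
# The echoed headers now show the Chrome string instead of python-requests/x.y.z
print(response.json()["headers"]["User-Agent"])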

Set a User-Agent using the Session object from the Requests library

Some anti-bot protections set a valid cookie on the response if they find nothing suspicious, but standalone calls to requests.get(...) don't persist cookies between requests.

In some situations, it’s necessary to persist them. I personally always perform HTTP requests using the Session object from Requests. It makes my life easier.

It also makes your code less redundant: you only have to set the headers once, and they apply to every subsequent request.
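
As a quick illustration of that cookie persistence, here is a minimal sketch using httpbin's cookie endpoints: the cookie set by the first response is stored in the session and sent back automatically on the next request.

Python
    import requests

session = requests.Session()

# The server sets a cookie; the Session keeps it in its cookie jar
session.get("http://httpbin.org/cookies/set/session_id/abc123")

# The stored cookie is sent automatically on subsequent requests
response = session.get("http://httpbin.org/cookies")
print(response.json())  # {'cookies': {'session_id': 'abc123'}}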

To set your custom User-Agent in your Session object, you just have to update the headers like this:

Python
    import requests

# Define the headers
headers = {"User-Agent": "I am not a bot !"}

# Create a session
session = requests.Session()
# Update the headers
session.headers.update(headers)

# Perform the request
response = session.get("http://httpbin.org/headers")
print("First request:")
print(response.text)

print()

# We can still update the headers for a specific request
response = session.get(
    "http://httpbin.org/headers",
    headers={"Authorization": "Bearer 123456"},
)
print("Second request:")
print(response.text)
  

The first request shows that we don’t need to specify the headers keyword argument in the function call.

The second one shows that we can still add or override headers for a specific request, without changing the session's defaults.

Here is the output:

Bash
    First request:
{
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "I am not a bot !", 
    "X-Amzn-Trace-Id": "Root=1-66f6f7cd-6e550eaa49aac1c334e3d602"
  }
}


Second request:
{
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Authorization": "Bearer 123456", 
    "Host": "httpbin.org", 
    "User-Agent": "I am not a bot !", 
    "X-Amzn-Trace-Id": "Root=1-66f6f7cf-38e862721b85d8210ec2406a"
  }
}
    
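
The same per-request mechanism works for the User-Agent itself. Here is a small sketch (the agent strings are just placeholders) showing that a headers argument only applies to that one call, while the session default stays untouched:

Python
    import requests

session = requests.Session()
session.headers.update({"User-Agent": "my-session-agent"})

# Override the User-Agent for this request only
response = session.get(
    "http://httpbin.org/headers",
    headers={"User-Agent": "one-off-agent"},
)
print(response.json()["headers"]["User-Agent"])  # one-off-agent

# The session default is unchanged for the next request
response = session.get("http://httpbin.org/headers")
print(response.json()["headers"]["User-Agent"])  # my-session-agent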

Unset the User-Agent

In specific situations, you’d like to perform requests without any User-Agent.

So you might try something like this:

Python
    import requests

response = requests.get("http://httpbin.org/headers", headers={"User-Agent": None})
print(response.text)
  

But this gives the following output:

Bash
    {
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-urllib3/2.2.3", 
    "X-Amzn-Trace-Id": "Root=1-66f6fa2a-119fbb6c7feff6fc5ea2f350"
  }
}
    

It doesn't work because the urllib3 library is used behind the scenes, and it sets its own default User-Agent.

To remove the User-Agent field from your headers, you must set the header to the constant urllib3.util.SKIP_HEADER instead of None.

Here is the corrected example:

Python
    import requests
import urllib3

response = requests.get(
    "http://httpbin.org/headers", headers={"User-Agent": urllib3.util.SKIP_HEADER}
)
print(response.text)
  

And here is the output we were expecting:

Bash
    {
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "X-Amzn-Trace-Id": "Root=1-66f6fb26-3f2f2c36274e642c46f2d984"
  }
}
    

Note that this constant can only be used to skip the User-Agent, Accept-Encoding, and Host headers.

To remove other custom headers you have set in your session, set them to None in the request's headers argument or remove the key from the session's headers dictionary.
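
Here is a minimal sketch of both approaches, assuming a hypothetical X-Custom-Token header that was set on the session earlier:

Python
    import requests

session = requests.Session()
session.headers.update({"User-Agent": "my-agent", "X-Custom-Token": "secret"})

# Option 1: pass None to drop the header for this request only
response = session.get("http://httpbin.org/headers", headers={"X-Custom-Token": None})
print(response.json()["headers"])  # no X-Custom-Token here

# Option 2: delete the key to remove it from all future session requests
del session.headers["X-Custom-Token"]
response = session.get("http://httpbin.org/headers")
print(response.json()["headers"])  # no X-Custom-Token here either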

Use random User-Agents

Making thousands of requests in a short time with the same user agent can make your traffic easy to detect.

So, rotating the User Agent can help us be more sneaky. To do that, we can simply pick a random User Agent from a list we have already defined, using the function choice(...) from the random module.

There are also websites that maintain lists of recent user agents if you need fresh values.

Let’s see that in the following example:

Python
    import requests
import random

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.3",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.3",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36 Edg/129.0.0",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.3",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128",
]


for _ in range(5):
    headers = {"User-Agent": random.choice(user_agents)}
    response = requests.get("http://httpbin.org/user-agent", headers=headers)
    print(response.json())

  

In this example, we first defined our list of user agents. Then, we performed 5 HTTP requests, picking a random user agent for each.

Here is the output:

Bash
    {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.3'}
{'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115'}
{'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.3'}
{'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36 Edg/129.0.0'}
{'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.3'} 
    

Rotate over thousands of User-Agents with ua-generator

Rotating user agents from a predefined list of just a few entries isn't convenient enough for me.

Instead, I use the ua-generator package, available on GitHub, which makes my life easier, and it will make yours easier too.

This is a package you must install. For example, using pip: pip install ua-generator.

Once installed, you can generate as many user agents as you want, like this:

Python
    import ua_generator
import requests

for _ in range(5):
    # Generate a UserAgent object by calling generate()
    user_agent = ua_generator.generate()
    
    # The user agent can be accessed via the `text` attribute
    headers = {"User-Agent": user_agent.text}
    
    # Finally we perform the request
    response = requests.get("http://httpbin.org/user-agent", headers=headers)
    print(response.json())

  

In this example, we generate user-agents using the generate() function from the ua_generator package.

The output should look like this:

Bash
    {'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 17_5 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko) CriOS/114.0.5735.115 Mobile/15E148 Safari/537.36'}
{'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.1518.29 Safari/537.36 Edg/109.0.1518.29'}
{'user-agent': 'Mozilla/5.0 (Linux; Android 14; SM-G3812; Build/UQ1A.220829.121) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.1370.37 Mobile Safari/537.36 EdgA/106.0.1370.37'}
{'user-agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:108.0.1) Gecko/20100101 Firefox/108.0.1'}
{'user-agent': 'Mozilla/5.0 (Linux; Android 7; Nexus 6; Build/NHG47V) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.135 Mobile Safari/537.36'}
    

Filter the User-Agent randomization

By default, the generate() function returns a fully randomized user agent: it can correspond to any device, platform, or browser.

When scraping a website, the HTML structure can change from one browser to another, so you may want to generate random user agents while sticking to a specific browser or platform.

You can generate fine-tuned user agents by passing arguments to the generate() function.

It takes 3 optional keyword arguments, each of which can be a string or a tuple.

Here are the possible values of each keyword argument:

Python
    device = ('desktop', 'mobile')
platform = ('windows', 'macos', 'ios', 'linux', 'android')
browser = ('chrome', 'edge', 'firefox', 'safari')

  

Now let's see a short snippet of code that generates user agents for macOS, using Safari or Firefox:

Python
    import ua_generator

for _ in range(5):
    user_agent = ua_generator.generate(platform="macos", browser=("firefox", "safari"))
    print(user_agent)

  

We just set the platform argument to "macos" and the browser argument to a tuple containing "firefox" and "safari", since we wanted two different browsers.

Here is the output:

Bash
    Mozilla/5.0 (Macintosh; Intel Mac OS X 12_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Safari/605.1.15
Mozilla/5.0 (Macintosh; Intel Mac OS X 14.6; rv:115.7) Gecko/20100101 Firefox/115.7
Mozilla/5.0 (Macintosh; Intel Mac OS X 11; rv:115.2) Gecko/20100101 Firefox/115.2
Mozilla/5.0 (Macintosh; Intel Mac OS X 13_2) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0 Safari/602.4.8
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/604.1.38 (KHTML, like Gecko) Version/11.0 Safari/604.1.38
    

The added value of ua-generator

The generate() function returns a UserAgent object. We saw that the user agent string is available through its text property.

But there is an even more useful property: headers.

The generated UserAgent comes with additional headers that better mimic the traffic of a real browser.

You can then seamlessly perform stealthier requests, thanks to the ua-generator module:

Python
    import requests
import ua_generator


# Generate a random user agent
user_agent = ua_generator.generate(device="desktop", browser="chrome")

# Create a session
session = requests.Session()

# Update the session headers with the randomized user agent's headers
session.headers.update(user_agent.headers.get())

# Make a request
response = session.get("https://httpbin.org/headers")
print(response.text)
  

And the output shows us the additional “Client Hints” headers ("Sec-Ch-Ua…"):

Bash
    {
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "Sec-Ch-Ua": "\"Not A(Brand\";v=\"99\", \"Chromium\";v=\"112\", \"Google Chrome\";v=\"112\"", 
    "Sec-Ch-Ua-Mobile": "?0", 
    "Sec-Ch-Ua-Platform": "\"macOS\"", 
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.5615.199 Safari/537.36", 
    "X-Amzn-Trace-Id": "Root=1-66fc2a54-541090ac688ba7e529e4b825"
  }
}
  

How to bypass sophisticated anti-bot protections: best practices

In this last section, we won't cover all the details about bypassing anti-bot protections.

However, we will cover some best practices for avoiding blocks, including from protections that set a cookie during the initial request.

Best practices to bypass anti-bot protections:

  • Send other browser headers: As we discussed, mimic real traffic by sending the other headers a real browser would send.
  • Use Proxies: Rotate IPs with residential or mobile proxies to avoid blocks from rate limits or suspicious activity, as sketched below.
  • Tie the generated User Agent to a sticky proxy: Make sure a given randomized User Agent always goes through the same proxy. This helps avoid triggering sophisticated anti-bot security.
  • Enable JavaScript Rendering: Some systems need JavaScript to run. To simulate a real browser, use browser automation tools like Selenium or Playwright.
  • Respect Rate Limits: Monitor and respect website rate limits to avoid triggering anti-bot protections too soon.

For reliable proxies that support rotating IP addresses, check out AnyIP.io.
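
To make a couple of these points concrete, here is a minimal sketch combining a rotated user agent, a proxy, and a polite delay between requests. The proxy URL and credentials are placeholders, not a real endpoint:

Python
    import time
import random
import requests
import ua_generator

# Placeholder proxy: replace with your provider's endpoint and credentials
proxies = {
    "http": "http://username:password@proxy.example.com:8080",
    "https": "http://username:password@proxy.example.com:8080",
}

session = requests.Session()

# Tie one generated user agent (and its matching headers) to this session/proxy pair
user_agent = ua_generator.generate(device="desktop", browser="chrome")
session.headers.update(user_agent.headers.get())

urls = ["http://httpbin.org/headers"] * 3  # your target URLs
for url in urls:
    response = session.get(url, proxies=proxies)
    print(response.status_code)
    # Respect rate limits: wait a little between requests
    time.sleep(random.uniform(1, 3))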


Jérémy

Jérémy is a Fullstack Developer and Web Scraping Expert. He has contributed to various projects by designing and scaling their data collection processes. Since his early years, his passion for computer science has driven him to explore and push the limits of digital systems.
