If you've deployed an AI agent that browses the web, you've probably watched it work in development and break in production. The agent hits a Cloudflare challenge or gets a flat 403 instead of the data it needs. Your pipeline stops cold.
Most people blame the agent framework or the browser automation library. Almost always, the root cause is simpler: the IP address. Anti-bot systems decide whether to block a request based on where it comes from before they ever look at browser behavior or JavaScript execution.
This guide walks through what's actually happening at the network level when your agent gets blocked, which proxy type works against modern anti-bot systems, and how to wire it all up in OpenClaw.
Why AI agents trigger anti-bot systems at higher rates
AI agents don't browse like humans. The specific ways they differ are what trigger detection.
A human opens a page, scrolls around, maybe reads for 30 seconds before clicking a link. An AI agent navigates directly to a target URL, extracts content, and moves on. Time between page loads is 1-2 seconds instead of 15-30. No mouse movement, no natural scroll behavior. The browsing pattern looks mechanical because it is.
But the behavioral stuff is secondary. Anti-bot systems don't start with mouse tracking. They start with the IP address. The evaluation order for most detection systems (Cloudflare Bot Management, DataDome, Akamai Bot Manager) looks roughly like this:
1. IP reputation and ASN classification
2. TLS fingerprint (JA3/JA4 hash)
3. HTTP header order and consistency
4. JavaScript execution environment
5. Behavioral analysis (mouse movement, scroll patterns, timing)
If the IP fails at step 1, the request never reaches behavioral analysis. This is why a perfectly configured Playwright browser still gets blocked when it runs from an AWS instance. The IP address tells the anti-bot system what it needs to know before it checks anything else.
Traditional scrapers send isolated HTTP requests and rotate IPs per request. AI agents are different. They maintain sessions across multiple page loads, fill forms, click buttons, and navigate multi-step flows. That longer interaction window gives anti-bot systems more signal to work with, which makes your IP choice matter even more.
The proxy hierarchy: datacenter, residential, and mobile
The proxy type you choose affects your success rate more than any other single variable. The differences are bigger than most people expect.
Datacenter proxies
Datacenter proxies route traffic through servers at cloud providers: AWS, Google Cloud, DigitalOcean, Hetzner. These IPs belong to well-known Autonomous System Numbers (ASNs) that anti-bot systems maintain blocklists for. AS16509 is Amazon. AS15169 is Google. AS24940 is Hetzner.
When a request arrives from one of these ASNs, it gets flagged before behavioral analysis starts. The network identity alone classifies the traffic as automated. In testing across multiple protected targets, datacenter proxies succeed on less than 10% of Cloudflare-protected sites and under 5% on DataDome targets. They're fine for hitting unprotected APIs, but any site with anti-bot protection rejects them on sight.
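The ASN check is essentially a table lookup that runs before anything else. A minimal sketch (the table is a tiny excerpt built from the ASNs mentioned above; real systems use full BGP and WHOIS routing data):

```python
# Sketch of ASN-based risk classification, as anti-bot systems apply it
# before any behavioral analysis. The table is a small illustrative
# excerpt; production systems classify every allocated ASN.
ASN_RISK = {
    16509: "datacenter",   # Amazon AWS
    15169: "datacenter",   # Google
    24940: "datacenter",   # Hetzner
    7922:  "residential",  # Comcast
    7018:  "residential",  # AT&T
    21928: "mobile",       # T-Mobile
    6167:  "mobile",       # Verizon Wireless
}

def classify_asn(asn: int) -> str:
    """Return the risk class for an ASN, defaulting to 'unknown'."""
    return ASN_RISK.get(asn, "unknown")

def blocked_before_behavioral_checks(asn: int) -> bool:
    """Datacenter traffic is typically rejected at step 1, before TLS,
    header, or behavioral analysis ever runs."""
    return classify_asn(asn) == "datacenter"
```

This is why browser-level fixes can't rescue a datacenter IP: the decision is made before the browser fingerprint is even examined.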
Residential proxies
Residential proxies route traffic through real ISP connections. Home broadband users opt into proxy networks (usually in exchange for a free VPN or app credits), and their IPs get shared with proxy customers. The IPs belong to ISP ASNs like Comcast (AS7922) or AT&T (AS7018), which anti-bot systems treat as lower risk than datacenter ranges.
Against Cloudflare's default settings, residential proxies see 50-70% success rates. The problem is pool contamination. Thousands of proxy customers share the same IP pool, and aggressive scraping by some users poisons individual IPs for everyone. Against tighter detection like DataDome or PerimeterX, shared residential pools drop to 30-50% success.
Mobile carrier proxies and CGNAT
Mobile carrier proxies route traffic through real 4G/5G SIM cards on physical devices connected to carriers like T-Mobile, Verizon, or AT&T. They outperform everything else, and the reason is Carrier-Grade NAT (CGNAT).
CGNAT is how mobile carriers conserve IPv4 addresses. Instead of assigning one public IP per phone, they route hundreds or thousands of devices through a single public IP. When your AI agent uses a mobile proxy, its traffic shares an IP address with thousands of real phone users who are scrolling social media and checking email.
This creates a problem that anti-bot systems can't solve cleanly. Blocking a mobile carrier IP means blocking every device behind that CGNAT pool, potentially thousands of paying customers. Anti-bot vendors learned this the hard way and backed off. They now give mobile carrier ASNs (T-Mobile's AS21928, Verizon's AS6167) far more latitude than datacenter or residential ranges.
The result: mobile carrier proxies consistently achieve 85-95% success rates against Cloudflare, Akamai, and DataDome. They're not immune to detection. Behavioral analysis can still flag obviously non-human browsing patterns. But the IP layer stops being the reason your agent gets blocked.
| Proxy type | Cloudflare success | DataDome success | Typical cost/GB | Best for |
|---|---|---|---|---|
| Datacenter | 5-10% | <5% | $0.50-2 | Unprotected APIs only |
| Residential (shared) | 50-70% | 30-50% | $5-15 | Moderate protection |
| Mobile (shared pool) | 75-85% | 60-75% | $15-30 | General web scraping |
| Mobile (dedicated) | 90-98% | 85-95% | $20-50+ | High-value protected targets |
A note on cost: running 1,000 requests through datacenter proxies at $1/GB with 5% success gives you 50 usable responses. The same requests through mobile proxies at $25/GB with 90% success give you 900. Measured per usable response, the two end up in the same ballpark (roughly $0.02 versus $0.03, assuming around 1 MB per request) despite the 25x gap in bandwidth price, and that's before counting the retry bandwidth, wasted LLM tokens, and engineering time that failed requests burn.
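The arithmetic is easy to check. A quick sketch, assuming roughly 1 MB of transfer per request so that 1,000 requests is about 1 GB (an assumption, not a figure from any provider):

```python
def cost_per_usable(price_per_gb: float, success_rate: float,
                    mb_per_request: float = 1.0) -> float:
    """Effective cost of one usable response, assuming every request,
    successful or not, consumes about mb_per_request of bandwidth."""
    requests_per_gb = 1024 / mb_per_request
    usable_per_gb = requests_per_gb * success_rate
    return price_per_gb / usable_per_gb

datacenter = cost_per_usable(1.0, 0.05)   # ≈ $0.020 per usable response
mobile = cost_per_usable(25.0, 0.90)      # ≈ $0.027 per usable response
```

The raw bandwidth math alone puts the two close together; the hidden costs of the 95% failure rate (retries, escalating blocks, wasted agent runs) are what tip the balance toward mobile in practice.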
Configuring proxy support in OpenClaw
OpenClaw has over 254,000 GitHub stars and is the most widely used AI agent platform. Its browser tool gives agents real browser automation: navigation, clicking, form filling, screenshot capture, all through a managed Chromium instance.
The gap has been proxy support: there is currently no built-in way to route the browser tool's traffic through a proxy server. Native proxy support is coming to OpenClaw's browser tool, adding HTTP, HTTPS, and SOCKS5 proxy configuration directly in the browser settings.
What native proxy config will look like
Once native support lands, you'll configure it in your OpenClaw settings file:
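The final schema hasn't shipped, so treat the following as a plausible sketch rather than documentation. Every key name here is hypothetical; only the HTTP/HTTPS/SOCKS5 support and the browser-settings location are announced:

```json
{
  "browser": {
    "proxy": {
      "server": "http://gateway.example-proxy.com:8080",
      "username": "YOUR_USERNAME",
      "password": "YOUR_PASSWORD"
    }
  }
}
```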
Per-profile proxy overrides will also be supported, so you can route different browser profiles through different proxy providers:
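Again, a hypothetical sketch of what per-profile overrides could look like (profile names, key names, and gateway addresses are all placeholders):

```json
{
  "browser": {
    "profiles": {
      "scraping": {
        "proxy": { "server": "http://us-mobile.example-proxy.com:8080" }
      },
      "research": {
        "proxy": { "server": "socks5://eu-resi.example-proxy.com:1080" }
      }
    }
  }
}
```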
Current workaround: Playwright proxy settings
While you wait for native support, you can configure proxy settings through Playwright directly. OpenClaw's managed browser connects to Chromium via CDP, and Playwright accepts proxy configuration at launch:
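A sketch in Python using Playwright's real launch-time `proxy` option; the gateway address and credentials are placeholders for your provider's values:

```python
# Sketch: routing a Playwright-managed Chromium through a proxy.
# The gateway and credentials below are placeholders.
PROXY = {
    "server": "http://gateway.example-proxy.com:8080",
    "username": "YOUR_USERNAME",
    "password": "YOUR_PASSWORD",
}

def fetch_title(url: str) -> str:
    """Open a page through the proxy and return its title."""
    # Imported inside the function so this module still loads in
    # environments where Playwright isn't installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(proxy=PROXY)
        page = browser.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```

All traffic from pages opened in that browser, including subresource requests, is routed through the configured proxy.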
This approach works with any agent framework that uses Playwright under the hood, including Browser Use and similar tools.
HTTP proxy for non-browser agent tools
For agents that use HTTP fetching instead of a full browser (like OpenClaw's web_fetch tool or LangChain's web loader), configure the proxy at the HTTP client level:
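With Python's standard library, for example, that looks like the following (the gateway URL is a placeholder):

```python
import urllib.request

# Placeholder gateway; real providers give you a host, port, and
# credentials to substitute here.
PROXY_URL = "http://user:pass@gateway.example-proxy.com:8080"

# Route both plain HTTP and TLS traffic through the proxy.
handler = urllib.request.ProxyHandler({"http": PROXY_URL, "https": PROXY_URL})
opener = urllib.request.build_opener(handler)

# Uncomment to fetch through the proxy:
# html = opener.open("https://example.com", timeout=30).read()
```

Libraries like `requests` and `httpx` accept an equivalent `proxies` mapping per request or per client.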
Note that environment variables like HTTP_PROXY and HTTPS_PROXY work for HTTP client libraries but do not affect browser instances launched by OpenClaw. For the browser tool, you need either the upcoming native proxy config or the Playwright approach above.
Session management and request pacing
The right proxy type gets your agent through the door. Session management keeps it from getting thrown out.
Anti-bot systems track sessions: the combination of IP address, browser fingerprint, cookies, and behavioral patterns across multiple requests. How you manage proxy sessions matters as much as which proxy you use.
Sticky sessions vs rotating IPs
For AI agent workflows, sticky sessions are usually the right default. A sticky session keeps the same IP address for a set period (5-30 minutes, depending on the task), so the agent's browsing looks like a single user navigating a site.
IP rotation per request is better for bulk data collection where each request stands alone. But AI agents that log in, navigate multi-step flows, or maintain state across pages shouldn't rotate IPs mid-session. A real user doesn't change IP addresses between clicking "Next Page" and loading results. Anti-bot systems know this and flag session-level IP changes as a strong indicator of automation.
Request pacing
AI agents can fire requests fast. A human takes 15-30 seconds between page loads. An agent can do it in under a second.
Add deliberate delays: 2-5 seconds for normal browsing, 5-10 seconds for high-security targets. One thing that works in your favor here is the agent's natural processing time. The 1-3 seconds the LLM spends reading the page and deciding what to do next is free pacing you don't have to fake.
Still, for high-volume tasks, add explicit jitter on top of the LLM latency. Random delays between 2-7 seconds are harder for anti-bot systems to fingerprint than a fixed 3-second wait.
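A minimal jitter helper for the pacing described above:

```python
import random
import time

def jittered_delay(min_s: float = 2.0, max_s: float = 7.0) -> float:
    """Pick a uniformly random delay; random gaps are harder for
    anti-bot systems to fingerprint than a fixed interval."""
    return random.uniform(min_s, max_s)

def pace(min_s: float = 2.0, max_s: float = 7.0) -> None:
    """Sleep for a jittered interval between page loads."""
    time.sleep(jittered_delay(min_s, max_s))
```

Call `pace()` between navigations; widen the bounds (for example 5-10 seconds) for high-security targets.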
Fingerprint consistency
Your proxy and browser fingerprint need to tell the same story. If your IP geolocates to Dallas on a T-Mobile ASN, the browser's timezone should be Central Time, the language should be en-US, and the screen resolution should match a device you'd expect on that network.
Mismatches between IP metadata and browser fingerprint are a strong detection signal. Most mobile proxy providers offer geo-targeting by city or region. Match your browser configuration to the proxy's location. Illusory's 5G mobile proxies use real carrier infrastructure, which means the IP metadata (ASN, geolocation, carrier name) is authentic rather than spoofed, reducing fingerprint mismatches.
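Keeping that story consistent can be table-driven: map each proxy exit location to browser settings that a real user there would have. The city table below is illustrative, and the option keys follow Playwright's `new_context()` parameters:

```python
# Map proxy exit locations to consistent fingerprint settings.
# Illustrative table; derive yours from your provider's geo data.
GEO_PROFILES = {
    "dallas": {
        "timezone_id": "America/Chicago",
        "locale": "en-US",
        "viewport": {"width": 390, "height": 844},  # phone-sized for mobile IPs
    },
    "new_york": {
        "timezone_id": "America/New_York",
        "locale": "en-US",
        "viewport": {"width": 390, "height": 844},
    },
}

def context_options(proxy_city: str) -> dict:
    """Browser context options matching the proxy's exit location."""
    try:
        return GEO_PROFILES[proxy_city]
    except KeyError:
        raise ValueError(f"no fingerprint profile for {proxy_city!r}")
```

Failing loudly on an unknown city is deliberate: launching with a default (wrong) timezone is exactly the mismatch you're trying to avoid.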
Production checklist for web-facing AI agents
Before you deploy an AI agent that touches the public web, run through this list:
1. Proxy infrastructure is configured and tested. Mobile carrier proxies for protected sites, residential for moderate targets. Test against your actual target sites, not just example.com.
2. Sticky sessions are enabled. 5-10 minute session duration for multi-step workflows. Longer for login-protected flows.
3. Request pacing is implemented. Minimum 2-5 seconds between page loads. Add random jitter (not fixed intervals).
4. Browser fingerprint matches proxy metadata. Timezone, language, and screen resolution should be consistent with the proxy IP's geolocation and network type.
5. TLS fingerprint is realistic. Use a stealth browser (Camoufox, Nodriver) or a patched Playwright build. Stock Playwright and Puppeteer produce detectable JA3/JA4 hashes.
6. Error handling covers anti-bot responses. Detect 403s, CAPTCHA redirects, and empty responses. Retry with exponential backoff and IP rotation on failure.
7. Success rate monitoring is in place. Track your success rate per target site. Alert when it drops below 80%. A sudden drop usually means the target site updated its anti-bot rules or your proxy pool's reputation degraded.
8. Fallback proxy pool is configured. If your primary proxy fails, the agent should be able to switch to a backup provider or pool without manual intervention.
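Items 6 and 8 above can be sketched together: a heuristic block detector plus capped exponential backoff with jitter. The block markers and thresholds are illustrative, not exhaustive:

```python
import random

# Illustrative substrings that commonly appear in challenge pages.
BLOCK_MARKERS = ("captcha", "challenge-platform", "access denied")

def looks_blocked(status: int, body: str) -> bool:
    """Heuristic anti-bot detection: hard 403s, empty bodies, or
    challenge-page markers in the HTML."""
    if status == 403 or not body.strip():
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)

def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff (2s, 4s, 8s, ...) capped at `cap` seconds,
    with +/-50% jitter so retries don't land in lockstep."""
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.5)
```

On a blocked response, rotate to a fresh IP (or the fallback pool), wait `backoff_delay(attempt)`, and retry; give up after a fixed attempt budget rather than hammering a target that has clearly flagged you.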
Frequently Asked Questions
Do I need mobile proxies if my AI agent only accesses public APIs?
If the API has no rate limiting or IP-based blocking, no. But most production APIs enforce rate limits, and many sit behind Cloudflare or similar protection. Test without a proxy first. If you hit 403 responses or rate limit errors under normal load, you need proxy infrastructure.
What's the difference between shared and dedicated mobile proxies?
Shared mobile proxy pools route multiple customers through the same set of SIM cards and IPs. Dedicated mobile proxies assign specific hardware (physical SIM cards on bare-metal 5G devices) to a single customer. Shared pools are cheaper but carry the same pool contamination risk as residential proxies. Dedicated infrastructure gives you clean IPs that only your traffic touches, which is why success rates run 10-15 percentage points higher on aggressive anti-bot targets.
Can I use free proxies with my AI agent?
Free proxies are almost always datacenter IPs with burned reputation. They fail on any protected site and frequently inject tracking or ads into the traffic passing through them. For anything beyond testing, paid proxies with maintained IP pools are the minimum.
How do I check if a website uses anti-bot protection?
Check response headers. Cloudflare adds a cf-ray header. DataDome sets a datadome cookie. Akamai uses akamai-grn headers. PerimeterX sets _px cookies. You can also check if the initial page load serves a JavaScript challenge (a brief loading screen before the real content appears) rather than the actual HTML content.
Getting started
None of the agent logic matters if the agent can't reach the target site. The proxy layer is what makes web access work for autonomous agents in production.
If you're building agents that need reliable access to protected websites, Illusory provides dedicated 5G mobile proxy infrastructure built on bare-metal hardware with real carrier SIM cards. Each customer gets single-tenant devices, real mobile carrier IPs that anti-bot systems treat as legitimate mobile traffic, and instant IP rotation via API with unlimited bandwidth.
Check pricing and plans or read the integration docs to connect your AI agent framework.