How to collect geo-targeted data

It’s cool and frightening at the same time that the internet looks fairly different depending on where you’re browsing from. While localized and personalized content can make the good ol’ WWW more useful, it also creates isolated and fragmented digital worlds.

That same polarity transfers to businesses aiming to collect structured web data. Localization is both a hurdle and a big opportunity. Scraping from a server in one state or country means you’re only seeing one version of the truth. To get the full picture, you need to know the necessary nuances of geo-targeted data collection.

First, have a vast IP pool

The only way to see the web through the eyes of a local is to be where they are. This is why the first thing any scraping project worth its salt needs is a massive IP pool that covers every corner of the globe.

"Every corner of the globe" is no exaggeration. The name of the game here is granularity. A high-quality pool should ideally include all 195 countries in case you want to verify ads globally or circumvent regional geo-blocking. But if you don't need that kind of reach, covering most major markets should be enough.

However, broad country-level targeting will only get you so far. Thus, you should cover thousands of specific cities and zip codes because search results in, say, Los Angeles aren't the same as search results in San Francisco.

The good news is that web scraping and proxy platforms like Decodo do the job for you. They have millions of residential and mobile IPs that enable you to pinpoint your target down to specific coordinates or ASNs by routing your requests through a local's home wireless or 5G connection.

As your scraper connects to a provider endpoint, it passes specific parameters in the authentication string, allowing for more granular targeting.
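To make this concrete, here is a minimal sketch of how such an authentication string might be assembled. The username layout ("user-<name>-country-<cc>-city-<city>"), the host proxy.example.com, the port, and the credentials are all hypothetical placeholders, not any provider's actual format; every provider documents its own parameter names.

```python
from urllib.parse import quote


def build_proxy_url(user, password, host, port, country=None, city=None):
    """Build a proxy URL whose username carries geo-targeting parameters.

    The "user-<name>-country-<cc>-city-<city>" layout is a hypothetical
    convention used for illustration; check your provider's documentation
    for the real parameter names and ordering.
    """
    parts = ["user", user]
    if country:
        parts += ["country", country.lower()]
    if city:
        # Assumed convention: spaces in city names become underscores.
        parts += ["city", city.lower().replace(" ", "_")]
    username = "-".join(parts)
    # Percent-encode credentials so special characters survive in the URL.
    return (
        f"http://{quote(username, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}"
    )


# Placeholder credentials and endpoint, targeting Los Angeles.
proxy_url = build_proxy_url(
    "alice", "s3cret", "proxy.example.com", 8000,
    country="US", city="Los Angeles",
)

# A mapping in the shape most Python HTTP clients accept for proxies.
proxies = {"http": proxy_url, "https": proxy_url}
```

With the third-party requests library, for instance, the mapping could be passed as requests.get(url, proxies=proxies). Switching to another city then only means changing the arguments, not the scraper's logic.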

Differentiate between residential and mobile IPs

When setting up your geo-targeted scraper, you have to choose between residential and mobile IPs.

Residential IPs are assigned by an ISP to a home. They are the de facto standard for geo-targeting because they carry a high trust score: to a web server, a residential IP looks like an everyday person browsing on a laptop at home.

Mobile IPs, on the other hand, “look” even more human because mobile carriers share a single IP address among hundreds or thousands of users. As a result, websites are hesitant to block mobile IPs, since doing so could shut out genuine and potential customers alongside bad actors.

So, which type of IP do you use when?

Residential IPs are arguably better when there is lots of heavy lifting involved, like mapping out a competitor’s nationwide inventory or keeping track of global SEO rankings. They are wallet-friendly (especially with flexible pricing models like ‘pay as you go’) and have a high success rate, making them ideal for high-security or large-scale projects.

If you're scraping a mobile-first app (like Instagram or TikTok), using mobile IPs is often the only way to get accurate, localized data without being flagged. The same goes for websites with overly aggressive anti-bot shields, where a shared IP address works in your favor due to the fear of accidentally banning legitimate users.

Be wary of geo-mismatching

Of course, there are a few wrinkles you’ll need to iron out as you scrape. Some sites have figured out that people use proxies, which means they look at other things besides your IP. Namely, they ask the browser for its HTML5 geolocation.

It’s a browser API that basically tells inquiring websites and web applications where a user’s device is geographically (their latitude and longitude). HTML5 geolocation is rather invasive and precise because it leverages nearby Wi-Fi networks, Bluetooth signals, device sensors, and GPS hardware data.

So, if the location inferred from the IP address a request comes from doesn’t match the coordinates the browser reports, you get a geo-mismatch. This usually happens through the use of proxy servers and VPNs, as well as browser extensions and certain mobile apps.
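As a rough illustration of the check a site could run, the sketch below compares an IP-derived location with browser-reported coordinates using the haversine great-circle distance. The function names and the 100 km tolerance are assumptions made for this example; real anti-bot systems do not publish their logic or thresholds.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_geo_mismatch(ip_location, browser_location, tolerance_km=100.0):
    """Flag a request when the two locations are farther apart than allowed.

    Both arguments are (latitude, longitude) tuples. The default tolerance
    is an arbitrary value chosen for illustration.
    """
    return haversine_km(*ip_location, *browser_location) > tolerance_km


# Proxy exit node in Los Angeles, browser still reporting San Francisco.
los_angeles = (34.0522, -118.2437)
san_francisco = (37.7749, -122.4194)

mismatch = is_geo_mismatch(los_angeles, san_francisco)
```

The two cities are several hundred kilometres apart, so this pair is flagged, while two points within the same metro area would pass.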

As you can probably imagine, location mismatch is a classic tell-tale sign of a bot. Most advanced anti-scraping systems will immediately drop the connection or serve you a CAPTCHA that is often impossible to solve.

The solution is to drive a headless browser with an automation library (like Playwright or Puppeteer) and spoof the coordinates. A headless browser has no graphical user interface, though it can load and interact with sites just like a regular browser. More importantly, it is managed entirely through code.

This does mean you’ll have to manually override the browser’s permissions and set the latitude and longitude in your script to match the city of your proxy. Yes, it’s an extra step, but it can be the difference that ultimately separates the amateurs from the pros. Tools like Decodo’s advanced APIs can automate this synchronization, so the location your browser reports matches the location of your proxy.
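Here is a minimal sketch of that override using Playwright's Python bindings (a third-party package, imported lazily so the helper can be used without it). The geolocation and permissions context options are part of Playwright's API; the helper names, the proxy endpoint, the credentials, and the target URL are placeholders invented for this example.

```python
def context_options(latitude, longitude):
    """Browser-context settings that pin the reported location.

    Grants the geolocation permission up front so the page gets
    coordinates without a prompt.
    """
    if not -90 <= latitude <= 90 or not -180 <= longitude <= 180:
        raise ValueError("coordinates out of range")
    return {
        "geolocation": {"latitude": latitude, "longitude": longitude},
        "permissions": ["geolocation"],
    }


def fetch_with_spoofed_location(url, proxy, options):
    """Load a page through a proxy with the given context options."""
    # Third-party dependency: pip install playwright && playwright install
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, proxy=proxy)
        context = browser.new_context(**options)
        page = context.new_page()
        page.goto(url)
        html = page.content()
        browser.close()
        return html


# Coordinates for Los Angeles, meant to match a Los Angeles proxy exit.
la_options = context_options(34.0522, -118.2437)

# Placeholder proxy settings; substitute your provider's real values.
la_proxy = {
    "server": "http://proxy.example.com:8000",
    "username": "user-alice-country-us-city-los_angeles",
    "password": "s3cret",
}

# Example call (needs network access and an installed browser):
# html = fetch_with_spoofed_location("https://example.com", la_proxy, la_options)
```

The point of keeping the coordinates in one helper is that the proxy city and the browser location can be chosen together, which is what avoids the mismatch.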

Geo-targeted scraping is a game of numbers

As the web continues to become more fragmented and personalized, the ability to collect data from a local perspective (perhaps ‘cache’ is the better term) will only become more valuable. But if you only have access to a few hundred IPs per country and/or city, you’ll quickly hit rate limits and end up on the blacklist.
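A back-of-the-envelope calculation shows why pool size matters. The figures below (50,000 requests per hour and a limit of 60 requests per IP per hour) are made-up numbers for illustration only; actual rate limits vary by site and are rarely published.

```python
import math


def min_pool_size(requests_per_hour, per_ip_limit_per_hour):
    """Smallest number of IPs that keeps every IP under the site's limit,
    assuming requests are spread evenly across the pool."""
    if per_ip_limit_per_hour <= 0:
        raise ValueError("per-IP limit must be positive")
    return math.ceil(requests_per_hour / per_ip_limit_per_hour)


# Hypothetical workload and rate limit.
needed = min_pool_size(requests_per_hour=50_000, per_ip_limit_per_hour=60)
print(needed)  # 834
```

Under these assumed numbers, a pool of a few hundred IPs for a given location would push each address past the limit.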

To scale, you need a provider that has a vast (and diverse) IP pool covering almost every city on Earth and smart automation, if only to rid yourself of the infrastructure headache. That way, you can focus on writing the logic to extract the data, while they handle the tougher part of the job.

After all, the online world is a big place, and your primary responsibility is to make sure your scraper is seeing as much of it as possible, if not all of it.

Sead is a seasoned freelance journalist based in Sarajevo, Bosnia and Herzegovina. He writes about IT (cloud, IoT, 5G, VPN) and cybersecurity (ransomware, data breaches, laws and regulations). In his career, spanning more than a decade, he’s written for numerous media outlets, including Al Jazeera Balkans. He’s also held several modules on content writing for Represent Communications.
