For those unfamiliar with the practice, web scraping is an automated technique for gathering data from websites, often employed by analytics firms to build large databases of user information. Although the practice is legal, social media companies strictly prohibit it because it puts their users' privacy and data at risk.
Comparitech's lead researcher Bob Diachenko discovered three identical copies of the exposed database online at the beginning of August. After examining the database, Diachenko and his team learned that it belonged to a company called Deep Social, which has shut down its operations.
When the team reached out to the now-defunct company, its request was forwarded to a Hong Kong-based firm called Social Data. While Social Data denied having any connection to Deep Social, the firm did acknowledge the breach and was able to secure the exposed database with a password.
Social media scraping
In an email to Diachenko, included in Comparitech's blog post on the matter, Social Data defended the practice of web scraping and stressed that the database, which had been left online without a password, was not hacked:
“Please, note that the negative connotation that the data has been hacked implies that the information was obtained surreptitiously. This is simply not true, all of the data is available freely to ANYONE with Internet access. I would appreciate it if you could ensure that this is made clear. Anyone could phish or contact any person that indicates telephone and email on his social network profile description in the same way even without the existence of the database. Social networks themselves expose the data to outsiders – that is their business – open public networks and profiles. Those users who do not wish to provide information, make their accounts private.”
The three identical copies of the database were hosted at three separate IPv6 addresses. Of the nearly 235m social media profiles in the database, 191m records were scraped from Instagram, 42m were scraped from TikTok and almost 4m were scraped from YouTube.
Each entry in the database contains a wealth of information on the affected users, including their profile name, real name, profile photo, age, gender, engagement statistics and more.
While scraping user data from social media sites is not illegal, failing to secure this data after it has been collected poses a serious risk to the affected users as cybercriminals could use the information from the database to target them online.
Via The Next Web
After working with the TechRadar Pro team for the last several years, Anthony is now the security and networking editor at Tom’s Guide where he covers everything from data breaches and ransomware gangs to the best way to cover your whole home or business with Wi-Fi. When not writing, you can find him tinkering with PCs and game consoles, managing cables and upgrading his smart home.