Google has a shiny new tool to keep your Gmail inbox spam-free.
The tool is called RETVec, short for Resilient and Efficient Text Vectorizer. Vectorization is a “methodology in natural language processing” that maps words or phrases to a corresponding vector of real numbers, per Towards Data Science. Those vectors can then be used for further analysis, predictions, and word-similarity comparisons.
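To make the idea of vectorization concrete, here is a minimal Python sketch. It is not RETVec's encoder. It assumes a much simpler scheme, character-bigram counts, and the function names are hypothetical. It only shows how turning words into numbers lets software compare them:

```python
import math
from collections import Counter


def char_bigram_vector(word):
    """Map a word to a sparse vector of character-bigram counts.

    Illustrative scheme only; RETVec uses its own trained character encoder.
    """
    padded = f"^{word.lower()}$"  # mark the start and end of the word
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine_similarity(a, b):
    """Return the cosine similarity of two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


original = char_bigram_vector("explained")
typo = char_bigram_vector("explainned")   # intentional typo
unrelated = char_bigram_vector("firewall")

print(cosine_similarity(original, typo))       # close to 1: nearly the same word
print(cosine_similarity(original, unrelated))  # 0.0: no bigrams in common
```

In this toy scheme a misspelled word still lands close to the original, which is the kind of property a spam classifier can build on.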
With RETVec, Gmail will be better at spotting spam emails that rely on invisible characters, LEET substitution (3xpl4in3d instead of explained, for example), intentional typos, and more. As a result, harmful messages should have a harder time reaching inboxes.
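The evasion tricks mentioned above are easy to demonstrate. The following Python sketch is an illustration only and is not how RETVec works. It assumes a hypothetical exact-match keyword filter with a made-up blocklist, then shows how LEET substitution and a zero-width character slip past it unless the text is cleaned up by hand first:

```python
# Hypothetical blocklist and substitution table, for illustration only.
BLOCKED_WORDS = {"explained"}
LEET_MAP = str.maketrans(
    {"3": "e", "4": "a", "0": "o", "1": "i", "5": "s", "7": "t"}
)
# A few zero-width / invisible code points (not an exhaustive list).
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def naive_match(text):
    """Flag text only if a whitespace-separated token is on the blocklist."""
    return any(token in BLOCKED_WORDS for token in text.lower().split())


def normalize(text):
    """Strip invisible characters and undo simple LEET substitutions."""
    visible = "".join(ch for ch in text if ch not in INVISIBLE)
    return visible.lower().translate(LEET_MAP)


def normalized_match(text):
    """Run the same exact-match filter on hand-normalized text."""
    return naive_match(normalize(text))


samples = ["explained", "3xpl4in3d", "expl\u200bained"]
for sample in samples:
    print(repr(sample), naive_match(sample), normalized_match(sample))
```

Hand-written rules like these have to anticipate every trick in advance, which is the gap a vectorizer trained on manipulated text is meant to close.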
More than 100 languages supported
"RETVec is trained to be resilient against character-level manipulations including insertion, deletion, typos, homoglyphs, LEET substitution, and more," Google explains on GitHub. "The RETVec model is trained on top of a novel character encoder which can encode all UTF-8 characters and words efficiently."
Right out of the box, RETVec will support more than 100 languages, Google said, adding that it could thus be deployed in different scenarios:
"Due to its novel architecture, RETVec works out-of-the-box on every language and all UTF-8 characters without the need for text preprocessing, making it the ideal candidate for on-device, web, and large-scale text classification deployments," Google's Elie Bursztein and Marina Zhang noted.
With RETVec, Google’s spam detection rate increased by 38%, the company said, adding that its false positive rate dropped by almost a fifth (19.4%).
The Tensor Processing Unit (TPU) usage of the model dropped by 83%.
"Models trained with RETVec exhibit faster inference speed due to its compact representation. Having smaller models reduces computational costs and decreases latency, which is critical for large-scale applications and on-device models," Bursztein and Zhang added.
Spam remains one of the most popular attack vectors, used by virtually all cybercriminals. It is omnipresent, cheap, and efficient, and it enables threat actors to deliver malware and steal sensitive data.
Sead is a seasoned freelance journalist based in Sarajevo, Bosnia and Herzegovina. He writes about IT (cloud, IoT, 5G, VPN) and cybersecurity (ransomware, data breaches, laws and regulations). In his career, spanning more than a decade, he’s written for numerous media outlets, including Al Jazeera Balkans. He’s also held several modules on content writing for Represent Communications.