The inadequacy of voice biometrics


Voice biometrics technology is designed to enable identity verification by analyzing a person's voice. It has become particularly commonplace in the financial services sector, where it was adopted to make customer authentication to a platform or service quicker and safer, on the principle that people carry their voice with them at all times and that each voice is unique to an individual.

However, following a spate of recent incidents in which threat actors used generative AI voice clips, known as 'voice deepfakes', in 'vishing' (voice phishing) attacks to open and gain access to financial accounts, organizations have serious concerns about the technology's integrity. Voice is quickly acquiring a reputation as one of the easiest forms of biometrics to clone.

There are two key reasons for this: the tools used to create realistic voice clones are readily available, and a high-quality synthetic voice can easily fool the human ear. According to research from MIT and Google, just one minute of voice data is all that's necessary to create convincing, human-quality audio. Governments, employees, and regulators have all cast doubt on the viability of voice biometrics. Such is the fear that in May of this year, the US Senate asked some of the world's largest financial institutions what actions they're taking to prevent deepfake voice fraud.

Of course, the concept of synthesized voice has the potential to affect many different spheres of life, but in this instance we are focusing on organizational security: authenticating and verifying user identities remotely.

Voice biometrics is most frequently used for authenticating returning customers rather than onboarding new ones. In some deployments, speech-pattern analysis takes place passively as the customer speaks; in others, such as mobile apps, the user is asked to tap a button and say a passphrase as a form of step-up authentication before accessing further services (a simplified sketch of this flow appears below).

Meanwhile, the rapid growth in generative AI tools has led to a sharp spike in the development and availability of voice cloning technology, which can create a voice that sounds identical to an authentic one.
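To make the step-up pattern concrete, here is a minimal Python sketch of the decision logic. Everything in it (the function names, the score fields, the thresholds) is a hypothetical assumption for illustration, not any vendor's actual API:

```python
import secrets
from dataclasses import dataclass

# Hypothetical result of comparing a spoken passphrase against an
# enrolled voiceprint. Real systems return richer, vendor-specific data.
@dataclass
class VoiceMatchResult:
    match_score: float     # 0.0-1.0 similarity to the enrolled voiceprint
    liveness_score: float  # 0.0-1.0 confidence the audio is live, not replayed

# Assumed policy values; in practice these are tuned per deployment.
MATCH_THRESHOLD = 0.85
LIVENESS_THRESHOLD = 0.90

def issue_passphrase() -> str:
    """Issue a random one-time passphrase so a pre-recorded clip can't simply be replayed."""
    digits = " ".join(str(secrets.randbelow(10)) for _ in range(6))
    return f"Please say: {digits}"

def allow_step_up(result: VoiceMatchResult) -> bool:
    """Permit the higher-risk action only if both checks pass."""
    return (result.match_score >= MATCH_THRESHOLD
            and result.liveness_score >= LIVENESS_THRESHOLD)

if __name__ == "__main__":
    print(issue_passphrase())
    # Simulated scores: a high-quality voice clone can match the enrolled
    # voiceprint well, leaving the liveness check as the only line of defense.
    clone_attempt = VoiceMatchResult(match_score=0.93, liveness_score=0.55)
    genuine_user = VoiceMatchResult(match_score=0.91, liveness_score=0.97)
    print("Clone attempt allowed?", allow_step_up(clone_attempt))  # False
    print("Genuine user allowed?", allow_step_up(genuine_user))    # True
```

The weakness described in this article is precisely that a convincing clone can push the match score past its threshold, so the entire burden falls on liveness detection.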

When it comes to voice biometrics, cloning isn’t its only weakness. There are also major concerns regarding identity assurance, performance, and the accessibility that it provides to users. This certainly brings into question its suitability for secure, large-scale transactions.

While biometrics provide a far more reliable method of identity management than traditional technologies such as one-time passcodes, the biometric threat landscape is evolving fast. If you're unable to determine that an individual is live and authenticating in real time, the technology is left exposed to spoofs, or 'biometric attacks'. Several types of liveness detection technology are available today, and considerable education is still needed on which works best, which is why a mission-critical solution is so necessary.

A viable and far more secure alternative to voice biometrics is face biometric technology, which can be used for both onboarding and ongoing user verification. A face can also be matched against a trusted, government-issued ID document, which isn't possible with a voice. This points to another failing of voice biometrics: it cannot secure the highest-risk point in the user journey, onboarding. It therefore offers no defense against the most pervasive and damaging types of identity fraud, such as synthetic identity fraud, and cannot provide users or organizations with the identity assurance they need.

Liveness detection is a key component of biometric face solutions: it detects whether the user asserting an identity is a real, 'live' person rather than a presented artifact (such as a photo or mask) or generative AI imagery (such as a deepfake).
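As a rough illustration of the idea, the hypothetical Python sketch below models a challenge-response liveness check. The challenge list, score fields, and time limit are assumptions made for the example, not a description of any real product's interface:

```python
import secrets
from dataclasses import dataclass

# Hypothetical challenges a face-verification client might ask a user to perform.
CHALLENGES = ["turn head left", "turn head right", "blink twice", "smile"]

@dataclass
class CaptureResult:
    challenge_performed: bool  # did the video show the requested action?
    pad_score: float           # presentation-attack-detection confidence, 0.0-1.0
    elapsed_seconds: float     # time from challenge issue to response

PAD_THRESHOLD = 0.9          # assumed policy value
MAX_RESPONSE_SECONDS = 10.0  # a stale response suggests pre-recorded video

def issue_challenge() -> str:
    # Randomizing the challenge means a pre-rendered deepfake video
    # cannot anticipate what it will be asked to do.
    return secrets.choice(CHALLENGES)

def is_live(result: CaptureResult) -> bool:
    """A user counts as live only if all three signals agree."""
    return (result.challenge_performed
            and result.pad_score >= PAD_THRESHOLD
            and result.elapsed_seconds <= MAX_RESPONSE_SECONDS)

if __name__ == "__main__":
    print("Challenge:", issue_challenge())
    replayed_video = CaptureResult(challenge_performed=False, pad_score=0.40, elapsed_seconds=2.0)
    live_person = CaptureResult(challenge_performed=True, pad_score=0.97, elapsed_seconds=3.5)
    print("Replayed video passes?", is_live(replayed_video))  # False
    print("Live person passes?", is_live(live_person))        # True
```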

The use of deepfake videos and images, created with generative AI software to show people saying or doing things they never did, is also rapidly increasing. They have become almost impossible to spot without detailed, expert analysis and advanced tools; to the naked eye, generative AI attacks such as deepfakes or, more recently, face swaps can look entirely realistic while being very different from the actual input. We can't simply rely on people to spot AI attacks. Technology companies must take appropriate measures to monitor for criminals exploiting this technology.

The question of “How can we be sure of someone’s identity online?” is an extremely important and serious topic and presents a huge challenge to organizations in today’s digital economy. It’s a challenge that’s not going away any time soon. Weak authentication and verification mean weaker borders at the point of travel, compromised online accounts, weaker information security, and more.

Voice biometrics simply doesn't provide the identity assurance required to stand up to today's ever-changing threat landscape. Some like to dismiss facial biometrics as science fiction, but it most certainly isn't: it provides by far the safest and most secure way to protect identities online. Organizations must recognize that it's needed now more than ever.


Dominic Forrest, Chief Technology Officer, iProov.