Microsoft's latest breakthrough can understand speech as well as a human

Microsoft announced today that it cracked the code on voice recognition, claiming its new system can recognize spoken words as accurately as if you'd heard them yourself.

Engineers at Microsoft Artificial Intelligence and Research report that their automated voice recognition system has reached human parity - meaning it makes about as many mistakes recognizing words in a normal conversation as the average human being.

In tests, the system reached a word error rate of just 5.9%, a figure the research team claims is on par with, if not better than, that of professional transcriptionists - making for a historic first in the development of artificial intelligence.
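For readers curious about the metric: word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. A rate of 5.9% works out to roughly six wrong words per hundred spoken. The sketch below illustrates that standard definition only; it is not Microsoft's evaluation code, and the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one misheard word plus one dropped word out of five.
print(word_error_rate("can you read me cortana", "can you reed me"))
```

Real benchmark scoring typically also normalizes punctuation and spelling variants before counting errors, which this sketch skips.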

Can you read me, Cortana?

History-making aside, Microsoft knows the everyday potential of this new tech, noting the breakthrough's "broad implications for consumer and business products."

One such example is the Xbox One, which can already use voice commands to turn itself on, launch games, and more.

Harry Shum, who heads Microsoft Artificial Intelligence and Research, also specifically mentions the company's Siri-like assistant, Cortana, saying that human-like voice recognition would make "a truly intelligent assistant" possible.

We've got to go Deeper

Google's DeepMind project also made major strides in artificial intelligence this year with its AlphaGo program, which beat some of humanity's greatest champions at Go - a game often considered impossible for machines to grasp due to its near-infinite strategic possibilities.

Last month, DeepMind engineers also developed a new way to make robotic speech inflect like a human's, using a system called WaveNet that generates raw audio waveforms to make synthesized voices sound more natural.

On the smaller end of the scale, Google is also working to cut down on the uncanny-valley aspects of its Google Assistant, with researchers, and reportedly some comedy writers, working to give its consumer AI a less monotone voice and even a sense of humor.

The robo-uprising is imminent, but not immediate

While historic, Microsoft's voice recognition system is still prone to the same things that trip up humans - things like accents, vocal impairments, and distracting background noise.

Microsoft also adds that voice recognition is not the same as understanding speech. Transcribing a conversation into words and distilling those words into meaning are two different things, though Microsoft's next goal is to improve its system so that it not only picks up speech but actually gets it.

"Transcription is different from understanding; understanding is a different story," Xuedong Huang, Microsoft's Chief Speech Scientist, told us earlier this year. "To understand the message, the subtlety of what's being said – that's a long way off."

Until then, we're plenty excited knowing that soon we won't have to talk, to, computers, like, this, any, more.

Parker Wilhelm is a freelance writer for TechRadar. He likes to tinker in Photoshop and talk people's ears off about Persona 4.
