Microsoft announced today that it has cracked the code on voice recognition, claiming its new system can recognize spoken words as accurately as a human listener.
Engineers at Microsoft Artificial Intelligence and Research report that their automated voice recognition system has reached human parity - meaning it makes no more mistakes recognizing words in a normal conversation than the average human being.
In tests, the system reached a word error rate of just 5.9%, a figure the research team claims is on par with, if not better than, that of professional transcribers - making for a historic first in the development of artificial intelligence.
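For the curious, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the correct one, divided by the number of words in the correct transcript. At 5.9%, that works out to roughly six mistakes per 100 words spoken. The sketch below shows that standard calculation in Python; it is not Microsoft's scoring code, and the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / words in reference.

    Hypothetical helper: a plain word-level edit distance, ignoring case.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word ("box") and one dropped word ("the")
# against an 8-word reference.
correct = "turn on the xbox and launch the game"
heard = "turn on the box and launch game"
print(word_error_rate(correct, heard))  # 0.25
```

Two errors in eight words gives a 25% error rate, which puts the reported 5.9% in perspective.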
“Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible,” said Harry Shum, executive vice president of Microsoft Artificial Intelligence and Research.
Can you read me, Cortana?
History-making aside, Microsoft knows the everyday potential of this new tech, noting the breakthrough's "broad implications for consumer and business products."
One such example is the Xbox One, which can already respond to voice commands to turn itself on, launch games, and more.
Shum also specifically mentions the company's Siri-like assistant, Cortana, saying that human-like voice recognition would make "a truly intelligent assistant" possible.
We've got to go Deeper
Google's DeepMind project also made major strides in artificial intelligence this year with its AlphaGo program, which beat some of humanity's greatest champions at Go - a game oft considered impossible for machines to grasp due to its near-infinite strategic possibilities.
Last month, DeepMind engineers also unveiled WaveNet, a system that generates speech waveforms directly so that synthesized voices inflect more like a human's.
On the smaller end of the scale, Google is also working to cut down on the Uncanny Valley aspects of its Google Assistant, with researchers working to give its consumer AI a less monotone voice and even a sense of humor.
The robo-uprising is imminent, but not immediate
While historic, Microsoft's voice recognition system still stumbles over the same things that trip up humans - things like accents, vocal impairments, and distracting background noise.
Microsoft also adds that voice recognition is not the same as understanding speech. Transcribing a conversation into words and distilling those words into meaning are two different things, though Microsoft's next goal is to improve its system so that it not only picks up speech, but also understands it.
"Transcription is different from understanding; understanding is a different story," Xuedong Huang, Microsoft's Chief Speech Scientist, told us earlier this year. "To understand the message, the subtlety of what's being said – that's a long way off."
Until then, we're plenty excited knowing that soon we won't have to talk, to, computers, like, this, any, more.