Microsoft is investing heavily in voice technology, and says it has honed its speech recognition AI even further, with a system that apparently now rivals the accuracy of multiple human transcribers working together.
Nearly a year ago Microsoft revealed that it had bested IBM by hitting a word error rate (WER) of 6.3% (in the industry-standard Switchboard speech recognition benchmark), dropping that to 5.9% just a month later – and at the weekend the firm announced that it's now achieved a WER of 5.1%.
While the WER of 5.9% matched the ability of a professional human transcriber, the new 5.1% score is on a par with a multi-transcriber process – i.e. multiple humans working together, and listening to the material several times.
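For readers wondering what those percentages actually measure: word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition only. The function name `word_error_rate` is our own, and it is not Microsoft's scoring pipeline; benchmark scoring typically also normalizes the text before counting, which this sketch skips.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: splits on whitespace and compares words exactly,
    with no case folding or other text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (word missed)
                curr[j - 1] + 1,                  # insertion (extra word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of six reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.1%}")
```

On that basis, a WER of 5.1% means roughly one word in twenty is transcribed wrongly, dropped or added.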
In a blog post, Microsoft explains that this is a "new industry milestone, substantially surpassing the accuracy we achieved last year".
Neural net gains
The company noted that it managed to reduce its error rate by 12% compared to last year, mainly through improvements to its neural net-based acoustic and language models.
What does all this mean for the average user? In short, more accurate speech recognition for Cortana across desktop PCs and mobiles. Microsoft also uses voice recognition elsewhere, including in Office and Microsoft Cognitive Services, and even Windows has its own built-in speech recognition (which is better than you might think these days).
The ultimate aim for Microsoft is to be able to improve Cortana to a level where it’s possible to hold a natural conversation in which it feels like you’re talking to a human being as opposed to a digital assistant.
Along with voice recognition, Microsoft Translator is also much improved of late, having made our list of the best translation software of 2017.