Machine language: Computers are on the brink of mastering speech recognition

Like the other open source deep learning toolkits from Facebook, Google and various universities, CNTK uses GPUs for speed. Not only is it as fast as or faster than the other toolkits when you run it on one PC with one GPU, it's nearly twice as fast when you run it on a PC with two GPUs. It's also the only toolkit that can run on multiple machines at once, and with eight GPUs on two PCs it's about three times as fast as the competition.

A speed comparison of CNTK versus other toolkits

CNTK is faster than other deep learning toolkits and it scales better because you can run it distributed across multiple machines (it should run well on the new Azure GPU service that's currently in private preview). That performance is important for dealing with the massive amounts of data you need for problems like speech recognition.

"If you want to really develop artificial intelligence, you have to process data at web scale," he says. "Google brags that they deal with a huge amount of data in a distributed way, but what they've open sourced is really a small toolset."

"Since we adopted CNTK for experimenting with Cortana's speech recognition, the productivity for the product team has increased by almost a factor of ten. It's given them a huge boost. Before, it took them weeks to finish one experiment. They said before they adopted it they felt like they were driving a Volkswagen, after they switched it's like driving a Ferrari."

Nothing new

Speech recognition has been in Windows since Windows 95, Huang points out. "Thanks to Bill Gates' vision, as early as the 90s, we invested early in speech recognition. The progress year by year in driving speech recognition errors down has been foundational – if the error rate is too high [to be useful], then having vision doesn't help!

"But 20 years ago, Microsoft introduced the first speech API in Windows 95 and 20 years after that Microsoft added a range of AI tools going beyond speech into vision and understanding in Azure ML. With CNTK, it's the same desire to enable developers to take advantage of technology."

But the speech recognition it was designed to speed up isn't the only thing CNTK is good at. Microsoft has been trying it out for image recognition as well and, Huang claims, "CNTK is on a par with the best toolset out there for image processing."

Before, the Microsoft researchers and developers working on image recognition were using the popular Caffe tool from the University of California, Berkeley. Now they're switching over to CNTK, and as the latest GPUs arrive its performance is just getting better.

All-rounder

Being good at more than one task isn't usual for AI toolkits; they're usually very specific. "Caffe is just beautiful for image processing," says Huang, "but it's almost impossible to adopt that for speech." Huang is cautious about claiming that CNTK can handle all deep learning tasks – speech recognition, image recognition and natural language understanding are the three areas he's focusing on, but he's excited to see what people will do with it in other areas.

He concludes: "This tool is so powerful; it can absolutely deal with bigger challenges. The beauty of the tool is that when we get this into the hands of developers, something totally unexpected could happen that's just beyond our imagination. I believe they will find very creative ways of using it.

"The Microsoft internal workloads that we're building with CNTK are unbelievable. If you ask me what the next breakthrough will be, I'd say artificial intelligence – we'll create truly intelligent services that will help people to do more and reach a new level we've never experienced in the past."

Contributor

Mary started her career at Future Publishing, saw the AOL meltdown first hand the first time around when she ran the AOL UK computing channel, and she's been a freelance tech writer for over a decade. She's used every version of Windows and Office released, and every smartphone too, but she's still looking for the perfect tablet. Yes, she really does have USB earrings.