How Microsoft beat Google at understanding images with machine learning

152 layers isn't the limit, says Lee. "We now believe we can get to not just hundreds of layers but thousands of layers, and I'm confident with such extremely deep networks that we can get very high accuracy not just for vision but for speech, for deep text understanding problems – really for anything!"

He's hoping these sorts of very deep networks could help with "a much deeper nuanced understanding of human discourse and of text" – perhaps reading papers or listening to conversations "to understand the meanings and context". Microsoft is already analysing internal emails to try and understand what they're about. "They're proving to be very interesting and valuable," he says.

Peter Lee, head of Microsoft Research, believes ultra-deep networks could help machines understand not just images but text and speech

Sun agrees. "These kinds of designs are not limited to computer vision. The general ideas are applicable to other AI problems such as speech recognition, natural understanding of text, understanding medical images, data mining…"

But for Sun, the important thing isn't just how well the deep residual learning network performs, but the possibilities the technique opens up for other ways of improving deep networks. "This opens the door to explore network designs. Going deeper is just one way [to get better results].

"Now we're looking for more diverse network architecture designs. Another direction we're working on is parallel training, so we train the whole system across the machine, where each machine might have four or eight GPUs, so we can train even deeper networks in parallel."
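The residual idea behind the network Sun describes can be sketched in a few lines: rather than learning a target mapping directly, each block learns a residual transform F(x) and adds the original input back through a shortcut connection, which is what makes training at such depths feasible. A minimal NumPy sketch (the weight shapes and two-layer transform here are illustrative, not Microsoft's actual architecture):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """A toy residual block: output = relu(F(x) + x).

    F is a small two-layer transform; the identity shortcut adds the
    input back, so the block only has to learn a residual correction.
    """
    h = relu(x @ w1)          # first layer of the transform F
    return relu(h @ w2 + x)   # second layer, plus the skip connection

# With zero weights the transform F vanishes, and the block simply
# passes a non-negative input straight through unchanged.
x = np.array([1.0, 2.0, 3.0])
w_zero = np.zeros((3, 3))
out = residual_block(x, w_zero, w_zero)
```

Because the shortcut lets each block default to the identity, stacking many such blocks does not degrade the signal the way stacking plain layers can.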

Prize possessions

ImageNet isn't the only prize Microsoft has been winning with unconventional machine learning techniques. Antonio Criminisi of Microsoft's UK research lab in Cambridge was just awarded the prestigious Marr Prize for using not the currently popular deep neural networks but an older (and less resource-intensive) approach called decision forests to achieve equally good image recognition. Not sticking with the fashionable approaches to machine learning seems to be paying off for Microsoft.
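Decision forests work by training many randomized decision trees on resampled data and taking a majority vote. A toy sketch of the idea, assuming one-level "trees" (decision stumps) on a one-dimensional dataset purely for illustration; none of this reflects Criminisi's actual models:

```python
import random

def fit_stump(points, labels):
    """Pick the threshold that best splits the labels (predict 1 for >= t)."""
    best = None
    for t in sorted(set(points)):
        errs = sum((p >= t) != bool(l) for p, l in zip(points, labels))
        if best is None or errs < best[1]:
            best = (t, errs)
    return best[0]

def fit_forest(points, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap resample, as a forest would."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(points)) for _ in range(len(points))]
        thresholds.append(fit_stump([points[i] for i in idx],
                                    [labels[i] for i in idx]))
    return thresholds

def predict(forest, x):
    votes = sum(x >= t for t in forest)
    return int(votes * 2 > len(forest))  # majority vote across the trees

data = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(data, labels)
```

The appeal of the approach is that each tree is cheap to train and evaluate, and the ensemble vote smooths out individual trees' mistakes, which is part of why forests can compete while using far fewer resources than deep networks.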

Just as exciting is the fact that anyone will be able to use these ultra-deep networks soon. "One reason why this is so exciting for us is that it's not just a scientific issue but it has serious commercial value," says Lee. "The thing that's cool is that you present the paper at the conference one week, and the next month this is shipping."

Ultra-deep networks will soon be behind new APIs that developers can use for free through the Project Oxford system or pay for as part of Cortana Analytics, and they'll be used inside Microsoft projects too, he says. "We expect that very soon we will have hardened these technologies and made them RESTful APIs for developers."

And the more machine learning systems are used, the better they get, Lee points out. "There's a virtuous cycle there. The lift you get from even unsupervised training can be significant; these things just improve the more they're used and the more varied the data is. This is one of the rare moments where pure science and commercial deployments like Project Oxford are happening step-by-step together. That seems to really make it a very special time for the field."


Mary started her career at Future Publishing, saw the AOL meltdown first hand the first time around when she ran the AOL UK computing channel, and she's been a freelance tech writer for over a decade. She's used every version of Windows and Office released, and every smartphone too, but she's still looking for the perfect tablet. Yes, she really does have USB earrings.