Machine learning is improving everything from image and speech recognition to predicting when machinery will fail. It's what makes Cortana smart enough to crack jokes and predict sports matches, as well as tell you when to leave early for your meeting because the traffic is bad. But there's still a long way for digital assistants to go before we can really trust them.
We still judge computer systems that try to help us not on facts but on how well they communicate, claims Bing's search director Stefan Weitz in his recent book, 'Search'.
"In systems like Siri or Cortana, 30 to 40% of all interactions people initiate are social or silly questions that probe the reality of the 'assistant' rather than inquiries the system was intended to answer. We want (and need) to believe we're engaging with someone who understands not only math but our humanity."
Chit chat with Cortana
That's why Microsoft built the 'chit chat' system into Cortana that lets her sing songs and do imitations. "Humour has been a major focus for MSR in our partnership with the Cortana team," head of Microsoft Research Peter Lee told us. "Chit chat is a lightweight machine learning system and we can keep increasing the number of domains Cortana is able to chit chat with you about."
The problem is how quickly Cortana can keep up with breaking news, as opposed to the seasonal humour the Bing team adds in, like tracking Santa or joking about the Seattle Seahawks.
"In Cortana or any digital assistant there's the freshness of what Cortana knows about," Lee explained to us. "Cortana is continually learning. You can have a chit chat conversation ask about who is going to win the Seahawks game coming up next Sunday. But on the evening where things are erupting in Ferguson Missouri, it is not always obvious that Cortana will have the freshness of knowledge to interact."
That's why, although Microsoft is having a lot of success with the currently popular deep learning systems for services like Skype Translator and the new image recognition in OneDrive (where your photos now automatically get tags like 'flower' and 'beach'), it's not putting all its eggs in that basket. "We're getting much smarter in coming to realisations about when to use deep neural networks or probabilistic models versus other learning techniques," Lee explains. The next step is more dynamic machine learning systems that stay fresh.
"With traditional models of machine learning you spend this enormous effort to get a huge corpus of data and train the system offline and deploy it. But we find in more and more situations that model isn't good enough. As machine learning becomes a more and more integral part of everything that we touch and interact with, I think the issues of maintenance of that intelligence and the freshness of that intelligence will become more and more important," Lee warns. The problem is something he calls 'ML rot'.
"Right now, typically machine learning systems are static and their effectiveness sometimes degrades over time. Although work in machine learning is always advancing, a specific machine learning system isn't. At some point you have to gather a bunch of experts and go through huge efforts to train it and start anew. That's not scalable. You need a process where non-experts are able to maintain and advance machine learning systems, and where machine learning systems are more amenable to continuous learning."
Cortana isn't the only Microsoft system that's doing more of this continuous dynamic learning. "In Halo the TrueSkill ranking system uses a probabilistic model that is much more dynamic," Lee told us. Halo uses your ranking to suggest other players for you to play against. TrueSkill tracks both how good the system thinks each player is, and how certain it is that the skill rating is right. The more you play, the more certain the system gets about how accurately it's ranking you, and it has to do that in just a few games even though there are millions of people playing.
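To make the idea of tracking both a skill estimate and its uncertainty concrete, here is a minimal Python sketch of a TrueSkill-style update for a single two-player game with no draws. It is an illustration under stated assumptions, not Microsoft's Infer.NET implementation: the names Rating and update are hypothetical, the starting values (a mean of 25, an uncertainty of 25/3 and a performance noise of 25/6) are commonly cited defaults rather than figures from this article, and the sketch leaves out draws, teams and the small amount of uncertainty real systems add back between games.

```python
import math
from dataclasses import dataclass


@dataclass
class Rating:
    # Hypothetical container: mu is the estimated skill,
    # sigma is how uncertain the system is about that estimate.
    mu: float = 25.0
    sigma: float = 25.0 / 3.0


# Assumed performance noise: how much a player's form varies game to game.
BETA = 25.0 / 6.0


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def update(winner: Rating, loser: Rating, beta: float = BETA):
    """Return new (winner, loser) ratings after one decisive game.

    Simplified two-player update with no draw margin. It does not guard
    against extreme rating gaps where the normal CDF underflows to zero.
    """
    c2 = 2.0 * beta ** 2 + winner.sigma ** 2 + loser.sigma ** 2
    c = math.sqrt(c2)
    t = (winner.mu - loser.mu) / c

    # v controls how far the means move: large for an upset,
    # small when the favourite wins as expected.
    v = _pdf(t) / _cdf(t)
    # w (between 0 and 1) controls how much the uncertainty shrinks.
    w = v * (v + t)

    new_winner = Rating(
        mu=winner.mu + (winner.sigma ** 2 / c) * v,
        sigma=winner.sigma * math.sqrt(1.0 - (winner.sigma ** 2 / c2) * w),
    )
    new_loser = Rating(
        mu=loser.mu - (loser.sigma ** 2 / c) * v,
        sigma=loser.sigma * math.sqrt(1.0 - (loser.sigma ** 2 / c2) * w),
    )
    return new_winner, new_loser


if __name__ == "__main__":
    alice, bob = Rating(), Rating()
    for game in range(1, 4):
        alice, bob = update(alice, bob)  # alice wins every game
        print(game, alice, bob)
```

Running the loop shows the behaviour described above: each result nudges the two skill estimates apart, and both uncertainty values shrink with every game played, so the system becomes more confident in a ranking after only a handful of matches.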
The same tools can be useful at work too. TrueSkill is built with Infer.NET, the same system that Clutter uses to work out which email messages you'll want to see, using descriptions that are easy to write – and to update when you want to improve the system.
Clutter learns from your behaviour, and at the moment it takes up to a day for it to change how it treats messages you've dragged back into your inbox when it mistakenly thought they weren't interesting. Delve, an Office 365 service that tries to prioritise the documents and attachments that people have shared with you, can take a couple of days to spot new documents you need to know about.
If we're going to rely on these systems, they have to get faster. After all, we expect search engines to know about news as soon as it happens. "Businesses will increasingly get to the point where they want intelligent agents to be aware of the very latest thoughts, utterances, emails and documents by all of their colleagues, all of the time," predicts Peter Lee. "Even a machine learning system that retrains once a day will be too slow and unintelligent."