Microsoft's HowOld site for guessing your age from a photo was a viral success. The new MyMoustache site that tells you how the moustache you're growing for the annual 'Movember' charity is coming along (and offers to give you a fake moustache if you just want to join in the fun) might not take off the same way.
But it does show off some of the new tools Microsoft has added to the Project Oxford APIs that let developers use machine learning to find faces, understand what users say and type – and now, how they might be feeling.
"The emotion API detects emotions in human faces," Ryan Galgon of Microsoft's Technology and Research group told techradar. It suggests up to eight emotions that he calls 'universal' for faces detected in an image – anger, contempt, fear, disgust, happiness, neutral, sadness or surprise (or a mix of those) – and it can work with multiple faces in a picture. "We can already tell what's happening in photos and who is in photos, and now we can move beyond that, with sentiment analysis."
Imagine a photo app that automatically composites faces from multiple images so you get a family photo where everyone is smiling. "Or you could pick the best photo in an album based on whether people are smiling or not," Galgon suggests.
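To make the album idea concrete, here is a minimal sketch of how an app might rank photos once it has per-face emotion scores. The data shape (one dictionary of scores between 0 and 1 per face), the file names and the function names are all assumptions made for illustration, not the actual Project Oxford response format. Each photo is scored by its least happy face, so one frown counts against the whole shot.

```python
from typing import Dict, List

# Hypothetical data shape: one dict of emotion scores (0.0 to 1.0) per detected face.
FaceScores = Dict[str, float]


def smile_score(faces: List[FaceScores]) -> float:
    """Score a photo by its least happy face; a photo with no faces scores 0."""
    if not faces:
        return 0.0
    return min(face.get("happiness", 0.0) for face in faces)


def best_photo(album: Dict[str, List[FaceScores]]) -> str:
    """Return the name of the photo whose least happy face is happiest."""
    return max(album, key=lambda name: smile_score(album[name]))


# Made-up scores for two photos of the same two people.
album = {
    "img_001.jpg": [
        {"happiness": 0.95, "neutral": 0.05},
        {"happiness": 0.20, "sadness": 0.80},
    ],
    "img_002.jpg": [
        {"happiness": 0.85, "neutral": 0.15},
        {"happiness": 0.75, "neutral": 0.25},
    ],
}
print(best_photo(album))  # prints "img_002.jpg"
```

Ranking by the minimum rather than the average is a design choice: it favours the photo where nobody looks unhappy over one with a single very wide grin.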
Detecting beards and moustaches is another of the new face recognition options that developers will be able to use. "We also have significant improvements for detecting age and gender," Galgon told us. Some of the new options are available straight away, and others will be available over the coming weeks.
The existing face detection options will now work for video as well as still images, and the APIs can follow a particular person's face through a video. Initially that's about finding a face in the video, including knowing that faces don't usually disappear – so even if a face isn't detected in one frame, it's likely to still be there.
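The assumption that faces don't usually disappear can be sketched as a simple gap-filling step. This is an illustration only, not Microsoft's tracking method: it assumes a hypothetical track of one bounding box (or nothing) per frame, and fills short gaps between two detections by linear interpolation. The `max_gap` threshold is an invented parameter.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height


def fill_gaps(track: List[Optional[Box]], max_gap: int = 5) -> List[Optional[Box]]:
    """Fill short runs of missed detections between two known boxes."""
    filled = list(track)
    last_seen = None  # index of the most recent frame with a detection
    for i, box in enumerate(track):
        if box is None:
            continue
        if last_seen is not None:
            gap = i - last_seen - 1
            if 0 < gap <= max_gap:
                start = track[last_seen]
                for j in range(last_seen + 1, i):
                    # Fraction of the way from the last detection to this one.
                    t = (j - last_seen) / (i - last_seen)
                    filled[j] = tuple(a + (b - a) * t for a, b in zip(start, box))
        last_seen = i
    return filled


# The face is missed in the middle frame but seen on either side of it.
track = [(0, 0, 10, 10), None, (20, 0, 10, 10)]
print(fill_gaps(track)[1])  # prints (10.0, 0.0, 10.0, 10.0)
```

Gaps at the very start or end of the track, and gaps longer than `max_gap`, are left empty, since there is nothing on one side to interpolate from or the face may really have left the shot.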
In time, though, you're likely to be able to do the same kind of things for faces detected in a video that you can for faces detected in photos, Galgon says – so you could detect the emotions displayed during the video and look for when they change. "The APIs we have are starting to be able to work together, like the face detection and emotion detection. The direction we're going for is to have them provide a common set of capabilities, regardless of the type of input."
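If per-frame emotion scores do become available for video, spotting when the mood changes could be as simple as the sketch below. The input format (one dictionary of scores per frame for a single tracked face) and the function name are assumptions for illustration; nothing here reflects a published API.

```python
from typing import Dict, List, Optional, Tuple


def emotion_changes(frames: List[Dict[str, float]]) -> List[Tuple[int, str]]:
    """Return (frame index, emotion) each time the top-scoring emotion changes."""
    changes: List[Tuple[int, str]] = []
    previous: Optional[str] = None
    for index, scores in enumerate(frames):
        if not scores:
            continue  # no face was scored in this frame
        dominant = max(scores, key=scores.get)
        if dominant != previous:
            changes.append((index, dominant))
            previous = dominant
    return changes


# Made-up scores for four consecutive frames.
frames = [
    {"neutral": 0.8, "happiness": 0.2},
    {"neutral": 0.6, "happiness": 0.4},
    {"neutral": 0.3, "happiness": 0.7},
    {"surprise": 0.9, "happiness": 0.1},
]
print(emotion_changes(frames))
# prints [(0, 'neutral'), (2, 'happiness'), (3, 'surprise')]
```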
Not all of the frames in a video will be interesting, or fully in focus, of course. Two further new video tools in Project Oxford handle image stabilisation, which cleans up shaky video (using similar research to Microsoft's Hyperlapse high-speed video), and motion detection. "The problem with motion detection is the false positives," Galgon points out. "You don't want to detect motion every time a cloud moves across the sky or a car drives past; you want to detect where there is motion in the foreground."
Learning new words
A new spell checking service is designed to clean up the text users type into apps, especially on mobile devices, where it's easy to miss out a letter or put a space in the middle of a word. The API can fix both, and it also looks at the context to catch mistakes like 'four' instead of 'for'. "There might be misspellings that can throw off the system," Galgon pointed out. "If they're looking for Chicago, typing 'hicago' isn't going to find it."
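The two simplest fixes, a dropped letter and a stray space, can be sketched with nothing more than a word list. This is a toy illustration under stated assumptions, not how the Project Oxford service works: the vocabulary and function names are invented, the text is lower-cased, and the context-dependent 'four' versus 'for' case is not handled at all.

```python
import string
from typing import List, Set


def restore_missing_letter(word: str, vocabulary: Set[str]) -> str:
    """Return a vocabulary word reachable by inserting one letter, else the word itself."""
    if word in vocabulary:
        return word
    candidates = sorted(
        word[:i] + letter + word[i:]
        for i in range(len(word) + 1)
        for letter in string.ascii_lowercase
        if word[:i] + letter + word[i:] in vocabulary
    )
    return candidates[0] if candidates else word


def correct(text: str, vocabulary: Set[str]) -> str:
    """Rejoin words split by a stray space, then restore single dropped letters."""
    words = text.lower().split()
    fixed: List[str] = []
    i = 0
    while i < len(words):
        joined = words[i] + words[i + 1] if i + 1 < len(words) else ""
        if words[i] not in vocabulary and joined in vocabulary:
            fixed.append(joined)  # e.g. "chi cago" becomes "chicago"
            i += 2
            continue
        fixed.append(restore_missing_letter(words[i], vocabulary))
        i += 1
    return " ".join(fixed)


vocabulary = {"flights", "to", "chicago"}  # tiny illustrative word list
print(correct("flights to hicago", vocabulary))    # prints "flights to chicago"
print(correct("flights to chi cago", vocabulary))  # prints "flights to chicago"
```

A fixed word list like this is exactly the 'traditional' approach; it shows what the fixes look like, not how a learned model would decide between competing corrections.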
Instead of the traditional spell check that just looks up words in a dictionary, the idea is to have the spelling API be able to deal with slang and 'informal' language. "The challenge is adapting over time when new phrases get coined or when a new startup becomes popular. So all of a sudden 'lift' is spelled 'Lyft' and it's a valid word that wasn't a word a year ago. The nice thing about making this a web service is that when we have new words and models, we update those in the back end and developers get better results for free."
The spell check API won't learn how different people misspell words (although that's a possible area of research), but you can give it specific terms for your application. Galgon suggests: "Imagine being able to build a better speller for a particular domain, you can tell the API, here's a set of our product names that might not get recognised correctly."
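The effect of supplying your own terms can be illustrated with Python's standard `difflib` module standing in for the real service. Everything here is an assumption for the sake of the example: the product name 'FlexoDrive' is made up, the base word list is tiny, and the 0.8 similarity cutoff is arbitrary.

```python
import difflib
from typing import Iterable, List


def suggest(word: str, base_words: Iterable[str],
            domain_terms: Iterable[str] = ()) -> List[str]:
    """Suggest close matches from a base word list plus optional domain terms."""
    vocabulary = sorted({w.lower() for w in base_words} | {t.lower() for t in domain_terms})
    return difflib.get_close_matches(word.lower(), vocabulary, n=3, cutoff=0.8)


base_words = ["flex", "drive", "flexible"]  # stand-in for a general dictionary
product_names = ["FlexoDrive"]              # hypothetical product name

print(suggest("flexodrve", base_words))                 # prints []
print(suggest("flexodrve", base_words, product_names))  # prints ['flexodrive']
```

Without the product list the misspelling has nothing close enough to match; with it, the intended name is the only suggestion. Suggestions come back lower-cased because the comparison ignores case.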