Why Siri is just the start for natural input

The soul behind the screen

Giving a computer a name and a voice immediately humanises it. Be honest: have you ever caught yourself thanking Siri, just to be polite? At the very least, do you think of it as a 'him' or a 'her' depending on the voice?

If so, don't worry. It's perfectly normal. "We've seen long conversations ranging from talking about breakups to movies or even philosophy that people have had with Iris," admits Babu.

Siri does have a human face, though – voiceover artist, Weakest Link announcer, and Tap! subscriber Jon Briggs (@jonbriggs on Twitter). How does he feel about his voice becoming our digital butler?


JON BRIGGS: Meet Jon Briggs, the real face of Siri's English (United Kingdom) voice

"I love it," he exclaims. "I love the fact that I have been chosen to be part of people's everyday lives, and especially by a company that creates brilliant technology."

Briggs didn't record his voice specifically for Siri, though – Apple licensed an existing character, 'Daniel', previously used in both Garmin and TomTom sat navs.

The one recording can handle multiple jobs because it is built from individual phonemes (the smallest units of sound, of which there are 44 in English) and other important parts of the language, rather than specific pre-built statements such as 'turn left'. Combined, these pieces can create more or less any sentence you need.
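
To make the idea concrete, here is a toy sketch in Python of how a sentence might be broken down into reusable recorded pieces. It is only an illustration, not how Siri or any sat nav actually works: the tiny pronunciation table, the phoneme symbols and the clip file names are all made up for the example, and real systems use far larger inventories and smooth the joins between sounds.

```python
# Toy illustration of building sentences from recorded phoneme units.
# Everything here (lexicon entries, symbols, file names) is hypothetical.

# Hypothetical pronunciation lexicon: word -> list of phoneme symbols.
LEXICON = {
    "turn": ["T", "ER", "N"],
    "left": ["L", "EH", "F", "T"],
    "right": ["R", "AY", "T"],
}


def clip_for(phoneme: str) -> str:
    """Return the (hypothetical) file name of the recording for one phoneme."""
    return f"{phoneme.lower()}.wav"


def synthesise(sentence: str) -> list[str]:
    """Return the ordered list of recorded clips needed to speak a sentence."""
    clips = []
    for word in sentence.lower().split():
        if word not in LEXICON:
            raise KeyError(f"no pronunciation for {word!r}")
        # Each word reuses the same small pool of phoneme recordings.
        clips.extend(clip_for(p) for p in LEXICON[word])
    return clips


print(synthesise("Turn left"))
```

Note that 'turn left' and 'turn right' share the same three opening clips. Nothing was recorded for either phrase specifically, which is why one set of recordings can serve a sat nav one day and a phone assistant the next.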

"We recorded over three weeks – about three hours at a time, then topped up with anything they were missing after it was all analysed," Briggs explains. "The sentences were read as flat as possible, only with intonation where indicated, and no pausing unless there was punctuation. Not as easy as it sounds. Pick a sentence and read it out loud and your pauses won't often fall exclusively where the punctuation is."

As with voice-to-text, it's a technology with a good way to go before it becomes completely reliable, but while Siri may occasionally sound a little sarcastic or irritated, its voices aren't unpleasant to listen to in the long run.

Which of the voices does Briggs himself use? "Which one do you think?!" he answers. We wonder if he ever thanks it…

The next big leap


SHAZAM: Not all natural input is command-based. Shazam can guess almost any music track after just 30 seconds

What all this should demonstrate is that good natural input isn't simply a question of making individual apps that do everything, but creating pieces that can be combined into many forms.

If you want to make an augmented reality system devoted to turn-by-turn walking instead of driving, for instance, you don't have to reinvent the wheel. You know your user will have GPS built into their phone and that you can tap into it, you can give it a professional voice far superior to anything you might whip up yourself, and so on.

When apps can share what they know as easily as they now tap into our Twitter feeds, expect greatness. The catch is that, for now, development is still largely restricted to a bubble. Only Apple can attach data sources and apps to Siri, for example, with everyone else reduced to half-hearted hacks such as using CalDAV calendars to sneak in round the side.

Bouncing between 10 different apps based on what you want to record/look at/scan/find is already frustrating, and is largely self-defeating. Apple, Microsoft, Google… no one company is ever going to create a perfect, all-encompassing natural input system on its own. It's just too big a job.

Apple is firmly leading the charge, though, and for a glimpse of the future, you can't do better than the iPhone 4S. Siri is at least a top-tier assistant, and no other phone boasts as wide a selection of companion apps, or the same seemingly genuine intelligence.