During Google I/O on Tuesday, Google almost casually dropped one of the biggest evolutions in real-world AI of the last decade by introducing Google Duplex – a service that allows a machine to conduct natural conversations with businesses (and likely, in future, customers) in order to complete basic transactions. In an absolutely stunning demonstration, Google showed the Assistant calling a real, untrained, unknown hair salon to book an appointment on behalf of a user – complete with a natural-sounding voice, colloquialisms, and the ability to react to and process messy or low-fi speech in real time.
There have only been a few times I’ve gasped when witnessing this sort of technology breakthrough – face/audio swapping was a recent example – but Duplex takes things to a new level since the entire process is completely automated.
Duplex doesn’t require special hardware or software on the business end, it doesn’t need training or subscriptions, and it links into the existing infrastructure Google already has through Android and Google Home. Since Assistant already knows most of your information – your location, name, address, gender, contacts – this information can be instantly parsed and transferred without confirmation.
Google has stressed that Duplex isn’t some new form of SkyNet – it requires intense training in understanding what sort of task it is completing (booking a haircut might require knowledge of hundreds of different services, or a restaurant offering different seating or cuisine options) and it’s not capable of running full conversations outside of its brief.
But if anything, this is simply a tiny hiccup – like any 1.0 tech – that will be overcome over the coming months and years as more and more information is fed into the algorithm and it widens in scope. The biggest breakthrough here is Google’s ability to produce extraordinarily natural speech, reacting not only to complex sentence structures but also to broken and unclear responses.
Duplex is polite, but not too polite. It’s a little wordier but not too wordy. It knows the importance of acknowledging and confirming statements and clarifying information by repeating it in a manner that a normal computer would deem unnecessary. Google says that this is due to training based on a ton of anonymised phone call data on top of a recurrent neural network, which can predict billions of possible outcomes in seconds, removing the confusion that almost anyone who has used a “smart” IVR (Interactive Voice Response) system when contacting a call centre has encountered. Thanks to this, it knows that people expect a quick response to “Hello” but expect a pause after a question is asked.
Interestingly, Google trains its robots like a call centre trains its staff – experienced operators sit on outgoing calls to monitor performance of a specific task type and provide feedback so the robot can improve its quality.
In many ways, Google expects people will use the service in a manner that “timeshifts” their needs to make or change appointments – if Assistant cannot use an API to digitally create the booking, it will attempt to call during opening hours and alert the user once the appointment has been made. It also claims that this will improve the ability of those with hearing or other disabilities to complete these tasks that normally require a third party.
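As a rough illustration of that “timeshift” behaviour, here is a minimal Python sketch of the decision as described: use an API if one exists, otherwise wait for opening hours and place a call. This is my own guess at the logic, not anything Google has published. The `Business` class, the `has_booking_api` flag, and both function names are hypothetical, and the sketch assumes the business is open every day with opening and closing times on the same day.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass
class Business:
    # Hypothetical record; not a real Google Assistant data structure.
    name: str
    opens: time
    closes: time
    has_booking_api: bool = False


def next_call_time(business: Business, now: datetime) -> datetime:
    """Return the earliest moment at or after `now` that is within opening hours."""
    if business.opens <= now.time() < business.closes:
        return now  # already open, so the call can be placed straight away
    day = now.date()
    if now.time() >= business.closes:
        day += timedelta(days=1)  # closed for today, so wait until tomorrow
    return datetime.combine(day, business.opens)


def plan_booking(business: Business, now: datetime) -> tuple[str, datetime]:
    """Pick a booking channel and the time at which to act."""
    if business.has_booking_api:
        return ("api", now)  # book digitally, no phone call needed
    # Otherwise defer a phone call; the user would be alerted afterwards.
    return ("phone", next_call_time(business, now))
```

Asking for a haircut at 8.30pm from a salon that is open 9am to 5pm and has no booking API would, under these assumptions, schedule the call for 9am the next morning.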
This is probably the first function that many of us expected a true “digital assistant” to perform for us. We tell it we want a haircut at 4pm next Thursday, and it’s done. There is no follow-up required – the booking is confirmed, and we can even listen to the call to double-check it went the way Assistant claimed. But at the same time, it doesn’t require a cynic to see the problems that can follow if a technology like this is licensed out into the wild without an “overseer” such as Google taking responsibility for it. Spammers would rub their hands with glee as they imagine a natural-speaking robot calling millions of people simultaneously without the cost or trouble of an underground South-East Asian call centre.
Not all roses
What about the problems in relation to identity theft? False conversations flooding businesses to grief or cripple them? Who takes responsibility for a robot requesting a service or product that could intrinsically hold someone to a financial burden, or cost a business money due to a false or fraudulent attack? Then there’s the issue of outsourcing human interaction, forcing people in retail to do battle with bots all day on top of the dozens of other struggles that come with a demanding, tiring position. I also noticed that Duplex isn’t even really that polite; it doesn’t say please or thank you, and instead emits those awful “uh huh” and “mmhmm” Americanisms that are dismissive at best and rude at worst.
I have no doubt that many of the basic issues will be solved with time – the spam, the tracking, the few still quirky elements of the language – but what is troubling to me is that small businesses aren’t the beneficiaries of this. Nor are the workers answering the calls. There is a human element to calling a business and making a connection, and Duplex largely boils this down to a transaction in the same manner that a website would. Plenty of businesses don’t offer digital options because they prefer to offer that human element – it’s part of the customer service. Those customer relationships build the word of mouth that grows their businesses.
Yes, sure, not every single phone call to a business is necessary – ordering food or a taxi is preferable via an app – but in many cases that call you make can give you an indication of what you are in for. A friendly, helpful staff member can remember you when you arrive, answer questions you have and offer unique add-ons. It’s possible Google will be forced to tag calls in a manner that identifies them as bots, and staff will just hang up.
I have no doubt that Duplex will evolve into G Suite, where businesses can utilise it in reverse – to schedule appointments with customers, to return messages or remind them of upcoming appointments. A smart system would in turn be able to answer questions or modify bookings in real time, again reducing that relationship to a transaction. In that case, Duplex will essentially be a closed loop – calling itself to have weird and creepy conversations a million times a second, while we take one extra step towards removing another “annoying” social interaction from our lives.
There is a benefit to this software, and I have no doubt it will be popular. I just feel that the benefits are wholly for Google and its customers, rather than the businesses it forces to use its platform, whether they want to opt in or not.