Ubisoft's AI lip-sync tech could have applications beyond gaming

We’re sure we’re not alone when we say that when someone mentions AI in games, our minds immediately jump to NPCs and enemies: their movements, their reactions, their, well, intelligence. All of it is part of a game’s AI.

But, as it turns out, AI in games is so much more than that. After a conversation with Yves Jacquier, the head of Ubisoft’s La Forge technology research project, we gained a much better idea of how AI is being used to make game development faster and more affordable, and to make the games we play more immersive.

We asked Jacquier to tell us about some of the prototypes Ubisoft and La Forge are currently developing and using. One AI-powered project he was particularly excited to discuss was Sound Matching. This, he tells us, “takes the wav file directly from the recorded actors and drives the facial animation” of in-game characters.

Improving facial animation

Without AI there are a few ways to do facial animation, Jacquier explains. One is to animate by hand, a process that can be sped up with motion capture. Another is to break the script down into phonemes, the distinct units of sound in a language. Each phoneme corresponds to a lip pose: voice actors record their lines, and the recorded voice is then mapped to the phonemes and their matching animation poses.
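
To make the phoneme-based approach more concrete, here is a minimal Python sketch. It is not Ubisoft’s pipeline: the two-word pronunciation dictionary, the ARPAbet-style phoneme symbols and the pose names (`open_small`, `rounded` and so on) are hypothetical placeholders, and spreading the poses evenly across the recording is a deliberately naive stand-in for the real alignment step.

```python
# Hypothetical pronunciation dictionary (word -> ARPAbet-style phonemes).
# Real systems rely on very large dictionaries, one per language.
PRONUNCIATIONS = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Hypothetical phoneme -> mouth pose table; each animation rig defines its own poses.
PHONEME_TO_POSE = {
    "HH": "open_small", "AH": "open_wide", "L": "tongue_up", "OW": "rounded",
    "W": "rounded", "ER": "rounded", "D": "tongue_up",
}


def text_to_poses(line):
    """Turn a line of dialogue into a sequence of mouth poses via its phonemes."""
    poses = []
    for word in line.lower().split():
        phonemes = PRONUNCIATIONS.get(word)
        if phonemes is None:
            # Anything missing from the dictionary (including non-words) can't be animated.
            raise KeyError(f"no pronunciation for {word!r}")
        poses.extend(PHONEME_TO_POSE[p] for p in phonemes)
    return poses


def schedule(poses, clip_seconds):
    """Naively spread the poses evenly across the recorded clip's duration."""
    step = clip_seconds / len(poses)
    return [(round(i * step, 3), pose) for i, pose in enumerate(poses)]
```

For example, `schedule(text_to_poses("Hello world"), 2.0)` places eight poses a quarter of a second apart, while any word absent from the dictionary raises a `KeyError`, which is why the method depends so heavily on the size and quality of that dictionary.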

Aside from sounding utterly exhausting, the latter method has a big problem, Jacquier tells us: it’s really only useful for English, as “you have to have huge databases to transform text into phonemes and not all of them are reliable”. On top of that, “you’re not able to transform things like barks and coughs, things like that”.

This limitation causes problems when localizing games into languages other than English: “we had a lot of work to do when we wanted to localize our games because we are recording our baseline in English, then we are recording all the other languages and then we have to lip sync to synchronize each sentence through the English animation”.

Not only does this “create a lower quality experience, a lower sense of immersion if you’re playing in another language than English”, it also costs a lot since you’re “asking people to spend a lot of time manually aligning and synchronizing foreign sentences with English animation”.

When we hear about the amount of work that has to go into something as seemingly simple as the movement of an avatar’s lips, we’re not surprised that Jacquier is so enthused by the capabilities of his Sound Matching AI.

Lower cost, higher quality

“It works with sound waves,” he says, which means it “works in any language and it creates the lip animation directly”. What this results in is “an increased quality, an increased sense of immersion for the gamer, and it has diminished the expected localized animation costs by around 30%”.
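
The basic idea of driving animation from the audio itself, rather than from a text transcript, can be illustrated with a far cruder technique than Ubisoft’s. The Python sketch below is not Sound Matching, whose internals weren’t described to us; it’s a simple, non-AI “jaw flap” that measures loudness (root-mean-square amplitude) in each animation frame and opens the mouth in proportion. The function name, the 30 fps default and the normalization to the loudest frame are all illustrative assumptions.

```python
import math


def mouth_openness(samples, sample_rate, fps=30):
    """Return one mouth-open value in [0, 1] per animation frame, based on loudness (RMS)."""
    if not samples:
        return []
    window = max(1, sample_rate // fps)  # audio samples covered by one animation frame
    levels = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        levels.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    peak = max(levels) or 1.0  # a silent clip would otherwise divide by zero
    return [level / peak for level in levels]
```

Because it never looks at words, the same function reacts to a line in French, Japanese or a cough. In practice the samples would come from the recorded WAV file (Python’s standard `wave` module can read one), but a loudness measure alone can’t form distinct lip shapes for different sounds.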

When AI delivers such significant savings in both time and money during development, Jacquier says, it means “we can put [those efforts] elsewhere to improve the quality of the game”.

What we get, then, is far more accurate facial movements when game characters are speaking. We’ve all noticed when playing games that there are moments when what a character is saying doesn’t match up with their lip movements at all. It’s not game-ruining by any means, and it’s only annoying in the way that a slightly off-center painting might be. But it does have an impact on immersion that Ubisoft would like to prevent.

Branching into other fields

What’s really interesting is that the work Ubisoft is doing with this prototype is also being used by researchers in other fields, namely medicine, who have found they face the same challenges.

“I did not know,” Jacquier goes on, “but in medicine they are using avatars to treat patients with schizophrenia or anxiety”. Jacquier was quick to tell us that he’s by no means a specialist in the area, but it’s his understanding that video game-like avatars are being used to “help patients confront their own voices in schizophrenia or fears for anxiety”.

Where the fields of medicine and gaming cross over here, then, is in their need for avatars to have believable facial animation. “If you start to not believe in a situation, start not to believe in that you’re actually talking to the avatar as something real, the treatment will fail,” Jacquier adds.

Because so much of the treatment revolves around conversing with an avatar, patients focus on its lip movements, so medical researchers need this part of the face in particular to be accurate.

“When we speak to someone, a lot of our attention goes to the mouth, especially when there’s noise or you’re trying to concentrate,” says Jacquier. “So if the lip movement tells you something different from what you’re hearing then you lose the person.”

He tells us that those working on treatment plans like this have expressed a lot of interest in the AI Ubisoft is working on, “especially Stephane Guay, who is leading research in the Mental Health Institute that’s trying to include more technology to help patients, not only medicine and regular treatment that exists”.

At the moment, Jacquier and his team are still trying to work out exactly how they will collaborate, but there is a hope that whatever route they follow will “improve our games, and their treatment”.

E3 is the world's largest exhibition for the games industry, stuffed full of the latest and greatest games, consoles, and gaming hardware. TechRadar is reporting live from Los Angeles all week to bring you the very latest from the show floor. Head to our dedicated E3 2018 hub to see all the new releases, along with TechRadar's world-class analysis and buying advice about the next year in gaming.

Emma Boyle is TechRadar’s ex-Gaming Editor, and is now a content developer and freelance journalist. She has written for magazines and websites including T3, Stuff and The Independent. Emma currently works as a Content Developer in Edinburgh.