A new algorithm that can create realistic-looking video of speech from audio files has been revealed in a paper published by the University of Washington.
The paper states that the technique could let video conferencing platforms like Skype keep working even when the user’s bandwidth isn’t high enough to support video. While we can see how it could work in that application, it is far easier to see how this technology could be used as a manipulative tool.
What’s interesting is that the team decided to use President Obama as the test subject. According to the paper, the reason for using Obama is the vast amount of high-quality footage that exists of him online that is free to use. A large amount of ‘stock footage’ is needed to create the videos.
The algorithm is trained on hours of footage of Obama speaking, tracking his vocal patterns and his physical mannerisms in order to recreate realistic speech. One mannerism the paper singles out: “his head stops moving when he pauses his speech (which we model through a retiming technique)”.
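To make the pause idea more concrete, here is a minimal sketch of how pauses in speech might be located in an audio signal, which is the kind of cue a retiming step could key off. This is not the researchers' method: the function names (`frame_energies`, `find_pauses`), the energy threshold and the frame length are all assumptions chosen purely for illustration, and the "audio" is a synthetic tone rather than real speech.

```python
import math

def frame_energies(samples, frame_len):
    """Return the root-mean-square energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return energies

def find_pauses(samples, frame_len=160, threshold=0.05, min_frames=3):
    """Return (first_frame, last_frame) pairs for runs of quiet frames.

    All parameter values here are illustrative assumptions, not values
    taken from the paper.
    """
    energies = frame_energies(samples, frame_len)
    pauses = []
    run_start = None
    for i, energy in enumerate(energies):
        if energy < threshold:
            if run_start is None:
                run_start = i  # a quiet run begins here
        else:
            if run_start is not None and i - run_start >= min_frames:
                pauses.append((run_start, i - 1))
            run_start = None
    # Close a quiet run that lasts until the end of the signal.
    if run_start is not None and len(energies) - run_start >= min_frames:
        pauses.append((run_start, len(energies) - 1))
    return pauses

def tone(n, amp=0.5, period=20):
    """Synthetic stand-in for speech: a plain sine wave."""
    return [amp * math.sin(2 * math.pi * i / period) for i in range(n)]

# "Speech", then silence, then "speech" again.
signal = tone(800) + [0.0] * 800 + tone(800)
pauses = find_pauses(signal)
print(pauses)
```

In a system like the one described, frames flagged as pauses would be the moments where the generated head motion is held still. How the paper actually aligns head movement with speech is not covered here.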
The obvious problem, that this could be used to create ‘fake news’, is only amplified by the use of Obama. Photoshopping images to manipulate events is nothing new, of course. Take this image of Trump and Putin that did the rounds recently:
Photoshop of the day #allalone #G20 #Putin #Russia #Trump pic.twitter.com/qFpQ1RiiZQ (July 9, 2017)
That image went viral, and it’s not even a very good photoshop. The videos created by the team at the University of Washington, on the other hand, are scarily convincing.
According to the paper: “Creating a photorealistic talking head model – a virtual character that sounds and appears real, has long been a goal both in digital special effects and in the computer graphics research community.”
There are obvious applications for the film industry. With the advances in this field, we have already seen an increase in the number of dead people who have been digitally ‘resurrected’ to appear in movies and adverts.
When the person is no longer alive, it becomes a question of who ‘owns’ that person’s image, which is awkward enough, but there is something deeply uncomfortable about seeing a real person’s image being manipulated in this way. With people who are still alive, their permission would surely need to be obtained to use their image. Wouldn’t it?
The fact that the Obama footage is in the public domain shouldn’t mean that he, as a person, has no ownership of his image and how it’s used. Where this risks becoming even more of a moral and legal minefield is on video sharing platforms.
If you upload footage to a social media platform that then owns the copyright to that video, does the platform have the same rights that the researchers had to use images of Obama? Would it be possible for it to use video of you that you didn’t create?
The team is clearly aware of the potential for abuse, and was keen to stress that it didn’t use the technique to make Obama say anything he didn’t say elsewhere. In a press release, one of the scientists behind the paper, Steve Seitz, said: “We very consciously decided against going down the path of putting other people’s words into someone’s mouth.”
That is very noble of them, but there is absolutely nothing to say that others who get their hands on this technology will be as honorable. Then again, as with many emerging technologies at the moment, it is possible that these concerns are a knee-jerk reaction to sudden, unsettling changes.
There's every possibility that this technology will be the thing that allows us to create lifelike avatars of ourselves in virtual reality. One of the authors of the paper, Ira Kemelmacher-Shlizerman, lists one of her highlights on her UW page as "My company dreambit.xyz acquired by Facebook". The same Facebook that owns the Oculus VR platform, that is.
The paper is due to be presented on August 2 at Siggraph, and it will be interesting to see how the researchers respond to the public reaction to their findings.