Broadcasters and movie studios alike are starting to explore the huge potential of modern technologies to bring a new generation of filmed entertainment to our TV sets and cinemas. Artificial intelligence, machine learning, and deep learning are the buzzwords that excite video executives with promises of revolutionary new abilities for video creation and editing.
Deep learning, in particular, is the new frontier for the video industry, allowing video professionals to do automatically things that would have taken weeks of work in the past, as well as some things that wouldn't have been possible at all. How is deep learning different from other machine learning algorithms? What are its practical applications for broadcasting and filmed entertainment? And what is the science behind it, and what are its business ramifications?
Max Kalmykov is the VP of Media and Entertainment at DataArt.
Artificial Intelligence, Machine Learning, and Deep Learning
Artificial intelligence is any attempt to make a computer appear as though it has intelligence. The computer may be told exactly what to do in any given situation, in which case it hasn't learned anything. Machine learning seeks to allow the computer to learn how to perform certain tasks. There are a variety of methods to do this, and nearly all of them rely on the computer altering parameters repeatedly through a trial-and-error process. One of the more complex ways of doing this is by mimicking the neurons in a biological brain. When these artificial brains, or neural networks, are built with many layers, we have deep learning.
Deep learning allows a computer to take something complex as input, such as all the pixels in a frame of video, and output something equally complex, such as all the pixels in a new, altered, frame of video. For example, it may be shown frames with unwanted grain as input, and have its output compared to clean frames. By trial and error, it learns how to remove the grain from the input. As more and more images are passed through it, it can learn how to do the same thing for images that it was never shown.
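The trial-and-error loop described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it uses a one-dimensional signal instead of video frames, and a single three-weight averaging filter instead of a deep network. The function names (make_pair, apply_filter, train) are hypothetical and come from no particular library. The point is the mechanism: show noisy input, compare the output to the clean version, and nudge the parameters to shrink the error.

```python
import math
import random


def make_pair(rng, n=200, noise=0.2):
    """Return (noisy, clean): a smooth wave and the same wave with random grain."""
    phase = rng.uniform(0, 2 * math.pi)
    freq = rng.uniform(0.02, 0.05)
    clean = [math.sin(freq * i + phase) for i in range(n)]
    noisy = [c + rng.gauss(0, noise) for c in clean]
    return noisy, clean


def apply_filter(w, noisy):
    """Weighted sum of each sample and its two neighbours (edge samples skipped)."""
    return [w[0] * noisy[i - 1] + w[1] * noisy[i] + w[2] * noisy[i + 1]
            for i in range(1, len(noisy) - 1)]


def mse(w, noisy, clean):
    """Mean squared error between the filtered signal and the clean target."""
    out = apply_filter(w, noisy)
    target = clean[1:-1]
    return sum((o - t) ** 2 for o, t in zip(out, target)) / len(out)


def train(pairs, steps=300, lr=0.1):
    """Gradient descent: repeatedly adjust the weights to reduce the error."""
    w = [0.0, 1.0, 0.0]  # start as "do nothing": output equals input
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        count = 0
        for noisy, clean in pairs:
            for i in range(1, len(noisy) - 1):
                x = (noisy[i - 1], noisy[i], noisy[i + 1])
                err = sum(wk * xk for wk, xk in zip(w, x)) - clean[i]
                for k in range(3):
                    grad[k] += 2 * err * x[k]
                count += 1
        w = [wk - lr * g / count for wk, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    training_pairs = [make_pair(rng) for _ in range(4)]
    weights = train(training_pairs)
    unseen_noisy, unseen_clean = make_pair(rng)  # never used in training
    print("error before:", mse([0.0, 1.0, 0.0], unseen_noisy, unseen_clean))
    print("error after: ", mse(weights, unseen_noisy, unseen_clean))
```

A real denoising network replaces the three weights with millions of parameters spread over many layers, but the training loop follows the same pattern.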
One of the most publicized demonstrations of deep learning came when Google's DeepMind trained a neural network, AlphaGo, to play Go, the famously difficult and complex board game. The game is far too complex for hand-written rules to produce a viable opponent, and a single-layer neural network would never have been enough. Deep learning made it possible.
Deep learning is used for a wide variety of other tasks as well. It is used to match generated speech with human speech, so text-to-speech programs sound more natural. In a similar task, it is used by translation companies to teach computers how to translate from one language to another. The self-driving cars that several companies are working on are driven by deep learning. Marketing departments use it to learn the habits of customers and guess how a given customer will behave and what strategies they will best respond to. Digital assistants use it to better understand the requests that we make of them.
Deep Learning for TV and Filmed Entertainment
There are many opportunities to apply deep learning techniques in the fields of video production, video editing, and cataloging. But the technology is not limited to automating repetitive tasks; it can also enhance the creative process, improve video delivery, and help preserve the massive video archives that many studios keep.
Video Generation and Editing
Warner Bros. recently had to spend $25M on reshoots for 'Justice League' and part of that money went to digitally removing a mustache that star Henry Cavill had grown and could not shave due to an overlapping commitment. It isn't just 'Justice League' – the post-production stage of any movie is time-consuming and expensive. Deep learning will be a game changer for these types of tasks.
Consumer-grade, easy-to-use solutions such as Flo allow you to use deep learning to automatically create a video by describing what you want in it. The software will find the relevant videos in your library and edit them together automatically.
Google has a neural network that can automatically separate the foreground and background of a video. What used to require a green screen can now be done with no special equipment.
Deepfakes, in which the face of one person is put onto a video of another, have been in the news a lot recently, as have deep portraits, which apply motion to still pictures such as the Mona Lisa. The potential uses of this technology in special effects are vast.
Take the mustache problem at Warner Bros., which drew Henry Cavill into a controversy with fans. Cavill needed to grow a mustache for Mission: Impossible - Fallout at the same time as he was needed for reshoots on Justice League. He had a mustache for Fallout but needed to be clean-shaven as Superman. He opted to keep the mustache, so the Justice League editing team had to digitally remove the hairy lip from every scene he’d reshot.
Sadly, fans noticed, and it caused a fuss. If hobbyists working at home can use deep learning tools to put Nicolas Cage into movies that he was never in, one can only guess how much time and money Warner Bros. could have saved by replacing Henry Cavill's face with older footage of him.
According to the UCLA Film & Television Archive, nearly half of all films produced prior to 1950 have disappeared. Worse, 90% of the classic film prints that do exist are in poor condition. The process of restoring these films is long, tedious, and expensive. This is an area in which deep learning is going to make a major difference.
The process of colorizing black and white footage has always been lengthy. There are thousands of frames of footage in a movie and coloring each one takes a long time. Even with advanced tools, the process can only be automated so much. Thanks to Nvidia, deep learning can now speed up the process significantly, with tools that only require an artist to color one frame of a scene. From there, the deep learning network automatically handles the rest.
A previously show-stopping problem was missing or damaged frames from a video. You can't do reshoots on something that happened years ago.
Previously, restoring that type of movie meant editing around the missing frames. Now, deep learning networks from Google aim to change that. The company has developed a technology that can realistically recreate part of a scene based on its start and end frames.
By detecting the faces of everyone in a video, deep learning can allow you to quickly classify a video collection. You could, for example, search for any clip or movie that has a given performer. Alternatively, you could use the technology to count the exact screen time for every actor in a video. Sky News recently used facial recognition to identify famous faces at the royal wedding.
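As a rough illustration of the cataloging step, the Python sketch below assumes that a face recognition model has already labeled each frame with the names of the people it found. That model is not shown, and the data layout and function names (screen_time_seconds, clips_with) are hypothetical. Once those per-frame labels exist, counting screen time and searching a library by performer are simple bookkeeping.

```python
from collections import Counter


def screen_time_seconds(frame_labels, fps):
    """Seconds on screen per person, given one list of detected names per frame."""
    counts = Counter()
    for names in frame_labels:
        counts.update(set(names))  # count a person once per frame
    return {name: n / fps for name, n in counts.items()}


def clips_with(performer, library):
    """Titles of all clips in which the performer appears in at least one frame."""
    return sorted(
        title
        for title, frame_labels in library.items()
        if any(performer in names for names in frame_labels)
    )


if __name__ == "__main__":
    # Hypothetical recognizer output for two short clips at 2 frames per second.
    library = {
        "clip_a": [["Alice", "Bob"], ["Alice"], [], ["Alice"]],
        "clip_b": [["Bob"], ["Bob"]],
    }
    print(screen_time_seconds(library["clip_a"], fps=2))
    print(clips_with("Alice", library))
```

The hard part in practice is the recognizer that produces the labels; the indexing on top of it is cheap.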
The technology is not limited to detecting faces, though. Sports broadcasts rely on camera operators to track the movements of the ball, or to identify other key elements of the game, such as the goal. Using object recognition, AI-powered tools can automate the production of a sports broadcast.
While Flo can identify what a scene is about and use that data to generate a video about whatever you want, that same technology can be used to sort and classify videos to make it easy to find a particular piece of footage by simply searching for people or actions that appear in it.
This could be used to detect and remove objectionable content from videos to ensure that they remain suitable for a target audience. In a similar vein, it could be used to match new videos up with old videos that a person has shown interest in and provide them with a personalized recommendation list.
As we move into 4K streaming, and television manufacturers begin the rollout of 8K displays, streaming is using more data than ever before. Anyone with a poor connection knows what a problem this can be. A shiny 4K display is of little use if your internet connection can't supply the bandwidth to take full advantage of it. Thanks to neural networks that can recreate high-definition frames from a low-definition input, we could soon be sending low-definition streams over our internet connections while still enjoying the high-definition glory that our displays are capable of.
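To give a sense of scale, the short Python sketch below compares raw pixel counts per frame at standard resolutions, taking 1080p as an assumed example of the lower-resolution stream. The function names are illustrative. Pixel counts are not the same thing as compressed bitrate, which depends on the codec and the content, so treat the ratios as a rough indication of how much detail an upscaling network would have to recreate.

```python
# Standard frame sizes in pixels (width, height).
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
    "8K": (7680, 4320),
}


def pixels_per_frame(name):
    """Number of pixels in a single frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def upscale_factor(source, target):
    """How many output pixels must be produced for every pixel received."""
    return pixels_per_frame(target) / pixels_per_frame(source)


if __name__ == "__main__":
    print(upscale_factor("1080p", "4K"))
    print(upscale_factor("1080p", "8K"))
```

A 4K frame holds four times as many pixels as a 1080p frame, and an 8K frame sixteen times as many.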
The use of deep learning in film and broadcast has only begun to nibble at the edges of what it will be used for in the future. I believe its future in the video industry is particularly bright. However, as with all new technologies, deep learning is not without a downside. As deepfakes and the misuse of face recognition show, the rapid evolution of this technology raises valid concerns about privacy and trust.
As with any new technology, the industry needs to address a range of issues. The video industry and tech experts must come together to develop the standards of how tomorrow’s new normal might look. However, with the right approach, the benefits of this addition to the toolbox will be bigger than is imaginable now, and, just as the advent of “talkies” and color film did before it, deep learning will take film and television to a whole new level.