
Lip-syncing thanks to artificial intelligence

Posted August 28, 2018

Dubbing films could become significantly easier in the future. A team led by researchers from the Max Planck Institute for Informatics in Saarbrücken has developed a software package that can adapt actors’ mouth movements and whole facial expressions to match the film’s translation. The technique uses methods based on artificial intelligence and could save the film industry a considerable amount of time and money when it comes to dubbing films. The software can also correct the gaze and head pose of participants in a video conference to boost the impression of a natural conversation setting.

Film translators and dubbing actors work within a set of rigid limitations. After all, they must ensure that the words they put into actors’ mouths not only accurately reproduce what was said but also correspond to the actors’ lip movements and facial expressions. Now, an international team led by researchers from the Max Planck Institute for Informatics has presented a technique known as Deep Video Portraits at the SIGGRAPH computer graphics conference in Vancouver. This technique does away with the need to synchronize the translated audio track with the facial expressions in the video footage. Instead, the software can adapt the actors’ facial expressions – and above all their lip movements – to match the translation.

Synchronized facial expressions: a person’s facial expression, gaze direction, and head pose (input) can be transposed onto another individual (output) using the Deep Video Portraits technique, which works using 3D face models (centre). Credit: MPI for Informatics

The software was developed by a team involving not only the Max Planck researchers in Saarbrücken but also scientists from the University of Bath, Technicolor, the Technical University of Munich (TUM), and Stanford University. In contrast to existing methods, which can only animate the facial expressions found in videos, the new technique also adapts the head pose, gaze, and eye blinking. It can even synthesize a plausible static video background if the head moves.

The technique could transform the visual entertainment industry

In order to reproduce features realistically, the researchers use a model of the face in conjunction with methods based on artificial intelligence. “We work with model-based 3D face performance capture to record the detailed movements of the eyebrows, mouth, nose, and head position of the dubbing actor in a video,” explains Hyeongwoo Kim, a researcher at the Max Planck Institute for Informatics.
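The capture-and-transfer idea Kim describes can be illustrated with a toy sketch. Everything here is a hypothetical stand-in for the paper's parametric face model: the names `FaceParams` and `transfer_performance` are invented for illustration, and the real system estimates such parameters from video footage and renders the result photorealistically with a neural network.

```python
from dataclasses import dataclass, replace

@dataclass
class FaceParams:
    """Hypothetical per-frame parameter set of a 3D face model (illustrative only)."""
    identity: tuple    # person-specific face shape; always kept from the target actor
    expression: tuple  # eyebrow/mouth/nose deformation coefficients
    head_pose: tuple   # head rotation and translation
    gaze: tuple        # eye direction, including blinks

def transfer_performance(source: FaceParams, target: FaceParams) -> FaceParams:
    """Keep the target's identity, but drive it with the source's performance."""
    return replace(target,
                   expression=source.expression,
                   head_pose=source.head_pose,
                   gaze=source.gaze)

# A dubbing actor's captured frame drives the on-screen actor's face:
dubbing = FaceParams(identity=(0.2, 0.9), expression=(0.7, 0.1),
                     head_pose=(0.0, 0.1, 0.0), gaze=(0.05, -0.02))
on_screen = FaceParams(identity=(0.8, 0.3), expression=(0.0, 0.0),
                       head_pose=(0.2, 0.0, 0.0), gaze=(0.0, 0.0))
edited = transfer_performance(dubbing, on_screen)
```

The key design point the sketch captures is the separation of *who* is in the frame (identity) from *what they are doing* (expression, pose, gaze), which is what lets one person's performance be transposed onto another.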

For the time being, the research merely demonstrates a new concept, and the method has yet to be put into practice. However, the researchers believe that the technique could completely transform sections of the visual entertainment industry. “Despite extensive post-production manipulation, dubbing films into foreign languages always presents a mismatch between the actor on screen and the dubbed voice,” says Christian Theobalt, who leads a research group at the Max Planck Institute for Informatics and played a key role in the current work. “Our new Deep Video Portraits approach enables us to modify the appearance of a target actor by transferring head pose, facial expressions, and eye motion with a high level of realism.”

More natural conversation settings in video conferencing

As well as a realistic rendering of films into other languages, the method also has a range of other applications in film production. “This technique could also be used for post-production in the film industry, where computer graphics editing of faces is already widely used in today’s feature films,” says Christian Richardt, who participated in the project on behalf of the University of Bath’s motion capture research centre CAMERA. One example of this type of editing is The Curious Case of Benjamin Button, where Brad Pitt’s face was replaced with a modified computer graphics version in nearly every frame of the film. Until now, interventions such as this often required many weeks of work by trained artists. “Deep Video Portraits shows how such a visual effect could be created with less effort in the future,” says Richardt. With the new approach, the positioning of an actor’s head and their facial expression could easily be edited in order to subtly alter the camera angle or framing of a scene and thus to tell the story better.

In addition, the new technique could also be used in video and VR teleconferencing, for example, where people typically look at the screen and not into the camera. As a result, they don’t appear to be looking into the eyes of their conversation partners on the other end of the video link. With Deep Video Portraits, the gaze and head pose could be corrected to create a more natural conversation setting.

Neural networks detect videos that have been edited

The software paves the way for a host of new creative applications in visual media production, but the authors are also aware of the potential for misuse of modern video editing technology. Whereas the media industry has been editing photos for many years, it is now increasingly easy to edit videos – and with increasingly convincing results. “Given the constant improvements in video editing technology, we must also start being more critical about video content, just as we already are about photos, especially if there is no proof of origin,” says Michael Zollhöfer from Stanford University. “We believe that the field of digital forensics should and will receive a lot more attention in the future to develop approaches that can automatically prove the authenticity of a video clip.”

Zollhöfer is convinced that, with better methods, it will be possible to spot modifications of this kind in the future, even if we humans cannot spot them with our own eyes. This issue is also being addressed by the researchers who presented the new video editing software. They are developing neural networks that are trained to detect synthetically generated or edited video with high precision in order to make it much easier to spot forgeries.
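At its core, such a forgery detector is a binary classifier: given features extracted from a clip, output the probability that it was edited. The sketch below is not the authors' network; it is a minimal stand-in that trains a one-feature logistic classifier on a synthetic, invented “artifact score” (real clips clustering low, edited clips high) purely to show the classification principle.

```python
import math
import random

def sigmoid(z):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_detector(scores, labels, lr=0.5, epochs=1000):
    """Fit P(edited | artifact_score) with plain gradient ascent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(scores, labels):
            p = sigmoid(w * x + b)
            w += lr * (y - p) * x   # push the weight toward the label
            b += lr * (y - p)
    return w, b

random.seed(0)
# Synthetic training data: hypothetical artifact scores, not real forensic features.
real_clips   = [random.gauss(0.2, 0.05) for _ in range(50)]   # label 0
edited_clips = [random.gauss(0.8, 0.05) for _ in range(50)]   # label 1
w, b = train_detector(real_clips + edited_clips, [0] * 50 + [1] * 50)

def is_edited(artifact_score):
    return sigmoid(w * artifact_score + b) > 0.5
```

A real detector would replace the single invented score with features learned by a deep network from video frames, but the training loop and the thresholded probability at the end work the same way.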

The scientists currently have no plans to make the video modification software publicly available. Moreover, they say that any such software should leave watermarks in videos in order to clearly mark modifications.

Source: MPG
