Computer Vision Technology that won an Oscar!
“Computer graphics has simplified the life of movie directors. We can now create realistic models of vehicles including spaceships and submarines, skyscrapers, metropolises and effects such as water ripples and fire.”
About the Expert
There are Oscars for science and technology as well! Prof. Parag Havaldar, a computer science alumnus of IIT Kharagpur, brings this to light with his recent achievement. He won the Oscar Technical Achievement Award for technical development of expression-based facial performance capture technology at Sony Pictures Imageworks.
“Expression-based performance capture technology” looks like a string of difficult words at first glance, but read it twice and it will make sense. It is essentially the digitalisation that made movies like Avengers, Spider-Man 3 and Avatar possible.
When shooting expressions manually in tough settings such as rough terrain, feeble light and water began to take a toll on film producers’ pockets, directors turned to computer programs to ease the job.
The Beginning
Remember Gollum from ‘The Lord of the Rings’? Actor Andy Serkis enacted the memorable character and started the revolution in creating digital characters that engage audiences emotionally. Sony Pictures Imageworks was the first studio to venture into improving this error-prone process. The Imageworks team carried out research for fifteen years and was finally awarded in 2017. A dearth of resources made collaboration between departments necessary, which led to camaraderie between the tracking, integration and animation teams. Prof. Havaldar is grateful to his project management and control teams for bringing these teams together.
The Working
Vectors, the basic entities of linear algebra, are at the heart of it. When an actor’s face is digitised, each vector holds information about one expression, and the superposition of these vectors gives the movement of the face.
First, an actor’s face is scanned in a variety of expressions with appropriately positioned video cameras, and the signals are analysed and translated onto a digital face. This is made possible by 3D reconstruction, tracking and optical flow. The actor continues to act as directed so that a 3D facial model can be developed. FACS (the Facial Action Coding System), developed by Paul Ekman, brings psychology and medicine into the structure. It is the standard basis for representing human facial emotion.
All this helps create a co-ordinate space of facial vectors where each expression is a sum of certain basic facial expressions. For example, an expression of awe could be the sum of an expression of respect plus an expression of fear. Or an expression of adventure can be displayed as one of fun added to one of danger.
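The idea of an expression as a sum of basic expressions can be sketched in a few lines of Python. This is only a toy illustration, not the Imageworks system: the four-number "face", the basis names `respect` and `fear`, and the function `blend` are all hypothetical, and a production model would use thousands of 3D points and FACS-derived basis shapes.

```python
from typing import Dict, List

Vector = List[float]


def blend(neutral: Vector, basis: Dict[str, Vector], weights: Dict[str, float]) -> Vector:
    """Return the neutral face plus a weighted sum of basis expression offsets."""
    result = list(neutral)
    for name, weight in weights.items():
        offset = basis[name]
        if len(offset) != len(neutral):
            raise ValueError(f"basis '{name}' has the wrong number of components")
        for i, value in enumerate(offset):
            result[i] += weight * value
    return result


# Hypothetical toy face: [brow height, eye openness, jaw drop, lip corner].
neutral = [0.0, 0.0, 0.0, 0.0]

# Hypothetical basis expressions, stored as offsets from the neutral face.
basis = {
    "respect": [0.25, 0.0, 0.0, 0.25],
    "fear": [0.5, 0.75, 0.25, -0.5],
}

# "Awe" as respect plus fear, each at full strength.
awe = blend(neutral, basis, {"respect": 1.0, "fear": 1.0})
print(awe)  # [0.75, 0.75, 0.25, -0.25]

# Fractional weights give in-between expressions, e.g. a hint of fear.
uneasy = blend(neutral, basis, {"fear": 0.5})
print(uneasy)  # [0.25, 0.375, 0.125, -0.25]
```

Animating a face then amounts to changing the weights from frame to frame, which is why the timing of those weights matters so much for realism.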
The Scope
Computer graphics has simplified the life of movie directors. We can now create realistic models of vehicles including spaceships and submarines, skyscrapers, metropolises and effects such as water ripples and fire. This is evident in successes such as ‘The Polar Express’, ‘Monster House’, ‘Star Wars’, ‘Beowulf’, ‘Hancock’, ‘Planet of the Apes’, ‘Watchmen’, ‘The Curious Case of Benjamin Button’ and ‘Warcraft’. Real actors can be made to look younger or older and dead actors can be digitally resurrected. Groundbreaking performances by actors have enabled us to produce stylised faces, humanoid faces and real faces on screen.
Experience in Hollywood
Prof. Havaldar fondly recalls satisfying Sir Anthony Hopkins’s curiosity about how captured data was processed to obtain the final product on screen. He has also interacted with Mark Strong on the sets of ‘Green Lantern’ and agrees that the actors’ talent and understanding of the production help the process go a long way.
The Oscar-winning Parag Havaldar has helped actors understand the goals of his technology so that the director can comfortably ease them into an award-winning performance. The calibration requires guiding actors through a series of expressions as they get into the skin of their characters.
The Challenges
While Prof. Havaldar’s technology has become crucial to engaging the audience in a cinematic experience and allowing them to suspend disbelief and become emotionally immersed, it tackles one of the hardest problems in computer graphics. Creating realistic digital faces in animation depends on reproducing how faces move and their natural timing, because years of observing real faces let us judge at once whether a facial expression is correct. The smallest mistake in timing or movement makes the viewer doubt that the face is real.
The Solution
With expertise in computer graphics and computer vision, Prof. Havaldar was able to analyse an actor’s performance in body and face and translate the movement and emotion to a digital actor on a large scale, while preserving the creative iterations artists contribute towards the final animation. He borrowed formulations from behavioural science, medical anatomy, psychology and the artists’ animation workflows.
Taking it to 3D
Each of our eyes has a slightly different view of the same object and our mind discerns it in 3D. In films, two different images for every frame are made and shown to the left and right eye individually at a very rapid rate. The persistence of vision helps us to perceive stereoscopic video.
Technologies like IMAX 3D and RealD filter the left and right images with directionally polarised glasses handed out in theatres. There are two ways to produce the two image sequences. The first is to capture two video streams with a stereo camera rig, and the other is to capture one stream and reconstruct the second stream in post-production.
The issue with a rig is that it is bulky and vibrates, causing spatial disparities. Each of the two cameras on the rig also responds differently to colour and reflectance. The footage has to be shot without knowing how the stereo effect will play out in the theatre, which can lead to a displeasing stereo experience for audiences. ‘Ninja Turtles’ and ‘Blue Alien’ have used virtual puppets which need a rig for adaptation.
Capturing just one stream allows creative control of stereoscopic depth perception when the second stream is made in post-production, but it demands a great deal of manual image segmentation. This problem is yet to be solved through appropriate algorithms.
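To see why reconstructing the second stream needs segmentation, consider a minimal sketch in Python of shifting one row of a left-eye image to make a right-eye row. This is a generic illustration under stated assumptions, not the method used on any particular film: the function `synthesize_right_view` is hypothetical, and it assumes each pixel has already been given a non-negative integer disparity (nearer objects get larger values), which is exactly the information the manual segmentation has to supply.

```python
from typing import List, Optional, Sequence


def synthesize_right_view(left: Sequence[str], disparity: Sequence[int]) -> List[Optional[str]]:
    """Shift each left-eye pixel leftwards by its disparity to form a right-eye row.

    Assumes non-negative integer disparities. Where two pixels land on the same
    spot, the nearer one (larger disparity) wins. Spots nothing lands on stay None.
    """
    if len(left) != len(disparity):
        raise ValueError("left row and disparity row must have the same length")
    width = len(left)
    right: List[Optional[str]] = [None] * width
    best = [-1] * width  # largest disparity written to each target so far
    for x in range(width):
        d = disparity[x]
        target = x - d
        if 0 <= target < width and d > best[target]:
            right[target] = left[x]
            best[target] = d
    return right


# Hypothetical row: background pixels a, b, e, f and a near object F, G.
left_row = ["a", "b", "F", "G", "e", "f"]
disparity_row = [0, 0, 2, 2, 0, 0]

right_row = synthesize_right_view(left_row, disparity_row)
print(right_row)  # ['F', 'G', None, None, 'e', 'f']
```

The `None` entries are background that the near object hid from the left eye. They have to be filled in by hand or by an inpainting step, which is part of what makes this route so labour-intensive.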
The Motivation
When Prof. Havaldar was a student at the University of Southern California, his professors and mentors had research interests ranging from fundamental signal processing to high-level cognitive brain models. Digitalisation was influencing entertainment in Los Angeles, a place where this industry thrived.
After a brief stint developing software for gaming platforms such as Sega and Nintendo, he joined Sony Pictures as a software engineer. He found a visionary in Robert Zemeckis, whose ongoing project ‘The Polar Express’ boldly adopted performance capture technology. Multiple vision cameras recorded the actors, and the performance was then translated to a digital character on screen.
Scope for the Future
There is barely anyone who is not into gaming, or at least not intrigued by its realistic imagery and interactive animations. Computer gaming technologies have advanced rapidly with faster GPU platforms and hardware like the Sony PlayStation and Xbox. The key to the massive success of major games like Call of Duty, World of Warcraft and Overwatch (all from Activision Blizzard, where Havaldar currently works) was the realism of the environment created by the game.
For those who wish to venture into gaming, keep in mind that besides graphics, you should also explore the domains of natural language understanding, machine learning to understand speech and photorealistic rendering. Do not get bogged down by the dominant idea that haunts young minds, ‘Everything is already done and discovered. What is left for me to do?’ While technology has progressed fast, there is still scope for creating intelligent characters which don’t just mimic a pre-recorded performance but respond to stimuli based on trained data sets. With cognitive computational models of motion and emotion, you can make digitally ‘active’ characters digitally ‘alive’. You can make movies go from plain story-telling to story-defining.
Prof. Havaldar has a vision for digital immortality, where you can create your own digital avatars that look and behave like you! This will soon be commercialised with virtual reality and multiscopic holographic projections.
Message for Youngsters
Prof. Havaldar is a professor of Computer Science at his alma mater, the University of Southern California, where he pursued his PhD in computer vision and computer graphics after developing an interest in digital imaging at IIT Kharagpur. He believes youthful minds challenge him with the vibrant ideas they bring to the table.
He is of the opinion that while students have plenty of merit in mathematics and science, they must develop an experimental, investigative mind outside the classroom and beyond the textbook. Students interested in gaming, animation and film technology should go the extra mile to understand the link between technology and art. Instead of being overwhelmed by the digital aspect alone, students should make the effort to realise that technology plays a subsidiary though respectable role in art. No matter how advanced it is, technology always fails if an artist cannot use it to accomplish their art. The best technology is the one whose presence is never felt.