Inside Flame’s new machine learning-powered human face segmentation


Under the hood of the semantic keyer in Autodesk’s Flame 2021

Recent developments by Autodesk in its compositing, color grading, and finishing tool Flame have taken advantage of machine learning, especially for object isolation, sky extraction, and rotoscoping.

In the just-released Flame 2021 update, this has been extended with features for isolating human bodies and faces, used for color grading, lighting VFX, and compositing. It’s all done via the Human Face Part Extraction Keyer, which enables feature isolation in order to extract alpha mattes for things like skin, eyes, lips, nose, cheeks, chin, and laugh lines. The idea is to save time in compositing involving faces and in doing cosmetic beauty work.

We go behind the scenes of the new toolset with Flame family product manager Will Harris. He runs down how the Keyer came about, what machine learning ‘training’ was involved to make it possible, and where artists can use it. He also outlines some of the other new features in Flame 2021, including a GPU-accelerated defocus effect, finishing enhancements, and new workflows for Dolby Vision HDR authoring and display.

What the keyer does

The Human Face Part Extraction Keyer lets you isolate specific face parts – features like the cheeks, chin, forehead, nose, and t-zone – and gives you a matte for them. In the keyer, there’s a drop-down list of the things it can detect automatically.

“There’s also,” says Harris, “a custom UV layout tool where you can draw on a static template the area of concern, say a laugh line or a mole, and then it uses the whole face track to give you a matte for the mole or matte for the laugh line.”
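Autodesk hasn’t published the keyer’s internals, but conceptually what it delivers is a per-part alpha matte derived from a tracked face segmentation. Purely as an illustration of that idea, not Flame’s code, here is a minimal Python sketch; the face-parsing model that would produce `label_map` and the `PART_IDS` class IDs are hypothetical:

```python
# Minimal sketch (not Flame's code) of turning a face-parsing label map into
# per-part alpha mattes. PART_IDS and the model that would produce label_map
# are assumptions for illustration only.
import numpy as np

PART_IDS = {"skin": 1, "nose": 2, "lips": 3, "left_eye": 4, "right_eye": 5}

def part_matte(label_map: np.ndarray, part: str) -> np.ndarray:
    """Return a 0..1 float matte for one face part from an integer label map."""
    return (label_map == PART_IDS[part]).astype(np.float32)

# Usage (with some hypothetical face-parsing inference call):
#   label_map = parse_face(frame)               # HxW integer class IDs
#   lips_alpha = part_matte(label_map, "lips")  # HxW float matte for the lips
```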

Autodesk believes artists will be able to use the Human Face Part Extraction Keyer in two major areas: beauty work and color grading.

“In terms of color grading,” suggests Harris, “it will be a huge help for being able to say, ‘I’ve got this face where the person’s got bags under their eyes in every shot’ or ‘I just wish they had more definition or more shadow on one side of their face.’ The artist can now get a formula going on one shot and then drag and drop that onto another shot and it does the same exact thing. It’s basically like lighting VFX at the speed of color grading.”



Where this differs from what artists currently need to do to isolate parts of the face is in removing much of the manual process.

“You’d normally have to hand-build the shapes and use masks and keys,” says Harris. “That’s OK, of course, but the problem is if you then change your mind because, say, all those masks might be for the left side of the face and they’re tracked to a specific shot. This new tool has the intelligence to understand where the face is and the angle of the face on every frame. It automatically gives you a matte that is portable onto other shots.”

Where machine learning fits in

For a number of years, Autodesk has been developing different solutions for image segmentation and isolation, with things like depth tools and face normals. That work led, via the adoption of machine learning techniques, to the semantic sky extraction keyer for automatically generating sky mattes.

The natural progression, says Harris, was looking at the human head and body to isolate the head and neck and the whole body (something implemented in Flame last year). It has now moved on to face part extraction.

So how does this machine learning side of the keyer work? Well, it starts with training from a large dataset. For example, here’s how it worked with the sky extraction keyer, as Harris explains.

“You start with a bunch of stills where people have taken the trouble to hand-paint the depth, or in some cases used an actual depth generating camera, to give you a meaningful Z-depth that’s accurate for a particular building or scene. The idea is to have sky at the top, ground at the bottom and at the front, and some sort of ‘blobs’ in the middle. You then feed it through a neural network or a logic tree which asks, ‘Does it have this? Does it do this? OK, here’s the output.’”
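Autodesk’s training pipeline isn’t public, but the process Harris describes, feeding images with ground-truth depth or mattes through a neural network, maps onto standard supervised training of a segmentation model. A rough sketch of that idea, assuming a generic encoder-decoder model and a dataset of (frame, sky matte) pairs, might look like this in PyTorch:

```python
# Illustrative only: generic supervised training of a matte-prediction network.
# `model` is any encoder-decoder segmentation net and `dataset` yields
# (frame, matte) tensor pairs; neither is Autodesk's actual code or data.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_matte_model(model: nn.Module, dataset, epochs: int = 10, lr: float = 1e-4):
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()             # per-pixel "sky vs. not sky"
    model.train()
    for _ in range(epochs):
        for frames, mattes in loader:            # frames: Bx3xHxW, mattes: Bx1xHxW
            opt.zero_grad()
            loss = loss_fn(model(frames), mattes)
            loss.backward()
            opt.step()
    return model                                 # the trained "magic algorithm"
```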

The data also came from other places, including Cornell University’s MegaDepth project and MIT’s semantic scene parsing project.

“Then,” says Harris, “there’s a whole compute process, which literally hijacked half our office to run the processing across multiple machines at night for weeks at a time when we were developing this. It produces this piece of code, which is like a magic algorithm that’s unique to this training set. That piece of code is small and can run on a single GPU and can run quickly for an HD frame. It looks to previous reference based on the new scene you provide it, and predicts what matte should be generated.”
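The small, fast “piece of code” Harris mentions is effectively the trained network itself; at run time it only needs a forward pass per frame. Continuing the illustrative example above (again, not Flame’s actual runtime), the inference step could be sketched as:

```python
# Illustrative per-frame inference with a trained matte network; not Flame's
# actual runtime. `frame` is a 1x3xHxW float tensor holding one HD frame.
import torch

@torch.no_grad()
def predict_matte(model: torch.nn.Module, frame: torch.Tensor) -> torch.Tensor:
    model.eval()
    device = next(model.parameters()).device     # e.g. a single GPU
    logits = model(frame.to(device))
    return torch.sigmoid(logits)                 # 0..1 alpha matte, 1x1xHxW
```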

In terms of face part extraction, a training model built on a huge dataset of faces was also used. “We needed to get hundreds of thousands of faces and normal map references, then be able to give the system a brand new face from live action and get a predicted normal map,” details Harris. “And even though there are people out there generating depth maps for buildings, there are not enough people doing that for human faces.”



The solution Autodesk came up with was to use one of its internal tools, Autodesk Character Generator, as part of the training process.

Says Harris: “We ran a script that output a hundred thousand variations of beauty images and normal maps – all different angles, and different types of faces. From that we had our dataset, and we fed that into an algorithm which was then able to predict a normal map from a live-action face, even one that was just filmed with, say, an ARRI Alexa camera, i.e. with no depth information. You give it enough sources and enough references and then it can make its own references. By having the automatic isolation, you’re straight into the creative work.”
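The Character Generator script itself isn’t shown in the article, so the following is only a shape-of-the-idea sketch: sweep over synthetic face parameters, render a beauty image plus its ground-truth normal map for each variation, and collect the pairs as training data. The `render_face` callable is a hypothetical stand-in for whatever renders the character:

```python
# Illustrative sketch of generating synthetic (beauty image, normal map)
# training pairs. `render_face` is a hypothetical stand-in for the character
# tool / renderer; nothing here is Autodesk's actual script.
import random
from typing import Callable, List, Tuple

def build_dataset(render_face: Callable[[dict], Tuple[object, object]],
                  n_samples: int = 100_000) -> List[Tuple[object, object]]:
    pairs = []
    for _ in range(n_samples):
        params = {
            "identity": random.randrange(10_000),      # which synthetic face
            "yaw": random.uniform(-90.0, 90.0),        # head angle, degrees
            "pitch": random.uniform(-30.0, 30.0),
            "lighting": random.choice(["soft", "hard", "rim"]),
        }
        beauty, normals = render_face(params)          # beauty render + ground-truth normals
        pairs.append((beauty, normals))
    return pairs
```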

What’s also new in Flame 2021

In addition to the human face segmentation offerings, the latest Flame includes a few other key developments, some of which are set out by Harris here:

GPU-accelerated Physical Defocus effect – this is our latest and greatest tool for doing high-quality rack defocus, with an infilling algorithm that lets you do severely out-of-focus foregrounds against a nice clean background, which has always been hard to do.

Using Flame more for color grading – artists have started using Flame in more of a colorist persona, so we’ve included things like a ‘grade bin’ where people can store their work and quickly apply it to other places.

Dolby Vision HDR authoring – a lot of people, especially higher-end customers, have been using a separate pipeline, where they’ll, say, conform in Flame, do their grading in DaVinci or FilmLight Baselight, and then come back to Flame for finishing. We thought we would improve the workflows for Dolby Vision HDR authoring and display in this new version.

If you’d like to learn more about the new features of Flame 2021, check out the release notes.



Sponsored by Autodesk:
This is a sponsored article and part of the befores & afters VFX Insight series. If you’d like to promote your VFX/animation/CG tech or service, you can find out more about the VFX Insight series right here.
