How 'Welcome to Chechnya' used A.I. and machine learning techniques to mask the doco's subjects

“For their safety, people fleeing for their lives have been digitally disguised.”
– The opening title card to the film.

Machine learning, A.I. and deepfakes are gaining a whole heap of traction in VFX right now, which is one reason the documentary film Welcome to Chechnya has also garnered attention. It used these visual effects techniques to help protect the identities of those interviewed for David France’s film about activists working to help gay men and women fleeing persecution in the Russian republic of Chechnya.

After initially looking to filtering, rotoscoping and animation to cover the faces of his interview subjects, France turned to visual effects supervisor Ryan Laney and his team to ‘mask’ those on camera with the identities of ‘face doubles’ (the voices were also fabricated) using various techniques, including machine learning. The idea was to preserve the emotional impact of the stories of these individuals without revealing who they were.

Here’s how they pulled it off.

b&a: I’m curious what you ‘called’ your face swapping effect in the end, but first, as a filmmaker, David, how did you come to choose the kind of approach you would take to mask the identities of the people in the film?

David France: I had, of course, never had a need to disguise anybody before in a documentary. And I promised everybody who I was filming that I would disguise them to the highest standard, which was that even their own mothers wouldn’t recognize them. And I was sure I could do that. And it wasn’t until it came back that I realized that I was taking on something that was going to be quite a challenge. So, the first thing that we started working with, and keep in mind, Ian, that all of this was being done in secret because we were protecting the footage potentially from state actors. We worked only on encrypted drives. We spoke only on encrypted lines. We created an edit suite that was air-gapped so that none of the computers that worked on the footage had ever been on the internet.

So then we brought people into the studio to do this kind of roto-mation approach with pulling people out of the film, filtering them, putting them back in. And we discovered to our surprise that that looked good, but it didn’t disguise. And then we brought more people in, put more filters on the faces to expand the size of noses to alter the facial shapes, but it had the effect of actually making people more recognizable. It accentuated their noses and their foreheads and their ears, the things about them that their mother would know intimately. So we just failed over and over. We tried using kind of Snapchat-y kind of approaches, where we would put like glasses on people.

And we developed an entire language of masks where we tried like the Ninja Turtle masks. And we tried the stagecoach robber mask. And we tried a bunch of stuff that also didn’t disguise, or if it did disguise, it erased the humanity of the person and certainly didn’t fit or match the urgency and seriousness of the film. And that’s what brought us to Ryan. And Ryan had been thinking with his team about ways to do some of this stuff already. So we found a mind already engaged in this, and we were lucky we did.

Ryan Laney: We call them ‘veils’, just to answer that first question. So, David luckily had spent a good amount of time trying some other ideas. And even though they didn’t work, they were really valuable in giving us a message about what was working and what could work and where we needed to spend attention. We ended up drawing from academia and a few papers that have been written in previous years.

There’s a face research lab in Glasgow that has really interesting faces from around the world. And we are, in our human form, very similar in a lot of ways, and we really focused in on this idea that the changes we wanted to make were not big sweeping animation changes, but really subtle things. We talked for a while about doing a bit like a prosthetic, like we were painting on new eyebrows or putting on different lipstick. That’s really where our mind was when we discovered the deep machine learning. And that allowed us to give it a lot of data from the film and from stuff that we shot to sort of pair up and be able to paint on this new veil.

The machine learning part of it really helps correlate the face shapes between the two different people and so the emotion comes through really well. I think what we learned through this process is how much we telegraph how we feel, not necessarily through face shapes, but through face motion. Even from our first test where it was working at a lower resolution, it was impressive how well the emotion came through and how honest and truthful we felt like the solution was.

b&a: Could you explain from perhaps both of your points of view what the workflow ended up being?

David France: We had been doing our R&D in our own studio while we were in post. And we found Ryan and his team as we were getting close to lock. And at that point we had no plan for distributing the film because we hadn’t found our solution yet. Once we saw what Ryan was doing and saw how successful it would be, we handed him a locked picture. We unlocked it after the effects came in, because it suggested a couple of things to us, but mostly it was a locked picture. And in that interim, Ryan had various approaches that he thought we could do it like this or this or that.

And we took them to another academic setting, to a neuropsych lab at Dartmouth, where the director of the lab is something of an expert in the Uncanny Valley. And we wanted to make sure before we pulled the trigger that we weren’t pushing people out of the film in any sort of kind of unexpected ways. And that’s how we settled on this final approach. She took it into a study, I think 200 people in the study, and confirmed that we were on the right track. And that’s when we said to Ryan, here’s our finished picture, take it from here.

b&a: And then what were your steps, Ryan, in actually completing this work?

Ryan Laney: Maxwell Anderson was the post supervisor on the production film side. He packaged up everything and all the original footage was actually delivered to us by hand on a drive for security reasons. So that was, I guess, unusual for today’s ability to transfer stuff over the wire. We produced a bunch of tests. Dr. Wheatley at Dartmouth College, who David was mentioning—we were concerned that the softness of the face would be a trigger for people. And very interestingly, our hypothesis was wrong in the sense that without a veil people felt more concerned for the person in the film, because they felt like they were at risk. And with the veil, the unsettling rating was slightly lower for the veiled person.

So we felt really, really good that we were going down the right track with this particular solution. And then, for the footage we shot, Piers Dennis, the effects coordinator on our end, and I went through the entire film and selected all the people that we needed to cover. There were 23, and for each person, we looked at how they were represented lighting-wise. So we had everything from bright daylight to dimly lit, under-lit by a car, sort of nighttime shots. And so with this matrix of people by lighting scenario, we set up a shoot in Brooklyn, New York and shot for a week to capture the data and all the scenarios that we needed. And then after ingesting that it was really just a matter of elbow grease, I think you’d call it.
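The "matrix of people by lighting scenario" Laney describes can be pictured as a simple coverage table: for each person to be veiled, the set of lighting conditions their face double needs to be filmed under. The sketch below is illustrative only. The subject labels, lighting names and the `build_capture_matrix` helper are all hypothetical, and nothing here is the production's actual tooling.

```python
from collections import defaultdict


def build_capture_matrix(appearances):
    """Group (subject, lighting) notes into a per-subject shoot list.

    `appearances` is an iterable of (subject_id, lighting) pairs, as might be
    logged while reviewing a locked picture. Returns a dict mapping each
    subject to the sorted, de-duplicated lighting setups they appear under.
    """
    matrix = defaultdict(set)
    for subject_id, lighting in appearances:
        matrix[subject_id].add(lighting)
    return {subject: sorted(setups) for subject, setups in matrix.items()}


# Hypothetical review notes (not data from the film).
appearances = [
    ("subject_01", "bright daylight"),
    ("subject_01", "dim interior"),
    ("subject_02", "night, under-lit by car"),
    ("subject_01", "bright daylight"),  # repeated sightings collapse to one setup
]

matrix = build_capture_matrix(appearances)
for subject, setups in matrix.items():
    print(subject, "->", ", ".join(setups))
```

Each entry in the resulting table is one thing to capture on the shoot, which is why the number of setups grows with both the number of people and the variety of lighting in the film.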

b&a: Deepfakes and machine learning are obviously exploding right now. I’d love to talk a little bit about how much of it really was classic deepfake work. David, what has that been like seeing so much deepfake discussion out there and then having a film that taps into that? I find that quite fascinating that you really are right there on the cusp of the discussion.

David France: You know, I come to documentary filmmaking through journalism. I’m an old investigative reporter and journalism is very concerned about deepfakes, and rightly so. I think historians are concerned about it, this idea that you could create, out of nothing, historical falsehoods. And I’ve been to workshops where these things have been discussed and there’s been a lot of hand-wringing. I never saw this as being at all related to deepfakes. I think deepfake is the crime, not the technique. And when we first started talking about it with Ryan, we realized that, going further into journalistic ethics, we needed to do this very openly. We couldn’t deceive the audience. So we certainly weren’t deceiving the subjects. They had all agreed to do this conditioned on approval of the method that we were going to use.

And we weren’t deceiving the people who lent us their faces as human shields. They were mostly LGBTQ activists, most had already been involved in bringing attention to what’s happening in Chechnya. We cast them from their Instagram pages and kept sending images over to Ryan saying, will this one work for that one? Will this one work for that one, to see who would match, whose faces seemed to be ideal for it. And then we said to the audience, look, we’re bringing you in on this.

We emphasized that fuzzy edge around the face, that kind of halo around the face, to telegraph to the audience that what they’re looking at has been altered. We have a card in the front of the film saying they have been digitally altered. We worked on, I think Ryan said at one point, it was like the first 20 shots in the film, to overemphasize the halos on the face to teach the audience how to watch this. And so in that way, I don’t think it has anything to do with deepfakes. In fact, just the opposite. It’s a technique that is allowing a truth to be told that otherwise would have been repressed.
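One common compositing way to get a soft edge like the halo France describes is to feather the matte, so that the replacement fades in over some distance from the edge of the face instead of switching on at a hard line. The interview does not say how the team built its halos, so the following is a minimal sketch under that assumption; `feather_alpha`, `composite` and the pixel values are hypothetical.

```python
def feather_alpha(dist_inside_px, feather_px):
    """Opacity of the veil at a pixel `dist_inside_px` pixels inside the matte edge.

    Returns 0.0 at or outside the edge and ramps linearly up to 1.0 once the
    pixel is `feather_px` or more inside. A larger `feather_px` gives a wider,
    more visible soft border.
    """
    if feather_px <= 0:
        return 1.0 if dist_inside_px > 0 else 0.0
    return min(1.0, max(0.0, dist_inside_px / feather_px))


def composite(original, veil, alpha):
    """Standard 'over' blend of a veil value onto the original pixel value."""
    return alpha * veil + (1.0 - alpha) * original


# A scanline walking from outside the face (negative) to well inside it.
distances = [-2, 0, 1, 3, 6, 10]
for feather_px in (2, 6):
    row = [round(composite(0.2, 0.8, feather_alpha(d, feather_px)), 3) for d in distances]
    print(f"feather={feather_px}px:", row)
```

With the wider feather, more pixels sit in the partially blended band between the original and the veil, which is the part of the image a viewer reads as a halo.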

b&a: And Ryan, just on that deepfake implementation or the extent to which you used it, can you talk a little bit about that?

Ryan Laney: I think David’s said before that A.I. is the tool and deepfake is the crime. And, sort of differently, deepfakes are inherently nonconsensual. And so if you’re going to define what a deepfake is, it’s when somebody in a video doesn’t know, somebody who was shot separately doesn’t know that their face was applied, right, and it’s an attempt to trick the audience. So on all three of those conversations, we are being open and honest.

We use Google TensorFlow. We certainly looked to the papers that inspired deepfakes. There was a 2017 paper about semantic style transfer, which looked at how you could correlate between two datasets, understanding basically what was in the picture before you did the style transfer. Machine learning is being used in a lot of different ways these days. And so we liked the idea of using a tool for good and for human rights and to help. Very specifically in visual effects, our job is to support whatever story is there. And it has really been incredible to be able to support David and the story he’s telling that does affect human rights. So we feel like this is a really powerful message that David’s crafted, and we’re just really happy to have helped him tell that story.

b&a: At Imageworks or ILM or elsewhere, had you done any face replacement or anything like this over the years?

Ryan Laney: Well, I think Ant-Man was the most recent face replacement, but you know, it kind of has different ideas that are going on there where you’re taking your key actor and you’re putting them on a stunt man. And we’re kind of doing the reverse. We’re putting the stunt man, the stunt person, onto the real person. Except that we’re also spending a lot of energy making sure that every micro-expression is there, which you don’t always get in a face replacement. In the normal face replacement, you’re more interested in the key actor.

b&a: And, David, I was curious, in a big visual effects film, the director gets to review shots and make notes. I’m wondering what were your notes that you would end up giving for these sorts of shots?

David France: It’s interesting, Ryan talks about the pace that we were doing this at. So he was rendering overnight and sending in the mornings—often we were just reviewing together. I don’t think that I gave him specific notes along the way. I mean, we were very concerned about when heads turned entirely away from camera and when the effect leaves the face and what that looks like. There was a lot of dialogue about that, because it would pop on and off at points. And then how do you smooth that out and make that a little less jarring? But our major conversation and dialogue was about adding the blurs, emphasizing the blurs, which was one of the things that I felt was going to be important at the beginning of the film to teach people about what we’re doing, to actually make the VFX a character in the film.

And then for that character to then dissolve as it does in the press conference scene with Grisha, that was probably the most work that he and I did together on notes. Although the design, the idea of how to do it, was his purely. And I opened up a file he sent me and I was just blown away by his decision-making in that moment of reveal. So then the question for us was where do we put that reveal? And that was something that we had a lot of conversations about as a team collaboratively.
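The popping France mentions in this answer, where the veil switches on or off as a head turns away from camera, is the kind of problem often handled by ramping an effect's opacity over several frames instead of toggling it. The interview does not describe how the team smoothed these moments, so this is only a sketch of that general idea; `smooth_opacity`, the per-frame flags and the step size are hypothetical.

```python
def smooth_opacity(face_visible, max_step=0.25):
    """Convert per-frame on/off flags into opacities that change gradually.

    `face_visible` is a sequence of booleans, one per frame. The returned
    opacity moves toward 1.0 (veil on) or 0.0 (veil off) by at most
    `max_step` per frame, so the effect fades instead of popping.
    """
    opacities = []
    current = 0.0
    for visible in face_visible:
        target = 1.0 if visible else 0.0
        # Clamp the per-frame change to the allowed step size.
        delta = max(-max_step, min(max_step, target - current))
        current += delta
        opacities.append(current)
    return opacities


# Hypothetical shot: the face is trackable for four frames, then turns away.
flags = [True, True, True, True, False, False]
print(smooth_opacity(flags))
```

A smaller `max_step` gives a slower, gentler fade, at the cost of the veil taking longer to reach full strength once the face is back in view.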

b&a: I love that reveal, Ryan. Do you want to mention anything about technically how hard that was to do?

Ryan Laney: It wasn’t terribly hard technically compared to the other things that we were doing. David said he wanted something magical. And so, with visual effects, when you’re doing something like a transition, there’s always a balance between too much and not enough. So we peppered in, there’s 20 or 25 layers in there of different colours and reveals going on to try to open up from Grisha into Maxim, through the transition.

Also, David mentioned the first 20 shots a couple of times. I think it’s worth mentioning from an editorial perspective that Max Anderson actually removed some dialogue from that first 20-shot section, that first several minutes of film, so that the audience wasn’t reading subtitles and was looking at the faces instead, this idea of training the audience, acclimating them to what was going on.