How ILM approached the youthification effects in ‘The Irishman’, with a brand new technique.
How do you de-age several notable actors for lengthy periods of a Martin Scorsese film – to a number of different ages – and establish a shooting methodology that captures every nuanced performance and has a minimal footprint on set?
ILM had to answer that challenge on Netflix’s The Irishman to help deliver de-aged performances from the likes of Robert De Niro, Joe Pesci and Al Pacino. The markerless solution would ultimately include a combination of a special multi-camera set-up on set using infrared witness cameras sync’d to the main camera to extract the actor’s geometry, textures and lighting, a procedural approach to the de-aging itself, and some machine learning techniques.
To learn about the process, befores & afters sat down for a lengthy chat with ILM visual effects supervisor Pablo Helman. Here, he relates how the studio re-imagined the de-aging process, the test it carried out, and the art and technology behind ILM’s younger versions of the actors.
‘That sounds like another movie that I want to make’
Pablo Helman: It’s 2015, and I was doing Silence, working with Marty in Taiwan. We were having Thanksgiving dinner and we started talking about movies. Obviously, Marty loves talking about film. I thought that he was doing ‘Sinatra’ at the time, so I pitched him the idea of making a younger Sinatra and that maybe we could help them with that. And he said, ‘Hmm, that’s interesting.’ He’s a very curious director. He’s all for exploring. We started talking about how we would go about that. And he said, ‘Well, you know, that sounds like another movie that I want to make called The Irishman.’ He started talking about the film and sent me the script overnight, which I read.
In the morning when we were about to shoot, I said, ‘I’m in! I read the script. It’s incredible. We want to do this.’ We talked about the fact that it was going to be De Niro and that Marty and De Niro wouldn’t want to be intruded upon in the way that visual effects can intrude. Because we in visual effects like to have control over everything, meaning that we like to separate everything. That’s why bluescreen and motion control came around. If you actually follow the history of visual effects, you’ll find that everything is about having control and separating things so that you can change things around.
Back in 2015, everybody was using markers and helmet-cams and we were all trying to engineer a way to get the most amount of data from the faces and control the lighting. It was a time when those LED cameras in front of faces were sometimes a problem because they would flare – there were all kinds of problems we were having. It was very clear to me that that wasn’t going to be possible on The Irishman and we would have to come up with something different.
Marty said, ‘No markers, no helmet cameras, no ping pong balls, no tennis balls, no nothing. The actors are going to be on set, looking at each other, improvising. There’s going to be long takes. Take the technology away from me, please.’
This was a dream project. At the time it was only about Bob De Niro, but it was obviously going to involve Joe Pesci and Al Pacino, so it was a great opportunity for us to develop a performance. And that’s something that’s very hard to come by in our neck of the woods. There was basically no action, per se. It was just a bunch of people talking and Marty knew that the cameras were going to be very close up.
I proposed a test. I met with the producers and we talked about a test. We talked about bringing Bob De Niro in to re-create a scene from Goodfellas. The reason why we wanted to do that was because it gives you an anchor, something that we can anchor our work to. If we did a scene that we all know from Goodfellas, we could always go back there and compare it. I picked the pink Cadillac scene because it had to be a minute of dialogue and be close-up work. It also had to cover a broad range of expressions, from the most expressive to the least.
They re-created the whole set for the bar, and then in comes Robert De Niro and he’s 74 years old and he does that scene. There were no markers. We shot it on film, and we put three cameras in there because we thought, okay, well, if we’re not going to have markers, what do we have that is going to give us the performance? How can we create geometry from not having markers?
So, if you don’t have markers, what do you have? You do have a mass of geometry, which is the actor himself in 3D space. You have lighting and you have textures. If we can figure out a way to get as much of that person that is in front of the camera as possible, then we’re going to get the performance captured.
We rolled out three cameras, a central camera, which was film, and then two witness cameras, which were RGB cameras, right next to the director’s camera. We recorded the performance and then we brought all the material to ILM and in about eight to 10 weeks we came up with about eight seconds of a performance that we were really happy with, where we could say, okay, well look, we took geometry out of nothing – this is crazy! We rendered it and showed it to Marty and Bob De Niro, and they greenlit the movie out of that test.
The beauty of infrared light
From the test, we knew that this technique was going to work, that we were going to be able to get geometry out of it. We had to come up with a way to attach those witness cameras to the center camera – now digital – so that they were not occluded and they were always taken care of. Since the system works on lighting and texture, whenever the lighting is occluded, that’s a problem.
The way to do this was to create a controlled environment that was invisible to the actors and wouldn’t change the lighting on set. To do that, we started thinking about infrared light, and what would happen if the two cameras next to the center camera were infrared. Instead of bringing the actors to the controlled environment, we brought the controlled environment to the actors without anybody seeing it.
We worked with ARRI in Los Angeles and DP Rodrigo Prieto to modify some camera hardware and software so that those two cameras were infrared (the solution was called Flux). We came up with a ring in front of the lenses of the infrared cameras so that infrared light would be thrown onto the actors. The actors were illuminated with infrared light, but it wouldn’t change the set – it was in a different spectrum.
Then we started working on the software. The software basically took those three cameras and with all the information – the infrared information and the RGB information of the center camera – it would calculate lighting and textures and would basically take a mesh and deform it on a frame-by-frame basis and change the geometry so that it would mirror the capture of the performance.
I had worked with Marty and I knew how detailed and how particular he was about performance. For him, it’s about their eyes, it’s about what the actors do with their faces. It’s close-up work, from forehead to chin. I knew that maybe I was going to be able to do a test job for him and he was going to be happy with it, but I knew that we had to come up with a production model that was going to financially and aesthetically be a non-compromising system.
There is no compromise when you’re working with him because he’s completely focused on the art and he’s completely focused on what he feels about the performance. So if I was going to start with a system that was not scalable and that was only going to be able to get me some of the way, then I would have been dead in the water eventually.
The challenge of bodies
This was another thing, with the bodies. One of the things that we talked about was, well maybe what we do is we do all the performances that you want with the actors, and then we could put in a body double who is more to what you want in terms of the right age.
But that became a complete impossibility. For one thing, actors improvise. Now, we did convince Marty to shoot digital, because that’s really the only way the three cameras could be in sync. Mind you, a quarter of the movie was shot on film, so we had to match that look. Once we decided to shoot digital, all the takes were open. We could do 20 minute takes. You see, what the actors do when they get to the bottom of the page after a two minute scene is they start over. They don’t cut. Or they improvise in the middle of the take and they just keep going and going and going. There was no way to put a body double in the middle of it because Marty wasn’t going to make a decision on the take.
So then we started thinking about 3D techniques to change some of their body. We knew we had to change the chin and the neck, which brings us back into the collar, which we have to render in 3D, as well as the shoulders. We also brought in an outside consultant to help us with the body position with the actors. So this person would work with them every day throughout the day to get the bodies to work the way they should.
The multi-camera set-up
The other thing that I found out working on Silence with Marty is that he likes complementary lensing. For instance, for conversation scenes when the actors are talking to one another, you have two cameras, one on one actor and the other on the other actor. Why does he do that? Because he loves improvisation and he’s going for not only the performance that he’s getting, but also the reactions, which means he doesn’t need to re-shoot things.
There’s another reason why it was really important for the actors to be on set, looking at each other. We’ve learned from research that the body does what the eyes do: wherever the eyes are going, the body goes with it. Whatever we do, either gestures or attitude or the way we sit, the way we act, is always through the eyes. So the eyes are not only tracking the person in front of you, but they’re also looking around at the environment and they’re adjusting.
Because it’s all about communication, you’re always looking at whether the other person is understanding what you’re saying or not. And then you’re adjusting. So for all these reasons, we couldn’t go into another environment. All the performances were done on set, there were no later takes in a controlled environment. There were no re-shoots. Everything that was done was done on set, and when we finished the 108 days, that was it.
How it all works
The whole thing’s based on lighting and textures. When you have two set-ups, that means you have six cameras. Remember the two cameras that are witness cameras are also projecting light – infrared light, but it’s still projecting light. That means that the lighting has to be calculated for the six cameras. If you have a camera behind you that is looking at the other actor, you’re still getting light on your ear and on the side of your face because it’s coming from behind you. So all that stuff had to be manipulated.
The software system takes a look at all the lighting. The first thing you do is you do the layout and the matchimation of the faces from all the cameras. We’re changing, in 3D, all the faces all the way to their shoulder. Then there’s a calculation of where the lights are. We had to Lidar every one of the set-ups. It’s not just Lidar’ing once or twice a day, or every set. Every time there’s a change in lighting we have to do a Lidar, because the Lidar gives us the geometry as to where the lights are. Then you take the Lidar and that tells you where the instruments are, and then you take the HDRI information, which tells you about the intensity of the light and the color of the light.
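As a rough illustration of how those two data sources complement each other, the sketch below pairs a Lidar-derived position with an HDRI-derived intensity and color for each fixture, then estimates the light arriving at one point on a face. It is a toy model, not ILM's software: the `Light` class, the `irradiance` function, the units, and the choice of a simple Lambertian point-light model with inverse-square falloff are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Light:
    """Hypothetical record for one lighting instrument on set."""
    position: Vec3     # where the fixture is (assumed to come from the Lidar scan)
    intensity: float   # how bright it is (assumed to come from the HDRI)
    color: Vec3        # RGB color of the light (assumed to come from the HDRI)


def irradiance(point: Vec3, normal: Vec3, lights: Iterable[Light]) -> Vec3:
    """Sum the RGB light reaching `point`, whose unit surface normal is `normal`.

    Assumes ideal point lights, Lambertian (cosine) shading and
    inverse-square falloff. Lights behind the surface contribute nothing.
    """
    total = [0.0, 0.0, 0.0]
    for light in lights:
        to_light = [l - p for l, p in zip(light.position, point)]
        dist_sq = sum(c * c for c in to_light)
        if dist_sq == 0.0:
            continue  # light sits exactly on the point; skip to avoid dividing by zero
        dist = math.sqrt(dist_sq)
        cos_theta = max(0.0, sum(n * c / dist for n, c in zip(normal, to_light)))
        strength = light.intensity * cos_theta / dist_sq
        for i in range(3):
            total[i] += strength * light.color[i]
    return (total[0], total[1], total[2])


# One fixture in front of the surface, one behind it.
key = Light(position=(0.0, 0.0, 2.0), intensity=4.0, color=(1.0, 0.5, 0.0))
back = Light(position=(0.0, 0.0, -2.0), intensity=4.0, color=(1.0, 1.0, 1.0))
print(irradiance((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), [key, back]))
```

Neither measurement is enough alone in this model: without the position there is no distance or angle to the surface, and without the intensity and color there is nothing to scale.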
Then you combine that with the Lidar and then all of a sudden you have a set that has the right intensity of the lights and the right position of the lights. So you put all that information in there, plus some other things that basically tell the system to only concentrate on the head. Then you press a button and through the three, or six, cameras, it computes all these on a frame by frame basis. It takes a look at that and compares it to a model that we have created of that contemporary actor – the 76 year old actor, say, for Bob De Niro. And then, if there’s a change, it deforms the geometry to conform to that performance. And then after you do that, you go to the next frame, and the next one, and the next one.
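The frame-by-frame loop described here (compare the capture to the model, deform where they differ, move on to the next frame) can be sketched in a few lines. This is only a toy under stated assumptions: the real solve works from lighting and texture across several cameras, whereas the sketch assumes each frame already arrives as a list of observed 3D vertex positions. The names `solve_frame`, `solve_take` and the tolerance `tol` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def solve_frame(mesh: Sequence[Vec3], observed: Sequence[Vec3],
                tol: float = 1e-3) -> List[Vec3]:
    """Deform `mesh` toward `observed` for a single frame.

    A vertex is moved onto its observed position only when the two differ
    by more than `tol`; otherwise it is left where it was.
    """
    deformed = []
    for vertex, target in zip(mesh, observed):
        dist = sum((a - b) ** 2 for a, b in zip(vertex, target)) ** 0.5
        deformed.append(target if dist > tol else vertex)
    return deformed


def solve_take(base_mesh: Sequence[Vec3], frames: Sequence[Sequence[Vec3]],
               tol: float = 1e-3) -> List[List[Vec3]]:
    """Solve a whole take, starting each frame from the previous frame's result."""
    results = []
    current = list(base_mesh)
    for observed in frames:
        current = solve_frame(current, observed, tol)
        results.append(current)
    return results


# A two-vertex "mesh" and two frames of (assumed) observations.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
take = [
    [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0)],      # second vertex moves
    [(0.0, 0.0, 0.0005), (1.0, 0.5, 0.0)],   # first vertex jitters below tol
]
for frame_mesh in solve_take(base, take):
    print(frame_mesh)
```

Starting each frame from the previous result, rather than from the neutral model, is a design choice of this sketch; the interview does not say which the production system does.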
So, that gives you the performance of the contemporary actor – the 76 year old. But after that, you take that performance, and you need to re-target it onto, for this specific project, a variation of models that have to go from ages 24 to 80-something.
If the first part of the capture is a math problem, the second part is not a math problem. It’s a design challenge, because that’s the thing about Marty, he designed those characters to be not necessarily the same guys that we knew 30 years ago. When you see De Niro in the movie as 83 and you rewind 30 years, you don’t want to see Jimmy Conway from Goodfellas. You want to see Frank Sheeran, younger. That’s a tough thing to do because it’s not a math thing, it’s a design challenge.
The same thing with Pesci. I mean, at 53, he wasn’t as thin as he is in the movie. But Marty wanted those characters to be who they are. It’s not a re-creation of a specific time in the actor’s, say, Robert De Niro’s, life. It’s a new creation of what that character, Frank Sheeran, looks like when he was 30 or 40.
About four or five years ago when we started, I started talking to a make-up effects artist called Bill Corso. We started talking about it and he was saying, ‘We have a problem with make-up in that sometimes the actors don’t look like themselves on set.’ I said, ‘Wait a minute! We have the same problem. You know, sometimes we put the actor in there – the asset – and they don’t look like themselves.’
I said, ‘How do you solve that?’ And he said, ‘Well, we just go in there and paint, basically paint shadows or whatever we need to do.’ So that’s also been our approach. According to the shot, we would do whatever we needed to do to get it done.
But it is an interesting thing because it’s really a fallacy to think that because we’re going to have an asset, it’s always going to look good. In 108 days, a person changes. Sometimes they look great and sometimes they had a bad night or they show up on set a little bloated. One time, I show up on set and I look at Bob and he calls me over and he starts talking to me, but I can’t understand what he’s saying because he had a wisdom tooth out the night before, but he didn’t tell anybody! His face was bloated but he said, ‘Well, I’m not talking today – there’s no dialogue, so I think it’s OK…’.
In terms of consistency, we also have an A.I. solution that we came up with called Face Finder. For about two years, Leandro Estebecorena, our associate visual effects supervisor, put together a library of targeted ages for the actor. He went into research mode and had thousands of frames and hundreds of sequences from all these different movies from when the actors were about 30 to 55. We had those libraries catalogued in different ways. You could search all kinds of ways, via eyes or crying or sweat or big notes or close-up work, for the three different actors.
Then what we did is we would render a frame for the shot that we were working on and we let the A.I. program run through our library and come up with pictures of something that was similar to what we rendered. And you could look for the same lighting and same angles, or rotations if the actor was moving and the camera was running. Then we would render them and that would be a sanity check. In terms of consistency, it helped us because that’s an incredible way to check your work.
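The interview does not say how Face Finder measures similarity, so the following is only a sketch of the general idea: filter a catalogued library by tag, then rank the remaining reference frames by how close they are to the rendered frame. It assumes each image has already been reduced to a numeric feature vector by some other step and uses cosine similarity as the comparison; the `ReferenceFrame` fields, the tag names and the film labels are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class ReferenceFrame:
    """Hypothetical catalogue entry for one frame from an earlier film."""
    film: str
    actor_age: int
    tags: FrozenSet[str]           # e.g. {"close-up", "crying"}
    features: Sequence[float]      # assumed precomputed image descriptor


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(render_features: Sequence[float],
                 library: Sequence[ReferenceFrame],
                 required_tags: FrozenSet[str] = frozenset(),
                 top_k: int = 3) -> List[ReferenceFrame]:
    """Return the `top_k` library frames most similar to the rendered frame.

    Only frames carrying every tag in `required_tags` are considered.
    """
    candidates = [ref for ref in library if required_tags <= ref.tags]
    ranked = sorted(candidates,
                    key=lambda ref: cosine_similarity(render_features, ref.features),
                    reverse=True)
    return ranked[:top_k]


library = [
    ReferenceFrame("film_a", 35, frozenset({"close-up"}), (1.0, 0.0)),
    ReferenceFrame("film_b", 42, frozenset({"close-up", "crying"}), (0.0, 1.0)),
    ReferenceFrame("film_c", 50, frozenset({"close-up"}), (0.9, 0.1)),
]
matches = find_similar((1.0, 0.0), library, frozenset({"close-up"}), top_k=2)
print([m.film for m in matches])
```

The retrieved frames would then sit next to the render as the kind of sanity check described above.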
It’s not only all the eyes it goes through – the director and other filmmakers – but it’s also the computer saying, ‘Hey, it matches this and matches there, you’re OK here, but you have a little problem right there.’ I think that in terms of A.I., it’s going to get better and better. A.I.’s going to be a big part of how we re-target an asset – how we go from an old person to a younger person or what we call the ‘behavior likeness’.
Digital doubles of the actors
If you decide to use the markerless software, then what do you capture? The camera solution is what we capture. Then there’s the software solution where we say, what do we do with the information that we get? There’s all these parallel things that happened while we were shooting, including making the digital doubles. It starts with the digital double that is the contemporary actor, and then varies from the contemporary actor to the ages in the movie. To do that, we used the Disney Research Zurich Medusa system for process and performance.
There’s a part of the Medusa system that requires a two minute performance for every actor, and then we ask for hundreds of poses from every actor that spans the limits of how subdued and how over the top their performance is. Then we also used the Light Stage to obtain all the skin and texture and pore detail to create the assets.
Could ILM / Disney Research Zurich’s Anyma have been used?
The reason why we couldn’t use Anyma is because it needs a controlled environment. You need the light to be soft and we can’t control that. I can’t go to the DP and say, ‘Don’t light this way and don’t light the other way.’
Seeing that we were using infrared cameras, I would ask the DP to use, as much as possible, LED sources as opposed to incandescents, which have a larger infrared component. And he said, ‘Well, this movie is going to be what it is, there’s a lot of practicals here.’ So we worked with the DP to get filters on the incandescent sources.
Anyma does not deal with the lighting, for one thing, so I could not use it. Number two, Anyma doesn’t really allow for multiple performers to be in the volume. When you’ve got occluding happening, it’s very difficult to get anything out of it. Anyma also doesn’t allow me to get the textures from that performance and then use that for decoding the performers.
The only way that we could have done this is through a markerless on-set performance with no restrictions spatially. That’s the other thing about Anyma, you have whatever amount of cameras you want – you can have three or six or whatever you have – but you have a specific volume. I can’t restrict these actors. They go wherever the camera is going, wherever the director wants them to go. So Anyma is a very restrictive environment that works well for other things but not for this specific project.
I could have also said, if the actors had a different attitude, ‘Well you need to wear helmets and that’s the end of it.’ But that wasn’t the case. It was a completely different way of thinking about it. It’s a brand new idea and this is the only project that’s been around that allows for no markers, no helmets, no restrictions, on-set, with on-set lighting.
Could performances be ‘re-imagined’, if necessary?
I think you could change the performances, but if you did on this project, you’d get fired! We did not change a blink. There was no key frame animation in the project. I love animation and I think it can be great for certain things, but for this specific project, there is no animation. There is a procedural solve that the computer does through this system we created, and then it gets re-targeted as a performance of a younger version. But there was no way to change anything, and we didn’t change a thing.
There was an effort to create a face rig out of whatever the solve was giving you, which is a bunch of curves. And then you could organize that in all kinds of ways so potentially an animator could come in and start moving things around. But this is the thing: there’s a lot of stuff that happens genetically. There’s all kinds of genetic codes that are connected that happen in your face and makes you who you are. It also makes other people recognize who you are. It’s very difficult to get an animator to learn those codes when they’re not known, to make those connections.
If you’re going to make a slider to move something, you need to know what to connect. And if you don’t have that information, you can’t do it. So for us to be in the middle of what biology is doing in your face is basically taking it out of the behavior of your likeness. Also there are some scientific problems with it that have to do with compression of surfaces. For instance, the lips, when they touch, there’s compression there, but it’s also a displacement of volume. And that’s not sim-able – well, you can sim it, but it’s not accurate.
What you end up with, to solve those problems, in an animation sense, is a key-frameable shape library that basically does what it should do physically. And that’s never correct. So what happens is that it affects the phonemes and the dialogue. Everything that you’re doing, especially in your mouth, is connected to your voice box, in terms of what it sounds like and it reverberates throughout your face. So you can do all the keyframing that you want, but it’s not connected to whatever you’re doing.
That’s why you see sometimes that the dialogue doesn’t hit the right notes, because it’s not reverberating in the face. And that’s why it was so important for us not to use markers. The solution that we came up with is a deformation of the mesh. When the capture is done, all the reverberation is right there and is connected to the dialogue.
Observations on aging and de-aging
I had to really understand how everything goes through Marty’s eye. When you look at something and basically you squint your eyes and you look at somebody’s face – he basically sees outlines. The outlines that are given by eyebrows, the wrinkles and your eyelids, your nose and mouth. There are all kinds of wrinkles there that basically draw the face. If you take those away because you’re younger, it changes.
It’s the same as if you live in an apartment for 50 years and you have the same furniture. And then one day I take all the furniture out and you go into the apartment and you say, wait, this is not my apartment. Well, it is. It is, I just took all the furniture out of it. It’s the same thing.
It was about getting Marty to understand, what is it that being younger is, and then ‘negotiating’. Maybe we should put more wrinkles here or more wrinkles there, but if I do that, then he’s going to look older. Well, it’s okay if he looks older around the eyes because that’s where I want to see the expression of this specific thing that I’m looking for.
Final shot count…and CG windshields
We did 1,750 shots. There were only 40 omits, from 1,750 shots. This has never happened to me! Usually in visual effects we are working on the third act, which is the most flexible or fluid, let’s say, because of the type of shots, and usually between 10% and 20% of the work ends up getting omitted or changed. This movie has been three and a half hours for a year. It has not changed.
I mean, it’s really refreshing because, for all of Marty’s non-technical side, he understood that if he changed the take, we had to recompute it. When we started this, I wondered whether he was going to be able to comply with this. And he did. In fact, he turned over 50 shots while we were shooting just so that we could do the research, especially for things that were difficult, like occlusion or someone on the phone and the camera’s moving around – here we’d need to render what’s behind the phone – how do you do that? Well, we have three cameras, so the central camera’s seeing what the director’s seeing, but the left and right cameras are seeing exactly what I need to render under that.
Smoke was difficult. Glass was difficult. Because we were working with infrared, and all the cars are from the 50s and 60s, there’s a lot of lead on the windshields. If the camera is outside looking in through the windshield and the actor is behind the windshield, the infrared light doesn’t go through and I can’t see anything! He looks black. So we had to take all the windshields out of those cars. All those windshields that you see in the movie, they’re all CG.
Note: SSVFX, Vitality Visual Effects, Yannix, Stereo D, Bot VFX and Distilled VFX also completed visual effects for The Irishman.