Weta Digital has made photoreal CG humans before (Furious 7, Iron Man 3, Guardians of the Galaxy Vol. 2), and it’s made photoreal CG characters based on actor performances before, too (Avatar, Apes films, Alita: Battle Angel, Avengers: Infinity War and Endgame).
The studio constantly refines its expertise in performance, all the way from the original motion capture through motion editing and keyframe animation, and into the many, many details that make up these digital humans and characters.
For Ang Lee’s Gemini Man, where Weta Digital created a completely digital 23-year-old Will Smith who ‘acts’ alongside the real 50-year-old Will Smith, I wanted to ask the studio what new things in particular were done to achieve this, and what new challenges they faced, given this was a high frame rate film shot in native stereo.
Weta Digital visual effects supervisor Guy Williams, working under production VFX supervisor Bill Westenhofer, led a team that included co-visual effects supervisor Sheldon Stopsack and animation supervisor Paul Story, and that continued to advance the studio’s character pipeline. Here, Williams runs down a hit list of 10 new digital human advancements, all the way from the proofs of concept that were done, to the shooting methodology – including the idea of ‘messy fighting’ – and then individual CG approaches such as a new procedural pore system and the use of Deep Shapes.
#1 – The A/B approach to shooting
Guy Williams: For filming, we landed on what we called the A/B approach, because Will had to be in the frame twice. First we shot Will as the character Henry in costume and makeup opposite an acting double who was playing the part of Junior (the younger clone of Henry). We shot clean plates so we could paint the acting double out.
At the end of the live action shoot, we set up a mocap stage and reversed the roles so that Will would be playing the Junior character wearing mocap gear and the acting double was playing the Henry character. We’d pick a favorite take, sync up all the audio and record it all again with Will for a second performance. One of the new things here was witness cameras – we had to get some RED RAVENs just to guarantee that we’d be able to go 120 fps true, that we could gen-lock (since the film is shot 120 fps).
One thing that we realized later on was that if Will wasn’t in frame with himself, we could just capture him in-situ. So we would just mark up his face and put the helmet rig on him and shoot the shot live action. And then ‘all’ we had to do was paint out his head and replace it with a younger version of himself.
#2 – The Pepsi challenge
As part of our process of making sure we were on the right track, we did what we called the ‘Pepsi challenge’ with footage, but first we used photographs. We picked four photographs from various movies that we thought were iconic Will Smith. And then the goal was to line up our digital Will Smith to those photographs, light it as accurately as we could and put them side by side and see if you could tell which photograph is which.
We’d constantly shuffle the order so you couldn’t tell which one was the digital one. We eventually got to a point where it got hard to tell. That was probably the Will Smith 1.0. The one you see in the movie is probably Will Smith 35.0.
Then came the Pepsi challenge. We took a scene from Bad Boys where Will Smith and Martin Lawrence are in the car together talking and there’s a lot of back and forth. There’s 30 some odd shots in the scene. What we did was we took two of the shots of Will Smith and painted him out and replaced them with a digital Smith. Same performance, same audio, same everything. And the goal was to get to the point where when you watch the one minute long cut and then turn to the audience and say, tell me which two shots we replaced, that it became very hard to tell.
These are the kinds of things that you have to do to just have the confidence internally that you’re doing the right thing, that you’re on the right track. You can’t just build a Will Smith on a gray turntable and say, yes, I believe that that is Will Smith. You constantly want to find the most damning tests that you can do, put it through that test and if it fails, then you need to go back through the wash.
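To put a number on how demanding the Bad Boys test is, here is a minimal sketch of the odds of a viewer naming both replaced shots purely by guessing. The shot count of 30 is an assumption standing in for the "30 some odd shots" mentioned above, and the function names are hypothetical, for illustration only.

```python
from math import comb

def chance_of_naming_all(total_shots: int, replaced_shots: int) -> float:
    """Probability of picking exactly the replaced shots by pure guessing."""
    return 1 / comb(total_shots, replaced_shots)

def expected_correct_picks(total_shots: int, replaced_shots: int) -> float:
    """Average number of correct picks when a viewer guesses
    `replaced_shots` distinct shots at random."""
    return replaced_shots * replaced_shots / total_shots

TOTAL_SHOTS = 30      # assumption: stand-in for "30 some odd shots"
REPLACED_SHOTS = 2    # from the interview: two shots were replaced

print(comb(TOTAL_SHOTS, REPLACED_SHOTS))                    # 435 possible pairs
print(chance_of_naming_all(TOTAL_SHOTS, REPLACED_SHOTS))    # about 0.0023
print(expected_correct_picks(TOTAL_SHOTS, REPLACED_SHOTS))  # about 0.13
```

If an audience picks the digital shots much more often than this baseline, the digital Will Smith is still giving itself away somewhere.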
#3 – A procedural pore system
The major thing that we did that we hadn’t done before was a procedural pore system. Typically what you would do for skin detail is take a live cast of the actor, create a plaster mold from that, then make small latex sheets out of the mold, put those on a flatbed scanner, and then do a lot of careful registration to get those pores to line back up to the texture of the face. You get really accurate pores that way. It’s just that the shape of the pores isn’t perfect because you’re taking it from a flat mold, not an actual three-dimensional object, so you’re losing the detail at the scanning point.
What we did this time was, we actually came up with a system for growing the pores on the face. It used flow maps, because pores obviously have a flow and structure to them. From there you can control things like anisotropy, depth, width and height. And we just kept on iterating on small areas until we got it to look exactly like what the actor looked like. Now, obviously every pore isn’t in exactly the correct place, but in any given area it was the right shape and the right size. The added benefit of this system is that you get a perfect shape of the pore, the right round shapes.
Then we could take that pore system and once the face was animated, we could run a tetrahedral solve on top of that, allowing us to get, not just wrinkles, but wrinkles forming along pore lines, and also the pores collapsing in the correct ways. Pores are sort-of shaped like diamonds – they can collapse in multiple ways but they have a dominant way of collapsing.
All that came along with the simulation process – we scripted that out so that every shot that had animation, we could nominate shots to simulate and they could run overnight.
#4 – Defining the skin in terms of melanin
We define the skin in terms of melanin. Specifically eumelanin, and pheomelanin, which are the two sub-types of melanin that define colors. The important thing about that is, instead of just painting a simple color map for the face, we actually end up with something that, depending on which angle you look at it, it looks correct. So light enters the skin, interacts with the correct melanin, and gives you the correct spectral responses that come back out and gives you a visual result, as opposed to just bouncing off the color map. The advantage of that was the skin can be very view dependent. You can see different colors depending on what angles you look at the skin. Also when you do tricks like blood flow, which we did, when you scrunch up your face, you squeeze blood out of certain areas. This was handled in our spectral renderer Manuka.
#5 – Deep Shapes
One of our animators came up with this system called Deep Shapes (which was also used on Endgame). It basically treats the blend shapes of the face as different depths of the skin, so that over time, as the muscle starts to move, the skin follows after. So there’s this delay effect that gives you this beautiful naturalistic move. For example, when your eyes blink, instead of the lid coming down and going right back up, the lid comes down but as it starts to go back up, the eye bag is still traveling down ever so slightly – it just gives you this much more naturalistic skin solution.
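The delay effect described here can be illustrated with a very simple lag filter. This is only a sketch of the general idea, not Weta Digital's Deep Shapes implementation: the function `lagged_follow`, the `smoothing` values and the blink curve are all hypothetical, chosen to show a slower layer still moving down after the driving layer has reversed.

```python
def lagged_follow(driver, smoothing):
    """First-order lag: on each frame, move a fraction `smoothing`
    of the remaining distance toward the driving value."""
    value = driver[0]
    out = []
    for target in driver:
        value += smoothing * (target - value)
        out.append(value)
    return out

# Hypothetical blink curve: 0 = lid open, 1 = lid fully closed.
blink = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 0.8, 0.6, 0.4, 0.2, 0.0, 0.0, 0.0]

eyelid = lagged_follow(blink, smoothing=1.0)   # follows the driver with no lag
eye_bag = lagged_follow(blink, smoothing=0.3)  # assumed slower, "deeper" layer

for frame, (lid, bag) in enumerate(zip(eyelid, eye_bag)):
    print(f"frame {frame:2d}  eyelid {lid:.2f}  eye bag {bag:.2f}")
```

On frame 6 the eyelid has already started to open again, while the lagging layer is still traveling in the closing direction, which is the kind of overlap the blink example describes.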
#6 – Teeth and eye modeling
We modeled the teeth correctly this time. We actually modeled them as two volumes. The dentine and the enamel were modeled separately. So they actually get a proper transition from yellow to blue in the teeth.
For the eyes, we modeled the conjunctiva, to make sure the corners of the eyes go that little bit yellow that they’re supposed to go. It’s not just mapped onto the eyeball. So we were also able to do things such as, when the eye looks left, having the iris stay in the corner just that little bit longer to look more realistic.
#7 – Fighting ‘messy’
Ang had this idea of what he called ‘messy fighting’, which is, when two stunt people fight, there’s sort of a timed choreography to it, which sometimes makes it look like they are anticipating the next beat. Stunt performers have done amazing work over the years of trying to hide that choreography so it feels like people fighting. The thing is, in stereo at 120 fps, it becomes obvious that it’s a dance and that it’s not really fighting. So Ang wanted to take it this next level, what he called messy fighting, which is break the choreography up, mess with the timing, mess with the freneticism and make sure the contact’s actually contact.
For example, we had to make sure the head doesn’t start moving until after the fist is half-way into it, pushing it sideways. That required a lot of effort, because it was pretty quick to get to a digital fight, but then we just kept on changing the timing and kept on changing the speeds and trying to make it look less and less like two stunt performers and more and more like just two talented fighters beating the snot out of each other. We had to do simulations on the face so that when it got punched really hard, you actually saw the cheeks and the ears wobble around.
#8 – 4K, 120 fps, native stereo (oh my)
Gemini Man is 4K at 120 fps in native stereo, so you end up with 40 times the data (5 times as many frames, 4 times as many pixels per frame and 2 eyes). What that means is if you put a Quicktime in our internal view system, it takes 40 times as long to create the Quicktime. It could take upwards of six hours just to drop the clips to the system – one 2-minute long shot took that long to put in our system.
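The 40x figure can be reproduced with a few lines of arithmetic. The baseline here is an assumption not stated in the interview: a conventional 24 fps, 2K (2048x1080), single-eye deliverable, compared against an assumed 4096x2160 raster for 4K.

```python
# Assumed baseline (not stated in the interview): 24 fps, 2K, mono.
BASE_FPS = 24
BASE_RES = (2048, 1080)
BASE_EYES = 1

# Gemini Man as described: 120 fps, 4K, two eyes (native stereo).
HFR_FPS = 120
HFR_RES = (4096, 2160)  # assumed 4K raster
HFR_EYES = 2

frame_factor = HFR_FPS / BASE_FPS
pixel_factor = (HFR_RES[0] * HFR_RES[1]) / (BASE_RES[0] * BASE_RES[1])
eye_factor = HFR_EYES / BASE_EYES
data_factor = frame_factor * pixel_factor * eye_factor

print(frame_factor, pixel_factor, eye_factor, data_factor)  # 5.0 4.0 2.0 40.0

# If processing time scales linearly with data (an assumption), a six-hour
# job at this format corresponds to minutes at the baseline format.
baseline_minutes = (6 * 60) / data_factor
print(baseline_minutes)  # 9.0
```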
So, you’d see it in the morning, give notes, you’d see it in the afternoon, you’d give notes, knowing that the version you were going to see next, you either were going to send or not send – you couldn’t get a revision on it. It would just take too long to see again. There was a significant burden on the day to day work of the artists. The amount of data being pushed back and forth was astronomical.
It didn’t really change the mocap at all because mocap already captures at 120 fps. That was sort of like, everyone’s like, ‘Yay!’ We don’t need to capture higher than 120 because the face really can’t move that fast. It’s not like there’s information that we’re losing beyond 120.
Some processes are painfully linear, like roto – these amazing paint and roto artists, they work per-pixel. So, if you increase the data set by 40 times, it just means their work becomes really long. That took a lot of scheduling. Kudos to the production team for juggling those 10,000 balls all at the same time to make sure that everything came together in the end at the right time.
And then the challenge of native stereo is that it’s not theoretically perfect. In other words, you can’t say, ‘We know that the eyes are exactly this far apart, just track one camera and we’re done.’ You actually have to track both cameras as if they’re unique cameras because they do have slight imperfections and inconsistencies. And it’s one thing to say that you don’t mind those imperfections, but it manifests itself in one eye that the character’s not sitting on the bike, he’s sitting one inch to the right of the bike. So you have to make sure that you get all that right.
#9 – Replace the body in CG, or not?
Whether we went completely CG with Junior was all about figuring out what mattered. And what I mean by that is that when you look at a shot, if it’s performance driven, then the body has to be Will Smith’s body. And if Will Smith’s already in the frame, it can’t be Will Smith’s body, it has to be a fully digital body from the mocap. So it’s still Will’s performance.
The thing we kept on saying is that even when an actor is not talking, they’re still acting. At an empirical level, if you just do a head replacement on another actor’s body, it starts to feel like a bobblehead. It feels like it’s not connected, like it doesn’t match. One of the reasons is, when you talk you tend to gesture. Your body is doing very subtle things that reinforce what the head is doing. So if you replace the head and not the body, you create disconnects.
So in those situations we always replaced the body, full, top to bottom, CG clothes, CG outfits, everything. The only time we got away without doing that, there were two special cases.
In certain situations when Will Smith was not performing against another Will Smith, then Will was able to get made up for the Junior role. We’d put clothes on him because he looks like Junior, and then we can replace his head because the body was the performance we wanted. That would save us some time in mocap later.
The other situation is when it was a stunt, say riding a bike at 50 miles an hour as he goes across the wall that’s only one foot wide. The performance that you want is the performance that the stunt person is giving you. Will Smith isn’t going to be able to give you something better because he can’t ride a bike 50 miles an hour on a one foot wide wall. So in those situations, we kept the performer’s body, for the most part. In some situations – this is sort of the rule of thumb, but then it was always broken – the exception in this case was that Junior’s stunt rider was significantly smaller than Junior. So we replaced them from the waist up. Now if you look really closely, you’ll notice that his legs are about two inches too short. But it’s hard to tell because his legs are folded on the bike.
#10 – Determining (and discussing) likeness
We found that ‘likeness’ was a razor’s edge. It was really quick to get a Will Smith that looked interesting and looked like the right Will Smith. That wasn’t hard to do. But the second he started animating or the second the camera angle changed, your mind instantly finds a new anchor to look for. So what we found was that we had to just constantly, constantly, constantly work on the system so that every time we found a new situation that didn’t hold up it got elevated.
To some degree we do that with all animation. That’s the process of making sure that the character comes through. But this was like 10 times harder. On top of getting the animation right, you had to go through and make sure any one of those animation shapes that are so critical to getting performance don’t undermine likeness.
In reviews, people would sometimes say, ‘It feels like Will’s cousin. It feels like a real person, it kinda looks like Will, but not exactly like him.’ Those become just incredibly esoteric questions to answer. What was surprising was that you could go off-model but he would still be believable as a breathing character. We didn’t expect likeness and believability to be so uncoupled. We thought that they would be heavily coupled and once we got believability, then if we got likeness on one frame then it would follow through. That turned out not to be the case at all.
Imagine that a facial system is made up of hundreds of shapes and that those hundreds of shapes can create thousands, if not tens of thousands, of in-between shapes. If any one of those in-between shapes is wrong, or even slightly wrong, then you don’t recognize it as Will Smith anymore. One of the animators described it as just constantly surfing on a razor’s edge, trying not to fall off one side or the other towards unlikeness. Until the last day we were still just chasing it.
Guy Williams will be speaking on Gemini Man at VIEW Conference, and befores & afters will also be there. Williams will also be part of a special digital humans and de-ageing panel that Ian Failes is moderating at VIEW.