The AI, volumetric and animation tools that helped make Pixar’s ‘Elemental’ possible

Neural style transfer and enhancements to Pixar’s curvenet animation tools were part of the toolset on the film.

In Peter Sohn’s Elemental, the two central characters, Ember and Wade, are literally made of fire and water. That presented an immediate challenge for Pixar: simulating those kinds of effects while also ensuring the look and feel of the characters remained illustrative and emotive.

To make it possible, the animation studio devised new ways to realize volumetric elements and leaned on machine learning research in the field of neural style transfer to give the volumes a stylized look.

In addition, animation curvenet tools already developed at Pixar were enhanced, and a significant investment in GPU rendering also took place.

Pixar visual effects supervisor Sanjay Bakshi tells befores & afters about the big leaps in tech for the film.

b&a: What were the first visual effects considerations you had when you came on board this film?

Sanjay Bakshi: At Pixar, a movie will have a screening in board form. We invite a lot of people to see it, just to get feedback. I went to the first screening and a lot of the technical folks all felt like this was a very ambitious project, a really challenging project, but also a really exciting project. In the boards from day one, there were visual effects in every shot of the movie. Fire and water. And, a lot of the humor and storytelling was conveyed through the visual effects. Pete really wanted to exploit the elemental nature of the characters, so there’s a lot of storytelling that happens through the fire and the water. The behavior of the fire and water conveys a lot of the storytelling.

b&a: One of the things that I really noticed was that you could really see the volumetric side of things like the fire elements. Where did you have to take the artistry and the tech this time around to just sell those volumes?

Sanjay Bakshi: A lot of Pixar films have explosions in them, or have fire in them, but we’ve never done a film where the main character is a volumetric effect. The scalability of the pipeline to allow that was one consideration. The other aspect was that the fire had to perform. Ember is a volumetric character who has to convey emotion. You have to connect with her, and so, how do you make a fire simulation be appealing and expressive? That was a big technical hurdle.

Water was the same way. It’s a volumetric simulation. We had to get its behavior to feel like the element, to feel like water, but also be appealing so the audience could connect with it.

Character “sprints” developed by the VFX and Animation teams at Pixar based on feedback by Director Peter Sohn. This series of “sprints” resulted in the final version of Wade Ripple, one of Elemental’s main characters. Credit: Sanjay Bakshi.

b&a: How did you tackle it this time around compared to previously?

Sanjay Bakshi: There wasn’t a clear path. That was the exciting thing for the technical crew, to get that opportunity to try something and do something new. A lot of Pixar movies break ground, but we haven’t had that kind of challenge for a while. That was really exciting to the technical crew.

A lot of times, our art department will do a lot of work upfront, drawing loosely, and working with the director, and saying, ‘Is this what you’re thinking?’ You can’t really do that with fire and water. It’s such a dynamic thing that you have to do it in the computer. You have to run the simulation. You have to learn from the iterations that you do.

So, we assembled a team of animators, effects artists and shading artists. We worked together and iterated as fast as we could to get ideas out there. We started with very naturalistic fire. We learned something from that. We then said, ‘Let’s do the most naturalistic thing and put some cartoony eyeballs on it and do this as fast as we can so that we can have something to talk about.’ We’d ask, ‘What do we like about this? What do we not like about this?’ Then we’d try another experiment.

We’d go between very realistic and naturalistic to very cartoony illustrated, and try to find that balance in between the two. Each one, we’re learning from and then discussing the things that we like and don’t like about it.

b&a: I think you did such a great job in the end of what I would say is a more stylized look and feel. I know that when I’ve covered previous projects at Pixar, one of the challenges is that if you’re doing simulations with your tools, and in Houdini, say, the first pass often has a very photorealistic feel. But what were the ways that you got it to the right stylized look and feel here?

Sanjay Bakshi: That’s a really good way to put it. I think for fire, it was different than for water, but essentially the thought process was that we wanted the dynamics to feel realistic. It had to be a simulation. It couldn’t be something that we’d loop. In video games, a lot of times, they’ll do a loop of a few keys of a fluid, but right away, when we tried that, we realized that didn’t feel dynamic, it didn’t feel like fire. It had to be a real simulation but then we would characterize it.

You’ll notice that on Ember, she has a volumetric kind of silhouette line that defines her, so then she can be against another fire character and you could separate her out. There’s this boundary of her that is dynamic but it’s really about carving her out. For water, we did the same thing. We called it a meniscus.

With Wade, there’s a water simulation that’s happening on the head, but it’s always trying to get into the same shape, those distinctive three fingers that you see on Wade’s head. It’s a simulation running, but if he’s static, or stays still for a second, those things will form. Then, again, talking about Wade, there are caustics that happen in him that are volumetric and not on the surface. There are bubbles inside of him that convey motion through him.

There’s all of these elements and we’re dialing in the presence of them throughout his body. On his face, we’d make sure that there weren’t too many bubbles under his eyes because that looks really weird, but we can be more liberal with them on his arms and so forth. We were balancing all of those things that make water feel like water and getting the right proportions.

Character “sprints” developed by the VFX and Animation teams at Pixar, resulting in the final version of Ember Lumen, one of Elemental’s main characters. Credit: Sanjay Bakshi.

b&a: I saw at SIGGRAPH that Pixar will be talking about ‘Volumetric Neural Style Transfer’ in terms of the fire work. What did that involve?

Sanjay Bakshi: This was a really exciting discovery for us. There is a technique called Neural Style Transfer that’s been in the literature. It got a lot of attention a few years ago. Say you have a fluid simulation that comes out of Houdini. The good thing about it is that it’s very realistic. There are no loops, and you can’t see any patterns. It’s very dynamic. It’s very fire-like. There are tear offs that come off of it. The frequency of it is very accurate to how fire behaves but it’s not very illustrated and it’s not very organized. Adding that illustrated, organized quality is what volumetric Neural Style Transfer does for us.

We gave it some images that we painted and we asked the volumetric simulation to move towards these images. We ran the Neural Style Transfer and then we mixed between the results. It gives you the dynamism of the full volumetric simulation but it’s more organized and it’s more illustrative. This is the technique we used to really get Ember to feel like fire but also to be a little bit more like an illustration. Her flames are not as realistic and not as messy as a real fire would behave.
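For readers unfamiliar with the technique, the idea Bakshi describes can be sketched in a few lines. This is a minimal illustration only, assuming the Gram-matrix style loss commonly used in the image style transfer literature and a simple linear mix between the raw and stylized density. The function names (`gram_matrix`, `style_loss`, `blend`) and the flat-list data layout are hypothetical, and none of this is Pixar's actual implementation.

```python
# Illustrative sketch only; names and data layout are assumptions,
# not Pixar's volumetric style transfer code.

def gram_matrix(features):
    """Normalized Gram matrix of a set of feature channels.

    `features` is a list of channels, each a flat list of activations.
    Entry (i, j) is the mean product of channel i and channel j, which
    captures which features tend to fire together (the "style").
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in features]
            for ci in features]


def style_loss(features, target_features):
    """Mean squared difference between the two Gram matrices."""
    g = gram_matrix(features)
    t = gram_matrix(target_features)
    c = len(g)
    return sum((g[i][j] - t[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)


def blend(raw, stylized, weight):
    """Linear mix of raw and stylized density values.

    weight = 0 keeps the raw simulation, weight = 1 keeps the fully
    stylized result.
    """
    return [(1.0 - weight) * r + weight * s for r, s in zip(raw, stylized)]


# Tiny usage example with made-up numbers.
sim_features = [[1.0, 0.0], [0.0, 1.0]]
painted_features = [[1.0, 1.0], [0.0, 0.0]]
print(style_loss(sim_features, painted_features))
print(blend([0.0, 1.0], [1.0, 1.0], 0.25))
```

In a typical style transfer setup, the features would come from a pretrained image network applied to rendered views, and an optimizer would nudge the simulation's density to reduce the style loss. The `blend` step mirrors the "mixed between the results" idea in the simplest possible form.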

b&a: Having talked to a few people in the Disney technology ecosystem, I particularly like the way Disney approaches AI, in perhaps a more ethical way. I’m guessing the training for this was pretty much based on existing Pixar images or Pixar films or Disney ecosystem ones, is that correct?

Sanjay Bakshi: Yes, we felt comfortable using this volumetric style transfer because the target images are hand-painted by our artists. We have also used machine learning for denoising for a number of years. Those models are all trained on a noisy rendering of one of our movies and a clean version of it.

We don’t want to use anyone else’s IP and we try to be really careful about it. Our artists put their personal work out into the world, and it really bothers them to see some of the available techniques where it’s possible the artwork has been used for training without permission, so there’s a lot of sensitivity within Pixar for sure.

Pixar’s NST transformation utilizes a raw Pyro fire simulation for fire density and movement, and applies Pixar artists’ painted texture as its target style. It infuses the essence of the painting into the dynamic fire simulation to create a specific, targeted, and stylized fire density field. Pixar combined this effect with many other traditional techniques to achieve the final look of the fire characters. Credit: Junyi Ling.

b&a: That aspect of creating the characters was a really nice new workflow. Was there another new tool or new process that you felt was something different on this show?

Sanjay Bakshi: I think the animation tools that we built for this to make fire animate were great. Fire is pretty novel and very specific to our film. We have used curvenets previously and published about them. They let us use a really low resolution grid that we can put on the character and have really high level controls that are exposed to the animators to really shape things. So we’ve used that on other films but really pushed it on this one.

We wanted to give the animators the flexibility in a shot to make a choice like, ‘Oh, it’d be really cool if when Ember grabbed this thing it felt more fire-like.’ What that meant was, say, a very specific tear off could happen or we could make it so the fire didn’t really preserve volume in the same way as other materials. It could really get thin and break off. We wanted to be able to provide those tools.

b&a: In fact, that was kind of a response I had to the film, which is that in some ways it feels a bit more 2D even though of course it’s not. What are the things that are hard about still animating in 3D but making it look 2D, apart from just style transfer?

Sanjay Bakshi: That’s an interesting observation because of course it’s ‘very’ 3D because it’s all volume. Everything is a volume. If you think about all of the things that are behind the pixels, it’s tons and tons of things because even Ember has transparency. There are scenes where it would surprise us. She would be on fire and doing her thing, acting, and in one part of her head you could see the thing behind her, which is very unusual for our films.

But you’re right. I think there is a 2D thing about it. I think part of it might be the style transfer. That’s a very 2D technique. If you move off of camera, it breaks down. When it’s trying to make those stylized flame licks, it only works from the camera.

And the silhouette carving that we’re doing is a very 2D thing as well. If you move the camera, that breaks down. Then there’s the style of animation. The animators are trying to exploit the elemental nature of fire and water and they’re doing that to camera. There is a 2D feel to it.

Another aspect is the stylized lighting. To integrate the characters and the sets, our lighting team are doing a lot of tricks. The shadows are very graphic, much more than we would normally do just to have the set integrate with the stylized character as well. They would use this Kuwahara filter so that things in the distance get really illustrative, again, to integrate the characters into the sets. Probably all of those things are the spice that makes it feel a little bit more illustrative overall.
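As background, the Kuwahara filter mentioned here is a classic edge-preserving smoothing filter that gives images a painterly look. The following is a minimal sketch of the textbook grayscale version, assuming an image stored as a list of rows and clamped borders. The function name `kuwahara` and its `radius` parameter are illustrative choices, and this is not a description of how Pixar's lighting tools apply it.

```python
# Textbook grayscale Kuwahara filter; an illustrative sketch, not
# Pixar's lighting pipeline.

def kuwahara(image, radius=1):
    """Return a filtered copy of `image` (a list of rows of numbers).

    For each pixel, four overlapping square regions that share the pixel
    as a corner are examined. The output is the mean of the region with
    the lowest variance, which flattens detail while keeping edges sharp.
    """
    h, w = len(image), len(image[0])

    def region_stats(y0, y1, x0, x1):
        # Coordinates outside the image are clamped to the nearest edge.
        values = [image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
                  for y in range(y0, y1 + 1)
                  for x in range(x0, x1 + 1)]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        return var, mean

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            quadrants = [
                region_stats(y - radius, y, x - radius, x),  # upper left
                region_stats(y - radius, y, x, x + radius),  # upper right
                region_stats(y, y + radius, x - radius, x),  # lower left
                region_stats(y, y + radius, x, x + radius),  # lower right
            ]
            # Tuples sort by variance first, so min() picks the
            # flattest region (ties fall back to the lower mean).
            out[y][x] = min(quadrants)[1]
    return out


# A hard vertical edge survives the filter unchanged.
step = [[0, 0, 10, 10] for _ in range(4)]
print(kuwahara(step))
```

A larger `radius` produces broader, flatter patches. One plausible way to make distant objects more illustrative would be to increase the radius with depth, though that is an assumption rather than something stated in the interview.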

b&a: It may feel stylized, but I’m guessing it’s just as complicated in the rendering and putting together of these scenes. Was there something you needed to do to manage that complexity from a VFX supe point of view?

Sanjay Bakshi: You hit it on the head, the complexity. I think it’s a compliment you just gave that it’s not apparent when you’re watching the film. I don’t want people to see the complexity and how many hours of rendering time it took and the storage that was required. Hopefully, you’re just engaged in the movie and it feels like a cohesive thing. Volumetric rendering is really, really expensive. Running fire and water simulations is very CPU and GPU intensive. The volumetric neural style transfer all happened on GPUs.

We really ramped up the capacity of Pixar in many of those domains. We had a bigger renderfarm than we’ve ever had to finish this film. We have used much more storage to store the simulations per shot for all the unique simulations. We invested in a GPU farm so we could do the volumetric style transfer. I really appreciated the fact that Pixar made that investment so that we could do the things we wanted to do.

[A production note from the studio: Over 151,000 cores were in use for Elemental. Toy Story had 294 cores, Monsters Inc. had 672 cores, and Finding Nemo had 923 cores.]
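To put those figures in proportion, the ratios can be worked out directly. This is a small arithmetic sketch using only the numbers in the production note; the variable and function names are hypothetical, and because the note says "over 151,000", the ratios are approximate lower bounds.

```python
# Core counts as quoted in the production note above.
ELEMENTAL_CORES = 151_000  # "over 151,000", so ratios are lower bounds
EARLIER_FILMS = {"Toy Story": 294, "Monsters Inc.": 672, "Finding Nemo": 923}


def core_ratios(current, earlier):
    """Map each earlier film to current / earlier core count."""
    return {film: current / cores for film, cores in earlier.items()}


ratios = core_ratios(ELEMENTAL_CORES, EARLIER_FILMS)
for film, ratio in ratios.items():
    print(f"Elemental used roughly {ratio:.0f}x the cores of {film}")
```

The comparison counts cores only. It does not account for how much faster each core has become since those earlier films, so the real gap in compute is larger than the ratios suggest.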