The filmmaker talks to befores & afters about venturing into vol-cap and real-time.
As many befores & afters readers will know, director Neill Blomkamp has a visual effects background. This may go some way to explaining the innovative VFX methods he has tended to use for bringing films such as District 9, Elysium and his Oats Studios shorts to life.
With Demonic, the director has taken things a step further, embracing volumetric capture for ‘simulation’ or dream scenes in the horror pic, releasing in theaters and on VOD August 20. For those dream scenes, actors were captured with a multi-camera array, and the resulting volumetric point cloud data was used to render their appearances. The volumetric scenes are the longest yet seen in a feature film.
Capture services were provided by Volumetric Camera Systems. Viktor Muller from UPP was Demonic’s visual effects supervisor (and an executive producer on the film). In addition, working with Unity, the filmmakers employed the game engine’s new technology, Project Inplay (its current code name), which allowed volumetric point cloud data to be brought into the engine and rendered in real time.
befores & afters was able to sit down with Blomkamp to discuss the volumetric tools used on Demonic, what it meant for obtaining performances, and the director’s thoughts on how vol-cap could form part of future projects.
b&a: What was it about volumetric capture that interested you, and why did you feel it worked for this story in particular?
Neill Blomkamp: Well, coming from a VFX background, I think I’m just very intrigued by anything related to computer graphics. I’m very, very interested in 3D environments that you can drop users into, which is probably why I like games.
I often like the idea of immersion inside a 3D environment more than the game itself. And I think the more the resolution increases and the more real the physics become, the more interesting it gets. There’s something about that idea of simulation that is just fascinating.
So, volumetric capture triggered something similar in me. I don’t know what it is, but capturing your actors three-dimensionally as these chunks of moving geometry with RGB data, with all of the gathered textures stuck onto them, is just a very interesting, cool thing. But it’s also very new and has a bunch of issues related to it, which I’m sure will get solved.
I knew that I just loved it, and I wanted to use it somehow. About two or three years ago I started speaking to Metastage in Los Angeles about how it works and how we would use it in a film. And I just filed it away in my head as, ‘I want to figure out how to use that.’
Now, Oats Studios would be a perfect platform for this; we could easily put it into a short, where you don’t need any justification. You could just do it. So when the pandemic hit, and we wanted to shoot our own self-financed, small horror film, just to do something, I resurrected this notion of: let’s figure out how to use volume capture in a two-hour narrative film and see if there’s a way to do it.
b&a: What were the first things you had to solve when shooting scenes volumetrically?
Neill Blomkamp: The obvious, immediate issue is that it’s glitchy, because it’s so new. So I figured you would have to write it into the story in a way that justified it, where the technology would actually have to be a prototype within the story. And so, that’s what I did. I went down the road of incorporating it into the story in a way that makes it fundamental to the story. That way, resolution problems or glitches wouldn’t confuse the audience, the way they might if you were trying to play it like normal VFX.
The way to think of it is basically like photogrammetry of a single object, where you let the computer extrapolate a three-dimensional object out of the images. And it’s not just the object. It’s also all of the RGB data that’s there, all of the shadows and imperfections and everything; they come with the object. It’s all of that, 24 times a second. So all of the meshes have nothing to do with one another. Each one is an individual calculation, existing on its own like a piece of clay, one frame at a time. In a way, it’s like old-school animation, where you’re just hiding and un-hiding a different object on each of the 24 frames in a second. And then you just keep doing that.
So that’s weird on one hand, because you can’t adjust those objects. They are what they are; the performance is just baked into the object. There’s nothing you can do. You can’t grab their hand and assume there’s some kind of IK rig you can move around. There’s none of that. It’s just an object.
And then the second issue is that there’s no surface differentiation. It’s one map that you just feed onto a bunch of crazy geometry. It looks good when you look at one image, but if you looked at the UV file, it would give you an aneurysm, because it’s so scattered and crazy. And there’s no differentiation on the object in terms of reflection, matte surfaces, translucency or subsurface. It’s all just one thing.
So, it has a way to go before people start actually using it in a normal VFX sense. But on this movie, the idea of taking that and dropping it into Unity, and then having it live in a 3D real-time package where we could just watch the vol-cap play, and we could find our camera angles and our lighting and everything else—I mean, I love that stuff. That’s exactly what I wanted to be doing, and the narrative of the movie allowed for it.
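To make the structure Blomkamp describes concrete, here is a minimal Python sketch of a vol-cap clip as a sequence of unrelated per-frame meshes, each carrying a single baked texture and no rig, played back by showing exactly one mesh per frame at 24 fps. The class and field names are hypothetical; this is not the format used by VCS, Unity or Project Inplay, only an illustration of the ‘hiding and un-hiding’ playback idea.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VolCapFrame:
    """One independently reconstructed frame (hypothetical structure).

    Each frame has its own mesh and its own single baked texture. Nothing
    ties these vertices to the previous or next frame: no shared topology,
    no skeleton or IK rig, no separate reflection or subsurface channels.
    """
    vertices: List[Tuple[float, float, float]]
    faces: List[Tuple[int, int, int]]
    texture_path: str  # one RGB map with lighting, shadows and imperfections baked in


@dataclass
class VolCapClip:
    frames: List[VolCapFrame]
    fps: int = 24

    def frame_index_at(self, t_seconds: float) -> int:
        # Map playback time to a frame number, holding on the first or last
        # frame outside the clip's range.
        index = int(t_seconds * self.fps)
        return max(0, min(index, len(self.frames) - 1))

    def visibility_at(self, t_seconds: float) -> List[bool]:
        # "Hiding and un-hiding": exactly one frame's mesh is visible at a time.
        current = self.frame_index_at(t_seconds)
        return [i == current for i in range(len(self.frames))]


# Usage: a two-second dummy clip of empty meshes.
clip = VolCapClip([VolCapFrame([], [], f"frame_{i:04d}.png") for i in range(48)])
print(clip.frame_index_at(1.5))      # 36
print(sum(clip.visibility_at(1.5)))  # 1
```

Holding on the last frame rather than looping is an arbitrary choice here. The point is that playback only selects which precomputed mesh to show, since there is nothing to interpolate or re-pose between frames.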
b&a: I feel like I’ve seen some fun volumetric capture in, as you say, shorts or commercials, or very small segments of films. But here, you seem to be using it for long sequences where you need to direct actors. You’ve used motion capture and face capture and lots of other ways of getting your actors’ performances, but what were the challenges of getting performances with volumetric capture?
Neill Blomkamp: We were lucky to find Volumetric Camera Systems (VCS) in Vancouver. They built the rig, which was 265 4K cameras on a scaffold. It’s usually a hemisphere, but we needed more room on the sides, so it was actually a cylinder. Then on top of that, we had these mobile hemispheres, a metre wide with 40 or 50 cameras in them, that would be brought in closer for facial capture.
The truth is, I couldn’t imagine a worse environment to put actors in if I tried! I guess the only other thing you could do is add water. If they were semi-underwater, maybe that would be the only thing that would make it worse. So hats off to the actors Carly Pope and Nathalie Boltt for doing awesome work in that insane environment.
The other thing that was extremely weird to get my head around, for me at least, was that there was no clear way to observe the performances other than through witness cameras. So you’d have witness cameras with operators who were moving and trying to follow the actors, and I just got the feed from those cameras. Because, obviously, the 265 capture cameras are all static; they’re just recording the actor wherever they happen to be in the frame at that moment.
That means you don’t get any feedback from the volume capture rig, and you certainly don’t have a virtual camera, because the data hasn’t been calculated yet. You’re basically just sitting there watching a stage play.
On a mo-cap set, that’s different, because in mo-cap you grab a virtual camera and shoot it yourself, with the actors there or not there; it doesn’t matter. With vol-cap, only after many months of crunching down the data could we load it into Unity. Then we had this awesome real-time environment where we could bring in virtual cameras and just look at it. At that point it almost leapfrogs normal motion capture, because all of a sudden what you’re looking at is final. Everything is final. So now it’s just a question of, well, where do you want the lights?
You have nothing to start with, and then suddenly you have a final character. You’re not assigning it to a rig. You’re not assigning a 3D mesh to the rig. There’s no retargeting. There are no morph targets. It just comes in, and it’s done, so that was pretty cool.
The data management and logistics were an absolute goddamn nightmare too, because it was 265 4K cameras times 30 minutes of footage. I think we were at 12 to 15 terabytes of downloads per night. So we actually had to supplement VCS’s computers. I think we brought in 24 computers of our own to the set, just so that they could start shooting the next morning.
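As a rough sanity check on those figures, the sketch below backs out the average per-camera data rate implied by 12 to 15 TB a night. It assumes (the interview doesn’t specify) that all 265 cameras recorded the full 30 minutes, that the nightly total is in decimal terabytes, and that the data was split evenly across cameras.

```python
CAMERAS = 265
RECORDED_SECONDS = 30 * 60  # assumption: 30 minutes recorded on every camera per night


def per_camera_rate_mbps(total_terabytes: float) -> float:
    """Average per-camera data rate (megabits per second) implied by a nightly total."""
    total_bits = total_terabytes * 1e12 * 8  # decimal terabytes -> bits
    return total_bits / (CAMERAS * RECORDED_SECONDS) / 1e6


for tb in (12, 15):
    print(f"{tb} TB/night -> ~{per_camera_rate_mbps(tb):.0f} Mbit/s per camera")
```

This works out to roughly 200 to 250 Mbit/s per camera. For comparison, uncompressed 8-bit RGB UHD frames at 24 fps would be about 3840 × 2160 × 3 bytes × 8 bits × 24 ≈ 4.8 Gbit/s, so the figures suggest compressed recording, though that is an inference rather than something stated here.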
b&a: I’m always curious about filmmakers who have a background in VFX like yourself: do you ever get the time (and do you want) to still sit on the box, do some shots and play around with, say, Unity?
Neill Blomkamp: Well, the real-time stuff, with Unity, say, is probably the most interesting to me. I was saying how I love 3D environments. And if you look at tools like Quixel Megascans, and synthetic foliage that you can grab and drop into stuff, and real-time simulation, and real-time radiosity, and real-time ray-tracing and reflections, then you can sit in these environments and build them.
And when you can also include audio, this is where it starts to touch on games a little bit. But when it’s an actual immersive experience for an audience member, I’m completely obsessed with that stuff.
Photogrammetry actually ties into that in a way that I personally like, as an aesthetic, artistic choice. I like all of the errors and realism that come with photogrammetry. So if you do photogrammetry of an old barn, you get all of this cool, awesome, grungy texture, and it doesn’t matter how talented an artist is, they’ll never build something that looks that real.
So the answer to the question about wanting to be on the box is absolutely, yes. And I probably should learn something like Unity more than traditional 3D packages. I’ve actually just started messing around with Cinema 4D, just to get to know that piece of software.
b&a: You mentioned there’s a ‘look’ to volumetric capture right now, a glitchy look. When you got into the data and were finalizing shots, where did you settle with the look?
Neill Blomkamp: To be honest, I’m not sure you could really alter it at all. I mean, literally, without treating it like a VFX cleanup, I don’t know how you would alter it. So we had to go into it knowing that it was going to be glitchy. If you think of photogrammetry of a still object, you can obviously get an incredibly crisp, amazing extraction of that object. If you walked around an old farm tractor with a Canon Mk III, shot 600 photos from every angle, gave them to RealityCapture and let it build you a model with all of the RGB data, it would look really good. It would look super crisp.
So, theoretically, if a person was sitting in a chair and you brought the cameras within, say, a few centimetres of their face, just caked them in cameras and recorded it as video, it would look super high-res. You might even see individual strands of hair.
But going from one centimetre away from a person, to 10 centimetres, to one metre, is an exponential drop-off in resolution, because if you think about their size in the frame, the drop-off is ridiculous. So by the time you’re in a space just large enough for an actor to move around in, which was our four-by-four-metre cylindrical volume, their resolution drops off along what looks like an exponential curve.
So, we knew. We shot tests, and we knew roughly where we would be with resolution. We all knew it would have the glitchy, low-res look. And as far as I’m concerned, that look is super cool; I love it. We also knew it would be justified within the story. We weren’t trying to fool anyone. We were saying in the movie that it’s nascent, prototype, early-development VR technology for people who are in comas or quadriplegic situations. So I think in the context of the movie, it works.
But it’s almost the only setting where I could imagine it working other than holograms on billboards. That will change, though. As that resolution increases, people will use it more, I think.
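The fall-off Blomkamp describes can be sketched with a simple pinhole-camera model. Strictly, that model gives an inverse power law rather than an exponential: the number of pixels spanning a feature falls as 1/d with distance d, and the pixel area covering it as 1/d², which is already steep. The sketch below is for a single camera; the 3840-pixel frame width, 60-degree horizontal field of view and 15 cm face width are assumptions for illustration, not figures from the production, and it ignores lens distortion, focus and how a multi-camera solve combines views.

```python
import math

SENSOR_WIDTH_PX = 3840  # assumption: UHD "4K" frame width
HFOV_DEGREES = 60.0     # assumption: hypothetical horizontal field of view
FACE_WIDTH_M = 0.15     # assumption: rough width of an adult face


def focal_length_px(width_px: int, hfov_deg: float) -> float:
    # Pinhole model: half the frame width subtends half the field of view.
    return (width_px / 2) / math.tan(math.radians(hfov_deg) / 2)


def pixels_across(feature_m: float, distance_m: float) -> float:
    # Projected size in pixels is proportional to 1 / distance.
    return feature_m * focal_length_px(SENSOR_WIDTH_PX, HFOV_DEGREES) / distance_m


reference = pixels_across(FACE_WIDTH_M, 0.25)
for d in (0.25, 0.5, 1.0, 2.0):
    px = pixels_across(FACE_WIDTH_M, d)
    # Pixel area covering the face scales with the square of its width in pixels.
    print(f"{d:4.2f} m: ~{px:4.0f} px across the face, "
          f"{(px / reference) ** 2:.4f}x the pixel area seen at 0.25 m")
```

Each doubling of distance halves the face’s width in pixels and quarters its pixel area, so an actor a couple of metres from the cameras in a four-by-four-metre volume is sampled far more coarsely than a face caked in cameras at close range.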
b&a: I also think filmmakers like yourself doing this can drive things forward. What do you think might help next time, if you use volumetric capture again? One of the things you said was that you couldn’t necessarily see the performances in real time, because it all needed processing.
Neill Blomkamp: It’s just a computational thing. We’re probably not far from Moore’s Law running out. I mean, I don’t know how many more years there are before we run into quantum limits on how close together circuitry can get. So I don’t know how much more processing power there really is left to gain.
The chips are pretty good, and it still takes an insane amount of time just to get the processed data. So even if that comes down 100 times, or 1,000 times, it’s still nowhere near real time. But to be honest, I don’t think that matters. Once you’re later in something like Unity with the data and can actually play with it, that’s where it matters more.
The whole point of real-time cinema and virtual cinema is to be freed from the constraints of day-to-day production: when the sun is setting, or there’s a rainstorm coming, or you lose the extras at 5:00 PM, or whatever the issue is. The whole point of virtual production is that you can be in a quiet, controlled post-production facility, load up your three-dimensional stuff in something like Unity, grab a virtual camera and take your time, over weeks if you want, to dial it in exactly the way you want. So in that sense, I don’t really think it matters that there’s a delay between gathering your vol-cap data and crunching it down to 3D so you can drop it into your real-time environment.
I think what does matter, though, and what is a huge issue, is how you capture it, in this highly restrictive, absolutely insane way that it currently has to be done. That’s what will change.