The crowd of characters in this music video was performed by one mocap artist

How ‘777’ by Joji was made (featuring a ton of behind the scenes imagery).

I highly recommend you watch the whole music video for Joji’s ‘777’, from production company Pomp&Clout, before reading this story. It’s a stunning piece of art directed by Saad Moosajee and features a wealth of characters within a representation of Heaven.

What’s incredible is that all those characters were made possible via the choreography of a single motion capture performer named Maya Man, who carried out the mocap remotely during quarantine with the aid of some real-time tools.

Then, a visual effects and animation team brought the different mocap performances together, added wings, cloth sims, environments and so much more. For befores & afters, several of the contributors to the music video, including Chanyu Chen and Zuheng Yin, describe how it was done, from the use of Unreal Engine during motion capture to Houdini for much of the crowd work.

b&a: What was the brief for this music video—what did the artist/label want to achieve? What kind of boards/concepts/animatics were done?

Saad Moosajee (director): I initially received a brief for the album titled ‘Nectar’ which focused on themes of nature, collective behavior and sources that drive humanity. I was sent the track 777 and was invited to pitch a concept and treatment for what the video could be.

Environment design.
Design for the angel wing dripping look.

777 is the antithesis to 666; it is the holy number and has spiritual significance. I was thinking about spirituality and holiness, while also looking at the themes Joji had sent me for the album relating to nature and humanity, which brought me to Renaissance art. I’ve had a long-term interest in Renaissance-era figure paintings for many reasons. In this instance I came to them because their dark, fleshy aesthetic feels holy and godlike to me, even when all you see is a simple human figure.

Many of these paintings are also known as tableau paintings, which is synonymous with the idea of a living picture, but came about from pre-photography artists trying to paint the human form three-dimensionally on a two-dimensional canvas. So there was a bit of a technical idea also, of using animation technology to create moving 3D figures that felt like tableau paintings.

Motion capture used for previs.

The main previsualization tools were reference footage animatics and storyboards. I would comp together spliced overlays of footage from our performer to imagine different layouts for the characters, and we’d use storyboards to imagine some of the more complex scenes and simulations.

b&a: Tell me a little about the mocap process. Where was this done and how did you manage the process during ‘lockdown’?

Saad Moosajee: We had to coordinate everything remotely, preparing for weeks in advance with our motion capture performer Maya Man. Our animation team, which included myself, Zuheng Yin and Chanyu Chen, along with our storyboarder Zhoutong Qi, would develop previsualization for each of the scenes of the video.

Remotely planned moves.

Maya and I would then discuss options for the choreography and plan out specific movement and timings for each character within the scenes. Because of lockdown we could never rehearse in person, so we prepared remotely and then shot everything socially distanced at Silver Spoon Animation Studios in Brooklyn.

b&a: How, in particular, did you use Unreal Engine to choreograph the mocap?

Saad Moosajee: We knew we wanted to have hundreds of characters in this film, all driven by motion capture, but we also knew that because of quarantine and social distancing we’d never be able to rehearse or shoot with more than one performer. So the question arose–how do you make a film with hundreds of moving characters that feel human and aware of each other, with only one motion capture performer?

Motion capture shoot with Maya Man using Unreal Engine.

In a typical motion capture pipeline, you’d shoot the motion capture for each character, wait to get the data back, and then assemble the different humans in the scene. With only one performer available, this meant we could never get characters to touch, look, or interact with each other properly in isolated takes.

However, using a real-time approach through Unreal Engine, Maya could act out the movements of one character (which would be immediately visualized in a 3D scene), and once that character was visible, she would act and dance the next character based on what she’d previously done.

Maya looks at the performances in Unreal.

I likened this to real-time motion capture onion-skinning, where each take represented a character or layer of movement in the scene. Through this approach we were able to achieve complex behavior across groups of dances by layering multiple takes from Maya, all while remaining socially distanced during lockdown.
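To make the layering idea concrete, here is a minimal Python sketch of how takes might stack up, with every earlier take played back as reference while the next one is performed. It is an illustration only and not the Unreal Engine setup used on the shoot: the `Take` and `LayeredScene` names are hypothetical, and each take stores a single root position per frame rather than a full skeleton pose.

```python
from dataclasses import dataclass, field


@dataclass
class Take:
    """One recorded performance (hypothetical structure for illustration)."""
    character: str
    # One (x, y, z) root position per frame; assumed non-empty.
    # A real take would hold a full skeleton pose per frame.
    frames: list


@dataclass
class LayeredScene:
    """Accumulates takes so each new take can react to the earlier ones."""
    takes: list = field(default_factory=list)

    def reference_at(self, frame):
        """Positions of every already-recorded character at `frame`."""
        reference = {}
        for take in self.takes:
            # Hold the last pose if a take is shorter than the requested frame.
            index = min(frame, len(take.frames) - 1)
            reference[take.character] = take.frames[index]
        return reference

    def record(self, take):
        """Add a finished take as a new layer."""
        self.takes.append(take)


scene = LayeredScene()
scene.record(Take("angel_01", [(0, 0, 0), (1, 0, 0), (2, 0, 0)]))
# While performing angel_02, the performer sees angel_01 played back.
print(scene.reference_at(1))
scene.record(Take("angel_02", [(0, 0, 5), (1, 0, 4)]))
# A third take would now see both earlier characters.
print(scene.reference_at(2))
```

Each call to `record` adds one more layer, which is the onion-skin analogy: the performer always sees the accumulated result of the previous takes.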

b&a: How was that mocap data used? What tools/techniques did you use to translate it to CG characters?

Chanyu Chen (animator): After motion capture data was choreographed, shot, and cleaned, it was brought into Maya. Maya was used to retarget data onto the different digital humans in the film, which we then baked out to Alembic and FBX for finishing in Cinema 4D and Houdini.

Re-targeting in Maya.

We chose this workflow because it allowed us to take advantage of Maya’s skin clustering and mass preservation when possible through Alembic for hero character animation, whereas in Houdini the FBX files would retain rig skeleton information so that visual effects supervisor James Bartolozzi could have more control and freedom in his procedural crowd simulations.

b&a: Can you describe some of the specific Houdini approaches in the music video, such as feathers and wings, fluid sims for honey and vellum sims for crowds?

James Bartolozzi (visual effects supervisor): On the technical side, I needed to improve everything I built for the crowds in ‘Last I Heard’, not only to address workflow issues, but also to address the scale we were going to be working with on this video. We used high resolution human body scans and textures, much more detailed character rigs, and several more accessories like clothing, simulated cloth, and angel wings.

Houdini set-up.

The first steps were improving the dynamic level of detail and agent culling tools. Houdini’s viewport makes working with crowds an absolute breeze, but exporting the geometry to a third party renderer or to caches requires unpacking and transferring a large amount of data.

Some of the improvements include: poly-reducing based on rig-joint proximity to maintain detail where the model deforms the most, independent level of detail controls for layers like helmets and clothes, and more accurate frustum pruning that utilizes agent rig points instead of crowd point locations.
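The difference between pruning on crowd point locations and pruning on agent rig points can be illustrated with a small Python sketch. This is not the production Houdini tool: the frustum is simplified to a list of planes, and the agent dictionaries and function names are assumptions made for the example.

```python
def inside(point, planes):
    """True if `point` lies on the inner side of every plane (n . p + d >= 0)."""
    return all(
        n[0] * point[0] + n[1] * point[1] + n[2] * point[2] + d >= 0.0
        for n, d in planes
    )


def prune_by_root(agents, planes):
    """Keep agents whose single crowd point (root) is inside the frustum."""
    return [name for name, a in agents.items() if inside(a["root"], planes)]


def prune_by_joints(agents, planes):
    """Keep agents that have at least one rig joint inside the frustum."""
    return [
        name
        for name, a in agents.items()
        if any(inside(joint, planes) for joint in a["joints"])
    ]


# A simplified "frustum": two planes that keep -1 <= x <= 1.
planes = [((1.0, 0.0, 0.0), 1.0), ((-1.0, 0.0, 0.0), 1.0)]

# Hypothetical agents: a root position plus a few rig joint positions.
agents = {
    "center": {"root": (0.0, 0.0, 0.0), "joints": [(0.0, 0.0, 0.0), (0.3, 1.0, 0.0)]},
    "edge": {"root": (1.2, 0.0, 0.0), "joints": [(1.2, 0.0, 0.0), (0.9, 1.5, 0.0)]},
    "far": {"root": (5.0, 0.0, 0.0), "joints": [(5.0, 0.0, 0.0), (4.7, 1.5, 0.0)]},
}

print(prune_by_root(agents, planes))    # the "edge" agent is dropped
print(prune_by_joints(agents, planes))  # the "edge" agent is kept
```

The "edge" agent stands just outside the view, but an outstretched arm reaches into frame, so a root-only test would cull a character that is partly visible. A joint test is still an approximation, since geometry between joints can cross the frustum without any joint being inside it.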


Since the choreography in the video was so meticulously designed, we wanted to be able to control the crowd movement in a less-procedural way. For this I built tools for artists to visually manipulate agent clip transitions. Using these tools we could create ‘wave’-like effects with the crowd clips.
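One common way to get a wave out of a crowd is to delay each agent's clip by its distance from a chosen origin. The Python sketch below shows that idea under stated assumptions; it is not a description of the artist-facing tools built for the video, and the function names, the 2D positions, and the `speed` parameter are all hypothetical.

```python
import math


def wave_offsets(positions, origin, speed):
    """Delay in seconds before each agent starts its clip: distance / speed."""
    return [math.dist(p, origin) / speed for p in positions]


def clip_time(global_time, offset, clip_length):
    """Local clip time for one agent.

    The agent holds the first frame until the wave reaches it and holds
    the last frame once its clip has finished.
    """
    return min(max(global_time - offset, 0.0), clip_length)


# Three agents on the ground plane, at 0, 5 and 10 units from the wave origin.
positions = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
offsets = wave_offsets(positions, origin=(0.0, 0.0), speed=5.0)

# Sample every agent's local clip time at 1.5 seconds into the shot.
for position, offset in zip(positions, offsets):
    print(position, clip_time(1.5, offset, clip_length=2.0))
```

Changing `origin` or `speed` reshapes the wave without touching the clips themselves, which is the kind of control a visual transition tool would expose to an artist.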

We also utilized some of the built-in Houdini crowds tools to run vellum cloth simulations on top of the crowd animation. Each agent was built with a vellum layer which could be turned on after the crowd animation was done. This way we could dial in the movement before running the sim on top.

Generating wave-like effects in Houdini.

We leveraged several of the features from the HtoA plugin like volume instancing, Arnold Scene Source (.ass), and denoising to achieve the look we wanted. Because we were collaborating across different operating systems, software packages, and software versions, the Arnold .ass files made it easy to share renders, shaders, and geometry caches between each other and our render farms.

One thing that HtoA was lacking (and Arnold lacks in general) is light instancing. I built a tool that functions similarly to the copy SOP, but takes in Arnold lights as the instance object. It will create and link lights to the ‘master’ light while transforming them to the light template geometry point locations.

Light instancing tool.

Additionally, it can map the template geometry attributes to parameters on the Arnold lights. This way the user can vary any Arnold light parameter using geometry attribute values. Although we didn’t use this tool for final shots, it was great for getting rough lighting in these large scenes with so many characters.
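The behaviour described above, copying a master light onto template points and letting point attributes drive light parameters, can be sketched in plain Python. This does not use the Houdini or Arnold APIs and is not the tool itself: the `Light` class, the `instance_lights` function, and the `brightness` attribute are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Light:
    """A stand-in for a render light (illustrative fields only)."""
    name: str
    position: tuple = (0.0, 0.0, 0.0)
    intensity: float = 1.0
    exposure: float = 0.0
    master: str = ""  # name of the light this instance is linked to


def instance_lights(master, template_points, attr_map):
    """Copy `master` onto each template point.

    `template_points` is a list of dicts with a "P" position plus optional
    attributes. `attr_map` maps a point attribute name to a light parameter
    name; points without that attribute inherit the master's value.
    """
    instances = []
    for i, point in enumerate(template_points):
        overrides = {
            param: point[attr] for attr, param in attr_map.items() if attr in point
        }
        instances.append(
            replace(
                master,
                name=f"{master.name}_inst{i}",
                position=point["P"],
                master=master.name,
                **overrides,
            )
        )
    return instances


key = Light("key", intensity=2.0)
points = [
    {"P": (0.0, 1.0, 0.0), "brightness": 0.5},  # overrides intensity
    {"P": (2.0, 1.0, 0.0)},                     # inherits the master's intensity
]
for light in instance_lights(key, points, {"brightness": "intensity"}):
    print(light)
```

Because every instance starts as a copy of the master, editing the master and re-running the function updates all of the lights at once, while per-point attributes still allow local variation.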

Dan Clark (Houdini artist, Swordfish): To make the honey in Houdini, we used a combination of FLIP simulations and particle simulations. The new boolean toolset in Houdini allowed us to create the final mesh quickly, and with less memory than we would have needed in a traditional fluid mesh workflow.

To make the wings, we started with a third-party digital asset for feathers, which we used to groom and shape the wings. We used vellum on proxy geometry to simulate the gross feather movement, then additional vellum sims for micro hair simulation.

Honey drip set-up.

b&a: There’s such an amazing quality of light in the music video–how was this art-directed, and what were the challenges of realizing the lighting in this way?

Saad Moosajee: For me, something special and mesmerizing about Renaissance paintings is that their aesthetic contributes to their narrative. I feel the visual and concept are inextricably linked through their quality of holy light. The most direct example of this is chiaroscuro, which employs high contrast to shape and represent the form of a figure. Chiaroscuro is as much about light as it is about shadow.

Houdini lighting set-up.

The way typical 3D lighting works, everything is physically based, which makes it difficult to achieve a real chiaroscuro effect because everything gets evened out by bounce light. So I developed a technique that focused around using many lights with hand tuned values and essentially ‘faking’ the light by eye, rather than relying on the physically based output of the renderer.

We would begin each scene in darkness, and then gradually add tons of dialed lights, tweaking the value and angle of each one individually to mirror an authentic chiaroscuro falloff. So essentially it became more like how you actually approach a painting, where it gets built up over time. The only trade-off is that it was very slow, and one thing we didn’t anticipate was that in scenes with a lot of human motion, the many lights we created would need to be animated to preserve the effect.

Lights in motion.

One of the animators, Zuheng Yin, ended up individually angling, tuning and animating around 30 of these lights in motion to preserve the effect in the shot where the characters leap and enter the scene. It was interesting because when the groups of lights were put into motion, the process became more akin to the way stage lights follow characters to keep them illuminated during a theatrical performance.

“777” Music Video Credits

Directed by Saad Moosajee

Production Company: Pomp&Clout
Executive Producer: Ryen Bartlett
Head of Production: Kevin Staake
Producer: Russell Greene

Visual Effects Supervisor: James Bartolozzi
Lead Design & Animation: Saad Moosajee
Lead Design & Animation: Zuheng Yin
Design & Animation: Chanyu Chen
Dance & Choreography: Maya Man
Storyboard: Zhoutong Qi
Asset Design: JD Gardner
Costume Design: Chanyu Chen
Concept Design: Zhoutong Qi

Houdini Artist: Dan Clark
Houdini Artist: Tatsuma Nakano
Producer: Danielle Karstetter

Typography: Min Kim
Texture Painting: Jenny Mascia
Rigging TD: Lee Wolland
CG Generalists: Piotr Gabinsky, Henry Hilaire Jr
Render Support: Kyle Doris

Silver Spoon – Motion Capture
Virtual Production TD: Mahe Dewan
Performance Capture Supervisor: Peter Collazo
Managing Director: Dan Pack
Exec Producer: Laura Herzing
DIT: Kazim Karaismailoglu
