The co-founder of Wonder Dynamics breaks down how the AI VFX tool Wonder Studio was made for storytellers.
Almost immediately after Wonder Dynamics unleashed their AI-driven visual effects platform Wonder Studio to the world as a closed beta, key users began playing with the tool. They’d film themselves with an iPhone and then show how the tool replaced them, seamlessly, with a CG robot. Or they’d take existing film, music video or even stuntvis footage, run it through Wonder Studio, and share the impressive results: filmed scenes that were never intended to feature a CG avatar.
For VFX artists, Wonder Studio has sparked a huge amount of interest, since it appeared to handle so much of the time-consuming work usually involved in visual effects: plate preparation, rotoscoping, lighting and compositing, all somewhat magically with the aid of AI.
Indeed, the current incarnation of Wonder Studio involves users uploading live-action footage to the cloud, where the platform is able to recognize ‘actor’ movements, map these actions onto a robot or other CG character avatar, and composite that avatar into the same live-action background, all while also painting in any background occluded by the original actor, and matching the lighting from the original plate footage. This is all made possible via several AI techniques and R&D harnessing markerless motion capture, tracking, roto, matchmoving, background replacement, lighting and compositing.
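To make the flow of stages concrete, here is a minimal sketch of that kind of pipeline in Python. Every function name and data structure below is a hypothetical illustration of the stages described above (motion capture, retargeting, clean-plate creation, relighting and compositing), not Wonder Dynamics’ actual API or code.

```python
"""Hypothetical sketch of an automated CG-character pipeline.
Each stage is a stub that records the artifact it would produce."""

from dataclasses import dataclass, field


@dataclass
class Shot:
    frames: list                      # raw plate frames (placeholder)
    metadata: dict = field(default_factory=dict)


def detect_actor_motion(shot):
    # Markerless mocap: estimate per-frame body pose from the plate.
    shot.metadata["pose"] = f"pose tracks for {len(shot.frames)} frames"
    return shot


def retarget_to_character(shot, character):
    # Map the recovered motion onto the chosen CG avatar's rig.
    shot.metadata["animation"] = f"{character} retargeted"
    return shot


def make_clean_plate(shot):
    # Roto the actor out and inpaint the background they occluded.
    shot.metadata["clean_plate"] = True
    return shot


def relight_and_composite(shot):
    # Match the plate's lighting, then composite the avatar over
    # the clean plate.
    shot.metadata["final"] = True
    return shot


def run_pipeline(frames, character="robot"):
    shot = Shot(frames=list(frames))
    shot = detect_actor_motion(shot)
    shot = retarget_to_character(shot, character)
    shot = make_clean_plate(shot)
    shot = relight_and_composite(shot)
    return shot


result = run_pipeline(range(24), character="robot")
print(sorted(result.metadata))  # each stage leaves an artifact behind
```

The point of the sketch is simply the ordering: motion has to be recovered and retargeted before compositing, and the clean plate has to exist before the avatar can be layered over it.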
Already, scores of ‘fun’ video tests and near-complete short films have emerged and been heavily shared on social media. The results are often not perfect, but Wonder Dynamics’ goals with the tool are nothing short of lofty: democratizing the sometimes laborious visual effects process for content creators, all the way from beginners to pros. The stated aim is to do this for complex shots (i.e. Paul Thomas Anderson ones), not just static or ‘easy’ plates, and to give content creators the ability to edit and enhance the results.
In this excerpt from issue #11 of befores & afters magazine, co-founder and VFX artist Nikola Todorovic (the other co-founder is actor Tye Sheridan of Ready Player One fame) tells befores & afters more about the history of Wonder Studio, how it works, and where the team plans to take it in order to deal with ‘PTA’ scenes.
b&a: You must be excited about all the attention Wonder Studio has been getting, especially with people posting their tests and films online.
Nikola Todorovic: It’s actually something we didn’t quite anticipate because, especially as a VFX artist, all I see are the issues. So all I see are the little details, you know, ‘Hey, we’ve got to fix that…’. But we’ve been building it for over three and a half years, and we didn’t want to launch with a plugin. We really wanted to give an end-to-end solution for users.
The first time we saw early results, I had already had that ‘mind blown’ moment. But then as we worked on it, you lose that feeling a little bit. So, it’s exciting to see people having that moment now. It’s like a movie you’ve worked on for a long time and you don’t really know if it’s good anymore, because you’ve seen the cut a million times.
b&a: Can we go back to the beginning–what was that first incarnation of the idea for Wonder Studio?
Nikola Todorovic: Honestly, we stumbled upon AI really coincidentally. The company has existed since 2016, and our first product was more interactive in nature: we built a tool that let you switch between 2D and 360, and between 2D and 3D, on your TV or any device. We wanted to get rid of that friction of interactive content. Then we partnered with a startup that was doing conversational AI, where you can talk to characters. We combined the two products and made this switch between 2D and 3D where you can have a conversation with your character.
At the time, we used visual AI, well, we called it visual AI, for recreating that scene. And that’s when we had a light bulb moment. We said, ‘Interactive content is cool, but it’s niche.’ We were not that excited about trying to force people to watch interactive content on TV.
The term ‘generative AI’ didn’t exist back then, but we knew this visual AI thing was going to change everything about how we make movies. At the time, Tye and I were writing together; we’d met on a film set about 10 years ago. Every time we wrote something, for me especially, I would write a lot of projects with robot characters, and I realized, well, I’d need to be a super-established director to get $200 million for this.
That’s where the inception really happened. We (selfishly) started because we wanted to make films that cost way less but look much bigger, with CG lead characters. And then we realized this was bigger than us: let’s make it into a platform. We said, let’s share it with people and storytellers like ourselves who just don’t have that backing from big studios to make films.
b&a: What are the specific AI sides of Wonder Studio? I’m guessing it’s about automatic tracking, roto, background removal and replacement, and then the motion capture side. But I haven’t actually seen too much discussion about some of the training side of this as yet.
Nikola Todorovic: It is all of those elements, plus lighting and compositing, which is really a big one. That’s something we built from scratch because we didn’t see anything on the market, even open source, that could have helped. Nebojša Nešić and Maksim Ljubenkovic on our team are the geniuses who figured out the lighting and compositing. I’m still baffled by how good a job they did with it!
But the thing is, we approached it from a film and VFX standpoint from day one. You’re seeing a lot of AI solutions that come from research that is really more tailored towards social media and gaming. So a lot of training has been done with data where the camera shoots the whole body, in that typical mocap angle of shooting things.
We went with an approach where we said, ‘If this is going to work, we really have to cover PTA (Paul Thomas Anderson) shots.’ So I always say, ‘PTA Boogie Nights shots.’ This just means we had to be able to cover shots that are really complex.
We started with robotics and self-driving car technology because it was all about computer vision. If you’re going to build a robot and you want a robot to make your sandwich, that robot needs to understand what a sandwich looks like, but also needs to understand human motion and it needs to understand depth.
So, we saw it as an opportunity to understand the image. We call it scene understanding. The idea is, the more it can understand what’s happening in a scene, the more it can extract that in 3D space. It was super important not to break the pipeline of 3D and CG that we usually work in. So, when it comes to AI training I said to the team, ‘Give it a film use case and let’s make it learn that framing and that lighting and that part of a shot.’
The thing is, that means there are going to be hard shots, medium shots and easy shots. Easy and medium are shots we cover now. Some hard shots will be first-pass type shots. But the idea was, you can get these elements and you can tweak them.
Read more in issue #11.