A look back at the early days of tracking and matchmoving at ILM for the film.
Robert Zemeckis’ Forrest Gump (1994) might be the most famous ‘invisible effects’ film in history, thanks to the efforts of Industrial Light & Magic in realizing a number of scenes in which Gump (Tom Hanks) touches key moments in American history. Among other things, Gump meets presidents and famous musicians, scenes often achieved by seamlessly inserting Hanks into old newsreel footage.
ILM had been pioneering new approaches to VFX at the time, especially in scanning film in and recording it back out, and in CG animation and digital compositing. One particular challenge the studio faced on Forrest Gump was tracking the often jittery news footage into which Hanks would be inserted: it tended to be old 16mm material that was constantly moving or subject to frequent zooms in and out.
That’s where JP Lewis came in. He’d recently joined ILM after a stint at the famous NYIT CGL, and one of his early duties at the visual effects studio was to implement an efficient and reliable tracking algorithm in ILM’s existing toolsets. Lewis later worked at other studios including ESC, Wētā FX and Disney, and at companies such as Google and NVIDIA.
Here, on the 30th anniversary of Forrest Gump, befores & afters asked Lewis specifically about the tracking algorithm work. The film, of course, won the Oscar for Best Visual Effects (awarded to Ken Ralston, George Murphy, Stephen Rosenbaum and Allen Hall).
b&a: How did you come to be working at ILM in 1993? What had you been studying or working on right before that?
JP Lewis: I had started my career at NYIT CGL, an early graphics lab that had a lot of pioneers (Jim Blinn, Ed Catmull, Alvy Ray Smith, Jim Clark, Fred Parke), though most had left before I joined. Lance Williams and Pat Hanrahan were still there, but left shortly after they realized that I had joined.
b&a: Tell me about the tracking algorithm you worked on for Forrest Gump? What needed to be ‘solved’ at the time, and how had VFX studios been doing that up until then?
JP Lewis: ILM was developing iComp, an in-house compositing program, and a normalized cross-correlation template matching algorithm was added as part of it (I believe implemented by Jeff Yost or Brian Knep). The problem was that it took many hours just to track a single point in a single shot. This was on expensive (and slow, about 50 MHz) SGI machines. ILM didn’t have many, and so a more efficient solution was needed.
Cross-correlation can be implemented in the Fourier domain, with increasing and sometimes dramatic relative speedup when tracking larger images. However, Fourier domain convolution does not give the normalized form of cross-correlation needed to do useful tracking. I realized that a precomputed running-sum table (previously used in graphics in Frank Crow’s summed area tables paper) could be used to efficiently add the needed normalization to the Fourier-domain approach.
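To make the idea concrete, here is a rough NumPy/SciPy sketch of fast normalized cross-correlation along these lines. It is not ILM’s code: the function names (fast_ncc, window_sums) and the use of scipy.signal.fftconvolve are illustrative assumptions. The numerator is computed as an FFT-domain correlation against a zero-mean template, and the per-window image energy in the denominator comes from precomputed running-sum (integral image) tables.

```python
import numpy as np
from scipy.signal import fftconvolve

def window_sums(img, th, tw):
    """Sum of img over every (th x tw) window, via a precomputed
    running-sum (integral image) table -- the trick described above."""
    s = np.cumsum(np.cumsum(img, axis=0), axis=1)
    s = np.pad(s, ((1, 0), (1, 0)))  # prepend a zero row/column
    return s[th:, tw:] - s[:-th, tw:] - s[th:, :-tw] + s[:-th, :-tw]

def fast_ncc(image, template):
    """Normalized cross-correlation of 'template' over 'image'.
    Numerator via FFT-domain correlation with a zero-mean template,
    denominator via integral-image window sums (a sketch, not ILM's code)."""
    t = template - template.mean()
    th, tw = t.shape
    n = th * tw
    # Correlation equals convolution with the flipped kernel.
    numerator = fftconvolve(image, t[::-1, ::-1], mode="valid")
    # Local image sums and energy under each window position.
    wsum = window_sums(image, th, tw)
    wsum2 = window_sums(image * image, th, tw)
    image_var = wsum2 - wsum * wsum / n
    denom = np.sqrt(np.maximum(image_var, 0.0) * (t * t).sum())
    return np.where(denom > 1e-8, numerator / np.maximum(denom, 1e-8), 0.0)
```

The location of the maximum of the returned surface (for example via np.unravel_index(np.argmax(ncc), ncc.shape)) gives the best match of the template within the image.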
b&a: What tools were available and used for this job at the time, and how did you implement your algorithm into the workflow at ILM?
JP Lewis: The algorithm was initially implemented in Repo, a match-move program written by John Horn (lead author) and myself. I also added a limited ability to track rotation and scale, and to do corner-pin perspective homographies using four tracked points.
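As an aside, the corner-pin case reduces to fitting a 3×3 homography to four tracked points. The following is a minimal sketch of the standard direct linear transform (DLT) for that fit, not the Repo implementation; the function names are hypothetical.

```python
import numpy as np

def corner_pin_homography(src, dst):
    """Homography H (3x3) mapping four source corners to four tracked
    destination points. src, dst: arrays of shape (4, 2). Standard DLT;
    the structure here is illustrative, not taken from Repo."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of A (smallest singular value).
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map (N, 2) points through H, dividing out the projective scale."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

With a homography fitted this way, an element’s corners can be warped onto the tracked region of the plate, which is what a corner pin amounts to.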
b&a: Can you recall any particular shots your algorithm was used for on Gump? What made these shots particularly tricky to track or deal with?
JP Lewis: It was used on a number of shots on Forrest Gump. The shots where Tom Hanks was inserted into news scenes of historical figures (Martin Luther King, Kennedy, Nixon, etc.) were challenging, since they had motion blur caused by the reporters moving their hand-held cameras to find better views. The same motion blur had to be applied to Tom Hanks to make him fit more seamlessly into the scene. Tracking relatively large regions was required due to the motion blur, and handling rotation and scale was sometimes needed.

b&a: How was your research packaged up and used for further projects at ILM, or in any software tools, after Gump?
JP Lewis: After I left ILM I re-implemented it for the Commotion roto tool developed by Scott Squires, where it was actually fast enough to track in real time in some cases. At ESC the algorithm was the 2D tracking component of the Labrador matchmove tool (developed by Dan Piponi and Doug Moore), used on The Matrix sequels. I heard that ILM was still using it in the early 2010s as a building block for some higher-level tracking techniques.
While at ILM I had also published the algorithm. The paper and an extended tech report have gotten about 3000 citations, including in remote areas like medical imaging and astronomy. It has been independently implemented in some software such as MATLAB and one of the Python image processing libraries. The paper also introduced the running-sum trick to computer vision, where it later became known as ‘integral images’.
I had a follow-up version that combined correlation surfaces at multiple scales, thereby disambiguating false matches. It worked remarkably well in one use, but I’ve never had time to pursue it.
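One plausible reading of that multi-scale idea, offered purely as an assumption rather than as Lewis’s unpublished method, is to compute correlation surfaces at several image scales, resize them to a common resolution, and multiply them, so that only peaks which persist across scales survive. This sketch reuses the fast_ncc() function from the earlier example.

```python
import numpy as np
from scipy.ndimage import zoom

def multiscale_ncc(image, template, scales=(1.0, 0.5, 0.25)):
    """Combine correlation surfaces computed at several scales.
    Coarse surfaces are resized to the full-resolution surface and
    multiplied in, suppressing peaks that do not persist across scales.
    A guess at the idea only, not Lewis's follow-up implementation."""
    combined, target = None, None
    for s in scales:
        img_s = zoom(image, s, order=1)
        tpl_s = zoom(template, s, order=1)
        surf = np.clip(fast_ncc(img_s, tpl_s), 0, None)
        if combined is None:
            combined, target = surf, surf.shape
        else:
            # Resize the coarse surface to match the full-resolution one.
            surf = zoom(surf, (target[0] / surf.shape[0],
                               target[1] / surf.shape[1]), order=1)
            combined *= surf
    return combined
```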
Algorithm paper: http://scribblethink.org/Work/nvisionInterface/nip.pdf
Commotion: https://www.toolfarm.com/news/toolfarm-throwback-remember-commotion/
b&a: Did you keep an interest at all in tracking (in particular) after this work? What would you say about the development of tools in this area that you’ve seen occur since the early 90s?
JP Lewis: There have been several very promising neural tracking papers in the last year or two, and other papers that can do ‘semantic’ correspondences (e.g. Tang et al., ‘Emergent correspondence from image diffusion’) could have some interesting and unusual uses in tracking, I think.
Perhaps the only current limitation with neural techniques is resolution: in computer vision research, 512×512 is sometimes regarded as ‘high resolution’, so a practical solution at the moment might require using neural tracking as an initialization or constraint paired with a more accurate (but less robust) traditional method. I think the ultimate approach to tracking will be to not treat it as a problem separate from understanding the scene as a whole. A system that understands 3D and even object categories can provide better high-level disambiguation for tracking, and vice versa, tracking informs this understanding. The progress in AI suggests that this will probably be achievable.