
The question everyone asks

How long will it take to process my animation? It's one of the first questions anyone asks when evaluating a motion capture system - and it's a reasonable one. Post-production schedules are real. Deadlines are real. Nobody wants a pipeline that creates a bottleneck - and with Genesis processing running in minutes to hours, it doesn't.

But the answer depends entirely on what your system is actually doing in that processing step. And once you understand that, the relevance of the speed comparison looks very different.

Where the question comes from

To understand why processing speed became the default benchmark, it helps to understand how traditional marker-based systems work.

Optical systems don't process video. The cameras in a Vicon or OptiTrack setup detect reflective markers and compute their 2D centroid positions in real time. Only those centroids are transmitted to a connected PC, which triangulates them into a sparse stream of XYZ coordinates - one per marker, per frame. Nothing is written to the camera itself. By the time data reaches a post-processing solve, most of the interpretive work is already done: the system has decided where each marker is, filled gaps, and labelled which marker belongs to which body segment. Post-processing is essentially cleanup on a small dataset. It's fast because the scope is narrow.

That architecture made practical sense when recording and transmitting raw video at scale was genuinely difficult. Processing speed became a meaningful metric because every system being compared was processing similarly constrained data. The comparison then carried over into how the industry evaluates all mocap systems - including ones built on entirely different foundations.

The Genesis Pipeline

Genesis starts from raw video. Every frame, from every camera, at full resolution and frame rate, is recorded to on-board storage and analysed to build a complete picture of what happened in the volume.

On a typical Genesis shoot, twelve cameras at 4K/100fps can generate hundreds of gigabytes of raw footage per day. The processing pipeline runs computer vision inference across all of it, correlates observations across every camera view, and produces a coherent skeletal solve from first principles - without markers or suits.
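To make the gap between a sparse marker stream and multi-camera video concrete, here is a back-of-envelope sketch. Every parameter is an assumption chosen for illustration, not a Genesis, Vicon or OptiTrack specification: 50 markers stored as 4-byte floats at 100 fps, twelve cameras each recording compressed video at an assumed 100 Mbit/s, and one hour of recorded takes in a day. The helper names are hypothetical.

```python
# Back-of-envelope data volumes. All parameters are illustrative assumptions,
# not Genesis, Vicon or OptiTrack specifications.

GB = 1e9  # decimal gigabytes


def marker_stream_bytes(markers: int, fps: int, seconds: int,
                        bytes_per_coord: int = 4) -> int:
    """Hypothetical helper: one XYZ position (3 coordinates) per marker, per frame."""
    return markers * 3 * bytes_per_coord * fps * seconds


def video_bytes(cameras: int, bitrate_mbps: float, seconds: int) -> float:
    """Hypothetical helper: compressed video on every camera at a constant bitrate."""
    return cameras * bitrate_mbps * 1e6 / 8 * seconds


take_seconds = 3600  # assume one hour of recorded takes in a shooting day

# Assumed: 50-marker body set at 100 fps; 12 cameras at an assumed 100 Mbit/s each.
marker_total = marker_stream_bytes(markers=50, fps=100, seconds=take_seconds)
video_total = video_bytes(cameras=12, bitrate_mbps=100, seconds=take_seconds)

print(f"marker stream: {marker_total / GB:.2f} GB")
print(f"video:         {video_total / GB:.0f} GB")
print(f"ratio:         {video_total / marker_total:,.0f}x")
```

With these assumed numbers the video comes to 540 GB against about 0.22 GB of marker coordinates, a ratio of 2,500 to 1. Longer shooting days or higher bitrates scale the video figure linearly, which is why the processing workload is of a different order.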

That's a comprehensive task. It takes longer than cleaning up a sparse dataset - because it's doing something fundamentally different. The processing time is the cost of working from the richest possible signal rather than a simplified abstraction of it. And that signal is what makes everything that follows more capable, more robust, and more valuable.

What a comprehensive pipeline offers

Genesis's real-time output is a preview - useful for on-set confidence, but not the deliverable. The actual solve happens in post, working from the raw video: every pixel, every frame, every camera. Video gives the solver texture, silhouette, shadow, and contextual evidence from every camera simultaneously. The model reasons about what a limb is doing based on genuine visual evidence, not statistical extrapolation - and it isn't looking at each keypoint in isolation. It's trained to understand skeletal keypoints in the context of the full body around it. An ambiguous hand position becomes far less ambiguous when the model can see the complete postural context of the performer. The processing time is the time spent doing justice to the performance.

More meaningful tools for correction. With Genesis, correction happens in the context of the original video. If the automated detection has trouble with a particular moment, you can correct the 2D keypoint detections directly against the footage - seeing exactly what the camera saw, frame by frame. Those corrections feed back into the downstream solve, compounding the improvement rather than patching over it.
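Correlating 2D detections across camera views, and letting a corrected 2D keypoint propagate into the 3D result, can be illustrated with textbook linear triangulation of a single keypoint. This is a minimal sketch under stated assumptions, not Genesis's solver, which the article describes as a trained model that reasons about the whole body: the toy cameras, the helper names (`camera`, `project`, `solve3`, `triangulate`) and the 40-pixel detection error are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]
Matrix34 = Sequence[Sequence[float]]  # 3x4 pinhole projection matrix


def camera(f: float, tx: float, ty: float, tz: float) -> List[List[float]]:
    """Hypothetical toy pinhole camera: focal length f (pixels), no rotation, translation (tx, ty, tz)."""
    return [[f, 0.0, 0.0, f * tx],
            [0.0, f, 0.0, f * ty],
            [0.0, 0.0, 1.0, tz]]


def project(P: Matrix34, X: Point3) -> Point2:
    """Project a 3D point to 2D pixel coordinates (u, v)."""
    Xh = (*X, 1.0)
    u, v, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return (u / w, v / w)


def solve3(A: List[List[float]], b: List[float]) -> Point3:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [A[i][:] + [b[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= factor * M[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back-substitution
        x[i] = (M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]
    return (x[0], x[1], x[2])


def triangulate(cams: Sequence[Matrix34], obs: Sequence[Point2]) -> Point3:
    """Linear least-squares triangulation of one keypoint seen by several cameras.

    Each view contributes two equations that are linear in X = (x, y, z):
        (u * P[2] - P[0]) . (X, 1) = 0
        (v * P[2] - P[1]) . (X, 1) = 0
    The stacked system A X = b is solved via the normal equations A^T A X = A^T b.
    """
    rows, rhs = [], []
    for P, (u, v) in zip(cams, obs):
        for coord, k in ((u, 0), (v, 1)):
            a = [coord * P[2][c] - P[k][c] for c in range(4)]
            rows.append(a[:3])
            rhs.append(-a[3])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(3)]
    return solve3(AtA, Atb)


# Hypothetical three-camera rig and keypoint position.
cams = [camera(1000, 0, 0, 5), camera(1000, -1, 0, 5), camera(1000, 1, 0.5, 5)]
true_point = (0.2, 1.0, 0.5)
detections = [project(P, true_point) for P in cams]

# Simulate one bad 2D detection: camera 3 misplaces the keypoint by 40 pixels.
faulty = list(detections)
faulty[2] = (faulty[2][0] + 40.0, faulty[2][1])
error_before = math.dist(triangulate(cams, faulty), true_point)

# Correcting that 2D keypoint and re-solving restores the 3D estimate.
error_after = math.dist(triangulate(cams, detections), true_point)
print(f"3D error with bad detection: {error_before:.4f}")
print(f"3D error after correction:   {error_after:.2e}")
```

The point of the sketch is the direction of information flow: the fix is made on the 2D evidence, and the 3D result is recomputed from it rather than patched. A production multi-view solver would typically also weight detections by confidence and reject outliers instead of treating every view equally.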

The capture is complete, even when the network isn't. In an optical system, marker data streams continuously from camera to a central server. If the network drops mid-take, that data is gone - there is no recording on the camera to fall back on. The pipeline has nothing to process. Genesis cameras write video directly to on-board storage as a primary recording, not a backup. The integrity of the capture never depends on network stability, which means the pipeline always has a complete record to work from - regardless of what happened on set.

Performances that improve over time. With Genesis, the raw video is always there, so you can re-run the entire pipeline with improved algorithms and newer models whenever you need to. Clothing dynamics, facial detail, hand articulation - these are all tractable from video. And because Genesis's solve is a learned model, each engine update can be applied retroactively to performances you've already captured, and so can capabilities that haven't been built yet. A solve that is the best possible result today may be significantly better in six months, from the same footage.


The hardware direction

Camera capability is increasing. On-board storage is getting cheaper. Bandwidth is growing. These aren't background trends - they're the infrastructure that makes the markerless approach more capable with every hardware generation.

The constraints that once made a sparse, centroid-based architecture the pragmatic choice are disappearing. Genesis is built around the richest possible input from the start - which means every improvement in camera hardware, storage density, and compute power compounds directly into capture quality and pipeline capability. The markerless approach doesn't fight the hardware curve. It scales with it.

The future isn't tracking dots. It's tracking people.

So what does the processing time question actually mean?

A pipeline that processes quickly does so because its input is constrained. That speed isn't an inherent advantage - it reflects a narrower scope. The question worth asking isn't how fast the processing runs, but what it's working from and what it can produce.

Genesis post-processing takes longer because it's solving a harder problem from richer data. It happens during background compute time, on a timeline that fits naturally into how productions already operate. And the result is a solve that is more accurate, more correctable, and more capable of improving as the technology evolves.

The processing time is real. What it reflects is a pipeline built for capability and robustness - one that works from the richest possible signal, with the most redundancy, and gets better over time.


The Genesis Value - Part 2: Your Capture

