Helix 02: Figure AI’s Humanoid Control Trick That Treats Walking And Dishwashing As One Skill

Helix 02 matters because it reframes a classical robotics dilemma as a single engineering decision: should walking and fine manipulation be separate modules or one continuous behavior? Figure AI is betting on the latter. In the company’s public demo, a robot walks across a kitchen, opens a dishwasher, unloads and stacks the dishes, then reloads the machine and starts a cycle. That is not theater for its own sake. It is a demonstration of what happens when the control stack is built from pixels down to torque as a unified learned system.

The significance is not simply that a robot can perform a sequence of household chores. What matters is whether that capability holds up across messy, real homes and not just in a single filmed run. Most observers assume autonomy improvements will arrive as incremental gains in perception or a smarter planner. Helix 02 points to a different axis: collapsing balance and manipulation into a motion prior changes what higher-level task reasoning must handle.

What becomes obvious when you look closer at Figure’s architecture is a deliberate division of labor across time scales. A slow, semantic layer decides what needs to be done. A mid-speed visuomotor policy turns vision and touch into whole-body targets. A fast, kilohertz controller enforces balance and contact stability. That hierarchy is the point, not any single clip of a robot turning a handle.

From an editorial standpoint, the most important misconception is the belief that adding more sensors or better cameras alone will close the gap to household usefulness. Helix 02 argues the limiting factor is not sensing fidelity by itself but the interface between perception and the physics of falling, slipping, and stepping. The way Figure trains and stitches those layers together is the story worth unpacking.

Why Loco Manipulation Is The Hard Part

Robotics excels at tidy problems. Arm stations pick up parts on a conveyor. Humanoid legs can walk across flat lab floors. The strain shows when those tidy problems collide. Grasp a bowl and the robot’s center of mass shifts, changing balance constraints. Take a step and the reachable volume for the hands changes. Objects move unpredictably. Cameras get occluded by hands at the exact moment a contact must be made.
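
The center-of-mass effect can be made concrete with a little arithmetic. The sketch below is a simplified, one-dimensional static check, not anything Figure has described: it computes the combined center of mass of a robot plus a held object and tests whether it still projects inside the foot's support interval. The masses, reach, and foot dimensions are hypothetical values chosen only for illustration.

```python
def combined_com_x(robot_mass_kg, robot_com_x_m, object_mass_kg, object_x_m):
    """Horizontal center of mass of the robot plus a rigidly held object."""
    total_mass = robot_mass_kg + object_mass_kg
    return (robot_mass_kg * robot_com_x_m + object_mass_kg * object_x_m) / total_mass


def inside_support(com_x_m, heel_x_m, toe_x_m):
    """Static 1-D check: does the center of mass project between heel and toe?"""
    return heel_x_m <= com_x_m <= toe_x_m


if __name__ == "__main__":
    # All numbers below are hypothetical, chosen only to illustrate the effect.
    robot_mass_kg = 60.0
    robot_com_x_m = 0.02            # slightly ahead of the ankle
    heel_x_m, toe_x_m = -0.08, 0.15
    reach_x_m = 0.60                # object held at arm's length

    for object_mass_kg in (0.5, 5.0):
        com = combined_com_x(robot_mass_kg, robot_com_x_m, object_mass_kg, reach_x_m)
        shift_mm = (com - robot_com_x_m) * 1000.0
        print(f"{object_mass_kg} kg load: COM moves {shift_mm:.1f} mm forward, "
              f"inside support: {inside_support(com, heel_x_m, toe_x_m)}")
```

A real robot is dynamic rather than static, so this check understates the problem: the same shift happens mid-step, when the support region itself is changing.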

Definition Of Loco Manipulation

Loco manipulation is the coupling of locomotion and manipulation into a single control problem where walking, balance, contact, and object handling are treated as simultaneous constraints rather than separate subproblems. This framing highlights why transitions between taking a step and inserting a plate into a rack become failure points for modular systems.

Systems that treat locomotion and manipulation separately often stitch them together with state machines. That can work in structured settings but it creates friction in unstructured environments. The stitch points become failure modes when reality deviates a bit. The continuous approach aims to replace brittle transitions with an always-on coupling that reasons about balance, contact, and manipulation as joint constraints.

How Helix 02 Works From Pixels To Torque

Figure presents Helix 02 as a three-layer hierarchy that operates across distinct time scales. The architecture is organized by intent, reactivity, and stability rather than by hardware boundaries.

System 2, The Task Layer

System 2 is the slow, semantic reasoning layer. It ingests scene information and language, then outputs latent goals that describe desired outcomes such as walk to the dishwasher, open it, or carry bowls to the counter. It does not output low-level motions. Instead the layer hands off an intention to the faster layers that will worry about how to execute while keeping the robot upright.

System 1 And System 0, A Two-Speed Motor Stack

System 1 is the visuomotor policy, faster than System 2 but slower than the controller beneath it. In Helix 02 it expanded from an upper-body policy to whole-body control. System 1 consumes head cameras, palm cameras, fingertip tactile sensing, and proprioception, and it outputs joint targets for legs, torso, arms, wrists, and fingers. Figure reports System 1 runs at roughly 200 Hz, which provides fast visual feedback without pretending to be the last word on stability.

System 0 is the new, high-frequency whole body controller. It executes at 1 kHz and is responsible for balance, contact handling, and coordination across the body. System 0 tracks System 1 targets and converts them into joint-level actuator commands that respect contact forces and dynamic stability. The practical claim is that this kilohertz controller serves as a motion prior, freeing the higher layers from encoding low-level balance and contact logic.

How The Layers Interact

Viewed together, the three-layer split maps to tempo and responsibility. System 2 plans over seconds, System 1 reacts to visual and tactile events at hundreds of hertz, and System 0 enforces physics at kilohertz cadence. That tempo separation is deliberate: it lets each layer focus on a different kind of uncertainty without re-encoding the same low-level dynamics.
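
One way to picture the tempo separation is as a single loop ticking at the fastest rate, with the slower layers updating on every Nth tick. The sketch below is a toy scheduler under stated assumptions, not Figure's implementation: the 1 kHz and 200 Hz rates come from the reported figures, while the 1 Hz rate for System 2 is a hypothetical stand-in for "plans over seconds", and the goal and target values are placeholders.

```python
SYSTEM0_HZ = 1000  # reported kilohertz controller rate
SYSTEM1_HZ = 200   # reported approximate visuomotor policy rate
SYSTEM2_HZ = 1     # hypothetical: the source only says it plans over seconds


def period_ms(rate_hz):
    """Time budget per update, in milliseconds."""
    return 1000.0 / rate_hz


def run_schedule(duration_s):
    """Count how often each layer updates when driven by the fastest clock."""
    s1_every = SYSTEM0_HZ // SYSTEM1_HZ   # System 1 updates every 5th tick
    s2_every = SYSTEM0_HZ // SYSTEM2_HZ   # System 2 updates every 1000th tick
    counts = {"system2": 0, "system1": 0, "system0": 0}
    goal, target = None, None
    for tick in range(int(duration_s * SYSTEM0_HZ)):
        if tick % s2_every == 0:
            goal = f"goal@{tick}"          # placeholder latent goal
            counts["system2"] += 1
        if tick % s1_every == 0:
            target = (goal, tick)          # placeholder whole-body target
            counts["system1"] += 1
        # System 0 runs on every tick, tracking the most recent target.
        assert target is not None
        counts["system0"] += 1
    return counts


if __name__ == "__main__":
    print(run_schedule(2.0))
    print(period_ms(SYSTEM0_HZ), "ms per System 0 tick")
    print(period_ms(SYSTEM1_HZ), "ms per System 1 update")
```

At these rates System 0 gets a 1 ms budget per tick and executes five times for every System 1 update, which is why the lowest layer must hold the robot stable on a stale target.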

The Training Scale And The Motion Prior

Helix 02’s System 0 is described by Figure as a learned controller trained from over 1,000 hours of retargeted human motion plus simulation-based reinforcement learning. The company reports training across more than 200,000 parallel simulated environments with extensive domain randomization.

Figure states the controller is approximately a 10-million-parameter neural network. The stated goal is to replace large amounts of hand-engineered control code with a single neural prior. Publicly quoted figures put the replaced code at over 100,000 lines of C++. That is not a trivial reduction. Motion priors encoded this way can drastically simplify higher-level behaviors because balance, contact, and typical posture transitions become baked into a stable controller.

Training at that scale suggests a deliberate effort to capture a wide range of human motions and environment variations. What is less visible publicly is the compute and simulation scaffolding required to support such training, and how the domain randomization is chosen to cover the corner cases real homes present.
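
For readers unfamiliar with domain randomization, the idea is that each simulated environment draws its physical parameters from a range, so the controller never trains against one fixed world. The sketch below is a minimal illustration of that sampling step. The parameter names and ranges are hypothetical, since Figure has not published which quantities it randomizes or over what intervals.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvParams:
    floor_friction: float    # hypothetical parameter
    payload_mass_kg: float   # hypothetical parameter
    light_scale: float       # hypothetical parameter


# Hypothetical ranges; the real choices are not public.
RANGES = {
    "floor_friction": (0.3, 1.2),
    "payload_mass_kg": (0.0, 3.0),
    "light_scale": (0.4, 1.6),
}


def sample_env(rng):
    """Draw one environment's parameters uniformly from each range."""
    return EnvParams(
        floor_friction=rng.uniform(*RANGES["floor_friction"]),
        payload_mass_kg=rng.uniform(*RANGES["payload_mass_kg"]),
        light_scale=rng.uniform(*RANGES["light_scale"]),
    )


def sample_batch(n_envs, seed=0):
    """Sample parameters for a batch of parallel environments, reproducibly."""
    rng = random.Random(seed)
    return [sample_env(rng) for _ in range(n_envs)]


if __name__ == "__main__":
    for params in sample_batch(3):
        print(params)
```

Uniform sampling is the simplest choice. Covering rare corner cases, such as a stuck drawer, would likely need deliberately skewed or event-driven sampling, which is exactly the part that is not visible publicly.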

The Dishwasher Demo And What It Actually Shows

Figure’s headline demonstration is a roughly four-minute, end-to-end autonomous kitchen sequence. The robot walks to a dishwasher, opens it, unloads dishes, carries items to cabinets, stacks them, reloads the dishwasher, and starts the cycle. The company reports 61 ordered actions in that one run and no human intervention or resets.

Why use a dishwasher sequence as a proxy for everyday work? Because it compresses a lot of challenges into one setting. It requires locomotion while holding objects, bimanual transfers, use of non-hand surfaces such as the hip or foot, and long-horizon task state maintenance. The sequence is a stress test for coupling walking and manipulation rather than a showcase of a single dexterous trick.

That said, public demos are bounded evidence. A single clean run says less about real-world readiness than consistent performance across many kitchens, lighting conditions, clutter densities, and object sets. The demo proves a concept. What determines practical value is how robust that concept is under variation.

New Sensing And The Rise Of In-Hand Feedback

Two sensing advances are central to the claims. Palm cameras provide visual feedback inside the hand when the head cameras are occluded during close contact. Fingertip tactile sensors report force feedback down to roughly three grams, enabling detection of initial contact, slip, and fragile grip adjustments.
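
To put the three-gram figure in physical units, the sketch below converts grams-force to newtons and uses the result as a first-contact threshold on a stream of fingertip readings. Treating the reported sensitivity as a detection threshold is an assumption made for illustration, and the readings are invented. Figure has not described how its policy consumes the tactile signal.

```python
STANDARD_GRAVITY = 9.80665  # m/s^2


def grams_force_to_newtons(grams):
    """Convert a force expressed in grams-force to newtons."""
    return grams / 1000.0 * STANDARD_GRAVITY


def first_contact_index(readings_n, threshold_n):
    """Index of the first reading at or above the threshold, else None."""
    for index, force_n in enumerate(readings_n):
        if force_n >= threshold_n:
            return index
    return None


if __name__ == "__main__":
    threshold_n = grams_force_to_newtons(3.0)
    print(f"3 grams-force is about {threshold_n:.4f} N")

    # Hypothetical fingertip readings in newtons as a finger closes on a plate.
    readings_n = [0.000, 0.010, 0.030, 0.200, 0.800]
    print("first contact at sample", first_contact_index(readings_n, threshold_n))
```

Three grams-force works out to roughly 0.03 N, which is why contact can register before the grip is firm enough to disturb a fragile object.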

With in-hand vision and sensitive fingertip touch, the policy can attempt fine motor tasks such as unscrewing caps, extracting single pills, actuating syringes with precise volumes, and selecting small overlapped components from clutter. Those are the kinds of manipulations that reveal the practical power of closing the loop locally around the contact point instead of relying on global scene reconstruction.

Constraints, Tradeoffs, And The Questions That Matter

Every design choice carries tradeoffs. Helix 02’s unified learned approach shifts complexity away from hand-engineered balance code and into large-scale training and high-rate execution. That produces clear gains, and it creates new practical constraints. Two stand out with measurable implications.

First, compute and power are not free. System 0 runs at 1 kHz, System 1 at about 200 Hz, and both drive whole-body actuation. That implies continuous, high-bandwidth computation on the robot. Because Figure has not disclosed battery capacity or power draw for sustained runs, the real-world duty cycle remains an open variable.

As a rule of thumb in robotics, the power drawn by high-frequency control and high-resolution sensing looks negligible in a short trial but becomes a real constraint over a full workday. Designers will have to trade off runtime against sensing and control fidelity.

Second, robustness across environments is the gating factor. The demo is reported as a four-minute run with 61 ordered actions, but the company has not published success rates across many kitchens or repeated trials. The tradeoff here is between generalization and training coverage. Training across more than 200,000 parallel simulated environments and over 1,000 hours of motion data provides breadth, but every real home introduces variations in friction, stuck drawers, objects that slip, or lighting that confounds vision.

The practical question is how often implicit error recovery is sufficient versus when explicit fallback strategies or human intervention become necessary. Quantified public metrics such as success rate per attempt, median recovery time, and performance degradation over hundreds of cycles would turn the conceptual advantage into operational clarity.
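
A quick calculation shows why per-attempt metrics matter for a 61-action sequence. The sketch below assumes each action succeeds independently with the same probability, which is a simplification rather than a claim about Helix 02, and then summarizes a set of trial logs into a success rate and a median recovery time. The trial records are hypothetical.

```python
from statistics import median


def run_success_probability(per_step_success, steps):
    """Chance of a clean run if every step succeeds independently."""
    return per_step_success ** steps


def required_per_step_success(target_run_success, steps):
    """Per-step reliability needed to hit a target whole-run success rate."""
    return target_run_success ** (1.0 / steps)


def summarize(trials):
    """Success rate per attempt and median recovery time across all trials."""
    successes = sum(1 for trial in trials if trial["success"])
    recoveries = [t for trial in trials for t in trial["recovery_times_s"]]
    return {
        "success_rate": successes / len(trials),
        "median_recovery_s": median(recoveries) if recoveries else None,
    }


if __name__ == "__main__":
    steps = 61  # ordered actions reported for the dishwasher run
    print(f"99% per step gives {run_success_probability(0.99, steps):.1%} per run")
    print(f"95% per run needs {required_per_step_success(0.95, steps):.4%} per step")

    # Hypothetical trial logs, not published data.
    trials = [
        {"success": True, "recovery_times_s": []},
        {"success": True, "recovery_times_s": [2.0]},
        {"success": False, "recovery_times_s": [4.0, 6.0]},
        {"success": True, "recovery_times_s": []},
    ]
    print(summarize(trials))
```

Under the independence assumption, 99 percent reliability per action compounds to only about 54 percent over 61 actions, so a single clean run is compatible with a wide range of underlying reliability.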

Other constraints include mechanical safety margins and predictable collision behavior around humans. Safety must be demonstrable not just in a lab but in chaotic domestic settings. Force limits, fail-safe behaviors, and certification pathways are real-world cost centers that will influence deployment speed and scope.

Helix 02 Versus Traditional Modular Robotics

Put simply, Helix 02 trades modular predictability for integrated generalization. Traditional stacks compartmentalize perception, planning, and control into discrete modules with well-defined interfaces. Helix 02 embeds a learned motion prior to handle the messy physics of balance and contact continuously.

Key Decision Factors In Real-World Use

When choosing between a modular stack and a learned whole body prior, teams will evaluate: robustness to novel objects and surfaces, power and compute budgets on the robot, validation cost across task variations, and the ability to reason about safety and certification. Each approach shifts engineering effort rather than eliminating it.

What Determines Whether This Approach Scales

The question of scaling is not only about training more models. It is about whether the learned prior remains a reliable substrate as task diversity grows. If the controller can indeed encode a broad motion prior, higher layers will require fewer recovery heuristics and will be able to compose longer sequences. That is the upside.

The downside is that priors can mask edge cases. A learned controller that favors typical human-like motions might underperform with novel objects or surfaces not seen in training. The solution is systematic exposure to rare events during simulation and careful on-robot validation, which increases training complexity and validation cost. In other words, robustness costs money in compute and engineering time.

Where This Fits In The Robotics Landscape

Helix 02 sits at an inflection point in robotics thinking. For years, the orthodox approach separated perception, planning, and low-level control. Figure’s stack collapses that separation by making a learned whole body prior the foundation. This echoes trends in other domains where strong priors reduce downstream complexity, but it also calls for significantly more simulation and data infrastructure.

What most people misunderstand is that the novelty is not simply neural networks controlling motors. The novelty is the system-level choice to force the harder physics problems into a learned controller and to make perception a partner rather than a separate planner input. If that partnership generalizes, it simplifies the rest of the software stack. If it does not, the engineering cost shifts rather than disappears.

Other teams will watch three signals closely. One, how the stack performs when things go wrong. Two, whether energy and compute budgets are practical for real homes. Three, whether the same training process can be efficiently extended to new tasks without retraining from scratch. Those are the operational metrics that decide if this approach becomes a platform or remains an impressive research milestone.

Who This Is For And Who This Is Not For

Who This Is For: Research teams and robotics companies focused on whole body autonomy, labs able to invest in large-scale simulation and motion capture, and integrators who need robust coupling between balance and manipulation for long-horizon tasks.

Who This Is Not For: Projects constrained by tight energy or compute budgets, deployments requiring fully transparent, hand-engineered control logic for certification reasons, or applications where a small set of narrow, repetitive tasks can be solved more cheaply with modular automation.

Concluding Thought

Helix 02 is compelling because it forces a clear tradeoff: invest in a learned motion prior and high-rate control now, and simplify task logic later. The dishwasher run proves the idea at a single data point. The next chapters will be written in thousands of repetitions across varied homes and in the decisions teams make about power, validation, and safety. The technology points to a future where humanoid robots treat problem solving as an integrated, body-scale activity, but the journey from demonstration to dependable household helper will be defined by how the community measures and mitigates the remaining practical risks.

For readers who want to dig into Figure AI’s own presentation of the system, see Figure AI’s Helix 02 announcement.

FAQ – Frequently Asked Questions

What Is Helix 02? Helix 02 is Figure AI’s learned, whole body control stack that couples locomotion and manipulation via a three-layer architecture spanning task reasoning, visuomotor policy, and a high-frequency learned controller.

How Does Helix 02 Work From Perception To Actuation? It uses a slow semantic planner (System 2), a mid-speed visuomotor policy running around 200 Hz (System 1), and a kilohertz whole-body controller (System 0), which tracks targets and enforces balance and contact constraints.

What Is Loco Manipulation? Loco manipulation describes control strategies that treat walking and object handling as coupled constraints rather than separate modules, so balance, contact, and manipulation are reasoned about together.

How Much Training Data Was Used For Helix 02? Figure reports over 1,000 hours of retargeted human motion and training across more than 200,000 parallel simulated environments. Publicly quoted figures also cite a roughly 10-million-parameter controller.

Is The Dishwasher Demo Proof Of Real World Reliability? The demo demonstrates concept validity, but the company has not published broad success rates across many kitchens or repeated trials. That limits conclusions about everyday reliability.

Can Helix 02 Run All Day On A Robot Battery? Figure has not published sustained power draw or battery capacity. Given the reported high frequency control and rich sensing, power consumption over a full workday is an open question.

How Does Helix 02 Compare To Conventional Control Stacks? It favors an integrated learned prior that reduces hand-engineered low-level code in exchange for larger training and simulation investment. The tradeoff is between modular transparency and integrated generalization.

What Are The Main Safety Concerns For Home Deployment? Safety concerns include mechanical force limits, predictable collision behavior around humans, validated fail-safe behaviors, and certification pathways. These are practical deployment costs that must be demonstrated beyond lab demos.

[Image: Humanoid robot in a kitchen washing dishes while taking a step, demonstrating coordinated balance and arm movement]