AI, Not Hardware, Is Why Humanoid Robots Are Finally Viable


Toyota Research Institute CEO Gill Pratt says AI learning methods — not better hardware — are what finally make humanoid robots commercially viable.

11 min read · 17 Apr 2026
Marco Ferrari

Gill Pratt, the architect of the DARPA Robotics Challenge and current CEO of the Toyota Research Institute, argues that the long-awaited humanoid breakthrough has arrived — and the catalyst isn't better motors or stronger joints. It's AI. Specifically, the shift from hand-coded robot behaviour to imitation learning and diffusion policy models that let robots learn by demonstration rather than programming.


The Brain vs. Body Problem in Humanoid Robotics

The hardware was never the bottleneck. Humanoid mechanisms capable of impressive physical feats have existed for over a decade — Boston Dynamics' Atlas debuted in 2013, and research humanoids preceded it by years. What lagged catastrophically behind the body was the brain: the software, learning architectures, and reasoning systems needed to make those bodies useful.

Pratt stated this directly in a recent IEEE Spectrum interview: "What's different now isn't the body, but the brain. We have always had this disparity in the robotics field where the mechanisms we were building were incredibly capable, but we didn't really have the means for making the utility of the robot match that potential."

That disparity is now closing — not because actuators got cheaper (though they did), but because AI research delivered a fundamentally new way to program robot behaviour. Instead of engineers writing explicit code for every task, robots can now learn by watching humans demonstrate what to do. This imitation learning paradigm, combined with large behaviour models (LBMs) trained across many tasks simultaneously, represents the core of what Pratt calls the current breakthrough moment.

The parallel to autonomous driving is instructive. The DARPA Grand Challenge in 2004 and Urban Challenge in 2007 didn't produce commercial self-driving vehicles, but they proved the concept, seeded the talent pipeline, and set the trajectory. Pratt designed the 2012–2015 DARPA Robotics Challenge with exactly this logic in mind for humanoids. A decade later, he believes the compounding effect of that foundational work, now supercharged by modern AI, is finally paying off.


Why System 1 AI Isn't Enough, and What Comes Next

Current AI — including the large language models powering the most capable robot brains today — operates almost entirely in what psychologists call System 1 thinking: fast, pattern-matching, reflexive response. See this input pattern, produce that output action. It works remarkably well until it doesn't.

The missing piece is System 2 thinking: slow, deliberate reasoning that involves building internal world models, imagining hypotheticals, and planning sequences of actions toward goals. Pratt's analogy is blunt. Trying to patch System 1 AI to behave like System 2 is "like trying to squeeze a balloon filled with water; you squeeze it on one side and the water bulges out on the other side." Fix one failure mode, and another appears elsewhere. Net performance improvement: marginal.

This maps directly onto the debate fracturing the AI research community. One camp, represented by scaling advocates, believes current transformer-based architectures can be refined into general reasoning systems. The other camp — most vocally represented by Meta's chief AI scientist Yann LeCun — argues that autoregressive prediction (guessing the next token from past tokens) is architecturally incapable of true reasoning, regardless of scale. Pratt aligns with LeCun: world models, not bigger pattern matchers, are what robots ultimately need.

The practical consequence for today's humanoids is significant. Every impressive robot demonstration you've seen in the last two years — manipulation tasks, household chores, warehouse pick-and-place — is built on System 1 diffusion policies. These robots are reacting, not reasoning. They fail on novel edge cases because they've never imagined the scenario; they've only seen analogues in training data.
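The difference between reacting and reasoning can be made concrete with a toy example. The sketch below is purely illustrative and assumes a hypothetical 3x4 grid world: `reactive_policy` stands in for System 1 (a fixed mapping from what the robot sees to what it does), while `plan` stands in for System 2 (a search through imagined futures using a world model, here the `step` function). None of these names come from Pratt or TRI, and real systems are vastly more complex.

```python
from collections import deque

# Hypothetical grid world: '#' cells are blocked, '.' cells are free.
GRID = ["....",
        ".##.",
        ".#.."]
START, GOAL = (2, 0), (2, 2)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """World model: predict the next cell. Blocked or off-grid moves leave the state unchanged."""
    r, c = state[0] + MOVES[action][0], state[1] + MOVES[action][1]
    if 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] != "#":
        return (r, c)
    return state


def reactive_policy(state):
    """System 1 stand-in: always head straight for the goal, with no lookahead."""
    dr, dc = GOAL[0] - state[0], GOAL[1] - state[1]
    if abs(dc) >= abs(dr):
        return "right" if dc > 0 else "left"
    return "down" if dr > 0 else "up"


def run_reactive(max_steps=20):
    """Follow the reactive policy and report whether the goal was reached."""
    state = START
    for _ in range(max_steps):
        if state == GOAL:
            return True
        state = step(state, reactive_policy(state))
    return state == GOAL


def plan(start, goal):
    """System 2 stand-in: breadth-first search over futures imagined with the world model."""
    frontier, came_from = deque([start]), {start: None}
    while frontier:
        state = frontier.popleft()
        if state == goal:
            actions = []
            while came_from[state] is not None:
                state, action = came_from[state]
                actions.append(action)
            return actions[::-1]
        for action in MOVES:
            nxt = step(state, action)
            if nxt not in came_from:
                came_from[nxt] = (state, action)
                frontier.append(nxt)
    return None  # no route exists


print("reactive policy reached goal:", run_reactive())
print("planned route:", plan(START, GOAL))
```

In this toy world the reactive policy walks into the wall between it and the goal and stays there, because nothing in its mapping covers that case. The planner finds the detour because it can try out moves in its model before committing to them.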


How TRI's Diffusion Policy Cracked the Learning Bottleneck

Two years ago, the Toyota Research Institute published work on diffusion policy — an approach that borrows the generative mechanism behind AI image synthesis (diffusion models) and applies it to robot action generation. Instead of generating pixels, the model generates motor commands. The results were striking enough that, as Pratt puts it, "every robotics demonstration that we've seen is using some form of diffusion policy."
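To give a feel for what "generating motor commands" by diffusion means, here is a minimal sketch of a generic DDPM-style sampling loop applied to a short action vector. It is not TRI's implementation. Everything in it is an assumption for illustration: the step count, the noise schedule, the three-value action, and above all `toy_noise_predictor`, which replaces the trained, observation-conditioned neural network with a closed-form formula that is only valid when the training data contains a single demonstrated action.

```python
import math
import random

T = 50  # number of denoising steps (hypothetical choice)
# Linear noise schedule (hypothetical values).
BETAS = [1e-4 + (0.2 - 1e-4) * t / (T - 1) for t in range(T)]
ALPHAS = [1.0 - b for b in BETAS]
ALPHA_BARS = []
_running = 1.0
for _a in ALPHAS:
    _running *= _a
    ALPHA_BARS.append(_running)


def toy_noise_predictor(noisy_action, t, demo_action):
    """Stand-in for the trained network.

    For a dataset holding one demonstrated action, the ideal noise estimate
    has this closed form. A real policy would use a learned model conditioned
    on camera images and robot state instead.
    """
    ab = ALPHA_BARS[t]
    return [(x - math.sqrt(ab) * a) / math.sqrt(1.0 - ab)
            for x, a in zip(noisy_action, demo_action)]


def sample_action(demo_action, rng):
    """Start from pure noise and denoise step by step into an action."""
    x = [rng.gauss(0.0, 1.0) for _ in demo_action]
    for t in reversed(range(T)):
        eps = toy_noise_predictor(x, t, demo_action)
        coef = BETAS[t] / math.sqrt(1.0 - ALPHA_BARS[t])
        mean = [(xi - coef * ei) / math.sqrt(ALPHAS[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Re-inject a little noise on every step except the last.
            sigma = math.sqrt(BETAS[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


demo = [0.3, -0.1, 0.5]  # hypothetical joint-velocity command from one demonstration
action = sample_action(demo, random.Random(0))
print([round(v, 3) for v in action])
```

The loop has the same structure as image diffusion: begin with random noise and repeatedly subtract the predicted noise. The only change is that the thing being denoised is a vector of motor commands rather than a grid of pixels.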

TRI then extended this into large behaviour models (LBMs): a single model trained simultaneously across many different tasks, rather than one model per task. The critical discovery was positive transfer — adding new tasks to the training set actually improved performance on existing tasks and reduced the total data needed to reach competence. This directly attacks the data bottleneck that had previously made robot learning impractical at commercial scale.

The data challenge remains real, however. Unlike LLMs, which were trained on essentially the entire text of the internet, robots must collect physical interaction data (demonstrations, trajectories, sensor readings) in the real world. That process is slow and expensive. LBMs reduce the per-task data requirement, but the industry-wide question of how much data is "enough" for reliable real-world deployment remains open.

Pratt's interim solution mirrors the playbook that finally made autonomous vehicles commercially viable: supervised autonomy. Most of the time, the robot handles tasks independently using System 1 inference. When it encounters a genuinely novel situation — the equivalent of a double-parked car blocking a robotaxi — it raises its hand and asks a remote human operator for guidance. The human provides the System 2 decision; the robot executes it. This hybrid model sidesteps the unsolved world-model problem while delivering real commercial utility today.
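The escalation logic behind supervised autonomy can be sketched in a few lines. The version below is a hedged illustration, not a description of any vendor's system. It assumes the policy reports a confidence score alongside each action and that a fixed threshold decides when to call the operator. How a real robot detects that it is outside its training distribution is an open design question, and the observation names, actions, and the 0.8 threshold here are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class Decision:
    action: str
    source: str  # "policy" or "operator"


# Hypothetical table standing in for a learned policy: observation -> (action, confidence).
KNOWN_SITUATIONS = {
    "clear_path": ("drive_forward", 0.97),
    "cup_on_table": ("grasp_cup", 0.91),
}


def toy_policy(observation: str) -> Tuple[str, float]:
    """Return an action and a confidence; unfamiliar observations get low confidence."""
    return KNOWN_SITUATIONS.get(observation, ("stop", 0.10))


def toy_operator(observation: str) -> str:
    """Stand-in for a remote human supplying the deliberate decision."""
    return "back_up_and_reroute"


def supervised_step(observation: str,
                    policy: Callable[[str], Tuple[str, float]],
                    ask_operator: Callable[[str], str],
                    threshold: float = 0.8) -> Decision:
    """Act autonomously when confident; otherwise escalate to the operator."""
    action, confidence = policy(observation)
    if confidence >= threshold:
        return Decision(action, "policy")
    return Decision(ask_operator(observation), "operator")


def escalation_rate(observations: Iterable[str]) -> float:
    """Fraction of steps that needed a human, a rough proxy for operator workload."""
    decisions = [supervised_step(o, toy_policy, toy_operator) for o in observations]
    return sum(d.source == "operator" for d in decisions) / len(decisions)


shift = ["clear_path", "cup_on_table", "double_parked_car", "clear_path"]
for obs in shift:
    print(obs, "->", supervised_step(obs, toy_policy, toy_operator))
print("escalation rate:", escalation_rate(shift))
```

Tracking something like the escalation rate matters commercially, since every escalation consumes remote operator time.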


The Hype Problem: Why Humanoids in Flat Factories Make No Sense

Not all of Pratt's assessment is bullish. He offers a pointed critique of where humanoid investment is currently flowing: factory floors.

The humanoid form factor exists for one core reason — the human-built world is optimised for human bodies. Doorknobs, staircases, vehicle interiors, hospital rooms — these environments reward bipedal locomotion and dexterous manipulation. Legs genuinely outperform wheels in cluttered, uneven, obstacle-laden spaces because a biped can step over obstacles rather than navigate around them.

But modern factories are flat, obstacle-free, and purpose-built for automation. Wheels are mechanically simpler, cheaper, more energy-efficient, and more reliable than legs in these environments. The humanoid premium (the added complexity, cost, and mechanical failure risk of legs) buys nothing on a warehouse floor designed for forklifts and AGVs (automated guided vehicles).

"It's very weird to see so much focus on legged robots in factories, which are flat environments perfectly suited for wheels," Pratt said.

This matters for buyers evaluating the current wave of humanoid products. The form factor premium is real, and in many of the environments being targeted by the loudest commercial announcements, it's unjustified by the actual operational requirements. Pratt's own focus at TRI is on environments where humanoids earn their complexity: elder care, home assistance, and other unstructured human spaces where the form factor advantage is genuine.


What This Means for Robotics Buyers

The Pratt thesis has direct purchasing implications. The AI capability tier of a humanoid platform now matters more than its mechanical specifications. A robot with industry-standard diffusion policy integration and LBM-based learning is categorically more capable than one relying on traditional hand-coded behaviour trees — even if both have similar physical specs on paper.

Here is a practical comparison of current humanoid and cobot platforms by AI capability tier:

Platform | AI Tier | Learning Method | Teleoperation Fallback | Best Use Environment
Boston Dynamics Spot (with AI add-ons) | System 1+ | Diffusion policy / behaviour cloning | Yes (remote ops) | Industrial inspection, unstructured outdoor
Figure 02 / 1X NEO | System 1 | Imitation learning, LLM integration | Partial | Structured manufacturing (limited)
Unitree H1 / G1 | System 1 | Diffusion policy variants | Limited | Research, proof-of-concept
Agility Robotics Digit | System 1 | Behaviour cloning | Yes (warehouse ops) | Flat warehouse (wheels are arguably superior)
Traditional cobots (UR, Fanuc) | Pre-AI | Programmatic / teach pendant | N/A | Structured, repetitive industrial tasks

Key buyer guidance:

  • Don't buy the body — buy the learning stack. Evaluate what training data pipeline is available, how rapidly the robot acquires new tasks, and whether the vendor supports supervised autonomy fallback.
  • Match form factor to environment honestly. Legged humanoids make sense in unstructured human spaces. For flat, structured environments, evaluate cobots or wheeled platforms before paying the humanoid premium.
  • The data moat is real. Vendors with the most demonstration data — especially TRI, Figure, and 1X — have structural advantages that will compound. Evaluate vendor data strategy, not just current demo performance.
  • Supervised autonomy is the current best practice. Platforms that support remote operator fallback are more deployable today than fully autonomous systems that will fail on edge cases.

For buyers exploring the full range of available platforms, browse humanoid robots on Botmarket to compare current market options across capability tiers.


Frequently Asked Questions

Why are humanoid robots viable now when they weren't 10 years ago?

The hardware hasn't changed fundamentally — bipedal mechanisms capable of impressive physical tasks have existed since before the 2015 DARPA Robotics Challenge finals. What changed is the AI learning stack. Diffusion policy and large behaviour models now allow robots to acquire new skills from human demonstration data rather than hand-coded instructions, dramatically reducing the engineering overhead per task and improving real-world performance on unstructured inputs.

What is diffusion policy and why does it matter for robotics?

Diffusion policy applies the generative mechanism behind AI image synthesis to robot action generation. Instead of producing pixels, the model outputs sequences of motor commands. Toyota Research Institute's work on diffusion policy, published in 2023, demonstrated that this approach outperformed prior imitation learning methods on manipulation benchmarks, and it has since been adopted, in various forms, by virtually every major commercial humanoid developer.

Should I buy a legged humanoid robot for my warehouse or factory?

In most cases, no. Gill Pratt explicitly notes that flat, structured factory environments are "perfectly suited for wheels," and the mechanical complexity of legged locomotion adds cost and failure risk without a corresponding operational benefit in these settings. Wheeled cobots or mobile manipulators on wheeled bases are typically more cost-effective and reliable for structured industrial applications. Humanoid legs earn their premium in unstructured environments — homes, hospitals, outdoor spaces — with steps, obstacles, and human-scale affordances.

What is the difference between System 1 and System 2 AI in robotics?

System 1 AI (fast, pattern-matching) covers what current robots do: map sensory inputs to actions based on training data. System 2 AI (slow, deliberative reasoning) would involve building internal world models, planning multi-step action sequences, and imagining novel scenarios before acting. Current humanoid robots operate almost entirely in System 1. No commercial robot platform has achieved robust System 2 reasoning, and this remains the field's central unsolved challenge.

What does supervised autonomy mean for robot deployment?

Supervised autonomy is a hybrid operational model where a robot handles the majority of tasks independently but escalates to a remote human operator when it encounters situations outside its training distribution. This is the same model used by commercial robotaxi services when vehicles encounter edge-case road situations. For buyers, it means reliable current-generation deployment is achievable — but factor in the cost of remote operations infrastructure and human oversight staffing.

Is the current humanoid investment bubble going to result in useful products?

Pratt's view is cautiously affirmative: something genuinely different has occurred, the AI breakthroughs are real, and the investment is producing accelerated capability development. The risk he identifies is misallocated applications — specifically, deploying humanoid form factors in environments (flat factories) where simpler platforms would outperform them. The investments most likely to produce durable value are those targeting genuinely unstructured environments — elder care, home assistance, disaster response — where the humanoid form factor has an irreducible advantage.


The brain-body gap in humanoid robotics is closing — but the gap between capability and appropriate deployment is widening just as fast. The platforms getting funded now will define which companies own the embodied AI stack for the next decade.

Which humanoid platform's AI learning stack do you think is most defensible — and does supervised autonomy solve the commercialisation problem or just delay it?


Last updated: 2025
