AutoDex: Fully Automated Dexterous Grasping Data Collection at 75+ Trials Per Hour

Mingi Choi, Gunhee Kim, Jisoo Kim, Taeksoo Kim, Taeyun Ha +2 others

8 min read · June 23, 2026

AutoDex is an end-to-end autonomous system that collects physically labeled dexterous-grasp trials without any human intervention—generating 3,593 real-world grasp attempts across 100 household objects. By automating object pose estimation, collision-safe execution, success/failure labeling, and scene reset, AutoDex achieves 75.5 trials per hour, nearly 4× faster than teleoperation.

What the Researchers Built

AutoDex is a complete hardware-software pipeline that turns simulated grasp candidates into physically validated, labeled trials on real multi-finger hands (Allegro and Inspire). The system runs entirely unattended: it estimates the object’s 6-DoF pose using a dense 20-camera array, filters and selects executable grasps from a modular candidate generator, executes the grasp on a physical robot arm, checks lift-and-hold success (5 cm lift, 3 s hold), labels the trial, and resets the object for the next attempt.

Three key innovations make this possible. First, a dense multi-view perception system overcomes hand–object occlusion during grasp execution, maintaining reliable pose tracking even when the robot hand covers most of the object. Second, a residual-torque safety monitor detects unexpected contacts and aborts unsafe motions, which reduces the risk of damage during unattended operation. Third, an active object reset module uses a second robot arm or a gravity-based reorienter to move the object between stable poses, so that the candidate set can be exhausted across all orientations. The collected database includes synchronized robot state logs, multi-view video, camera calibration data, and per-trial success/failure labels, all generated autonomously.

System diagram showing the AutoDex loop from pose estimation to grasp execution to labeling and reset

Key Results

The researchers evaluated AutoDex on a 20-object subset drawn from a 100-object database spanning plastic, metal, wood, silicone, paper, tape, and ceramic items. The primary metric is autonomous throughput versus teleoperation. AutoDex achieved 75.5 trials per hour, while a skilled teleoperator managed only 19.3 trials per hour—a 3.9× improvement. This gain comes not from faster execution (mean loop time is 48.2 s, dominated by robot motion) but from eliminating human idle time and enabling 24/7 unattended collection.

Physical validation dramatically improves the quality of the resulting grasp database. When the researchers tested a downstream retrieval-based execution policy, grasps screened by AutoDex’s real-world trials had a 79.2% success rate in new scenes, compared to only 18.3% for grasps selected purely by the candidate generator (simulation-only). The active object reset module increased coverage: without reset, the system collected trials from an average of 2.3 stable poses per object; with reset, it covered 5.7 poses, about 2.5 times as many.

Metric                                            AutoDex (Autonomous)   Teleoperation (Human)
Throughput (trials/hour)                          75.5                   19.3
Mean loop duration (s)                            48.2                   -
Downstream success rate (physical validation)     79.2%                  -
Downstream success rate (simulation-only)         18.3%                  -
Stable poses covered per object (no reset)        2.3                    -
Stable poses covered per object (with reset)      5.7                    -
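
The headline ratios can be re-derived from the reported figures. The short Python check below uses only numbers stated in this section; the variable names are illustrative.

```python
# Figures as reported in the article.
autodex_tph = 75.5       # trials per hour, autonomous collection
teleop_tph = 19.3        # trials per hour, skilled teleoperator
poses_no_reset = 2.3     # mean stable poses covered per object
poses_with_reset = 5.7

speedup = autodex_tph / teleop_tph
coverage_gain = poses_with_reset / poses_no_reset

print(f"throughput speedup: {speedup:.2f}x")       # 3.91x
print(f"pose coverage gain: {coverage_gain:.2f}x")  # 2.48x
```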

How It Works

AutoDex operates in a closed loop consisting of five phases. First, pose estimation: a 20-camera rig captures synchronized images, and the system runs an off-the-shelf 6-DoF pose estimator to localize the object on the tabletop. The high camera density ensures that at least two cameras have an unobstructed view even as the robot hand approaches, maintaining tracking accuracy during the critical pre-grasp phase.

Second, candidate selection: a modular grasp generator (e.g., GraspIt! or a learned model) produces a set of wrist poses and hand configurations. AutoDex filters these using a collision checker against the estimated object pose and known scene geometry (table, obstacles). It then selects the highest-ranked feasible candidate that hasn’t been attempted for the current stable pose.
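
The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `GraspCandidate`, its `score` field (assumed higher-is-better), and the `is_collision_free` callback are hypothetical names standing in for the generator output and the collision checker.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Set


@dataclass(frozen=True)
class GraspCandidate:
    candidate_id: str
    score: float  # generator ranking score; higher is better (assumption)


def select_candidate(
    candidates: Iterable[GraspCandidate],
    is_collision_free: Callable[[GraspCandidate], bool],
    attempted_ids: Set[str],
) -> Optional[GraspCandidate]:
    """Pick the best-ranked feasible candidate not yet tried for this stable pose."""
    feasible = [
        c for c in candidates
        if c.candidate_id not in attempted_ids and is_collision_free(c)
    ]
    if not feasible:
        return None  # nothing left to try for the current pose
    return max(feasible, key=lambda c: c.score)


# Toy usage: candidate "b" collides, candidate "a" was already attempted.
pool = [GraspCandidate("a", 0.9), GraspCandidate("b", 0.8), GraspCandidate("c", 0.5)]
chosen = select_candidate(pool, lambda c: c.candidate_id != "b", {"a"})
print(chosen)
```

Returning `None` when no feasible candidate remains gives the caller a clear signal to move on to a new stable pose.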

Third, execution with safety monitoring: the robot arm plans a trajectory to the pre-grasp pose, closes the fingers, then lifts 5 cm and holds for 3 seconds. During the lift, a residual-torque monitor runs on each joint: if the measured torque exceeds a preset threshold (indicating unexpected contact, e.g., with the table or a dropped object), the system aborts and retracts to home. This monitor is only active during contact-critical segments (near tabletop) to avoid false positives.
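
A per-joint threshold check of this kind might look like the sketch below. It assumes the residual is the difference between measured torque and a model-predicted torque, which is the usual meaning of the term but is not spelled out in the article; the function names, the height-based gating rule, and the 10 cm margin are all hypothetical.

```python
from typing import Sequence


def is_contact_critical(tool_height_m: float, margin_m: float = 0.10) -> bool:
    """Hypothetical gating rule: monitor only while the hand is near the tabletop."""
    return tool_height_m <= margin_m


def should_abort(
    measured_nm: Sequence[float],
    expected_nm: Sequence[float],
    thresholds_nm: Sequence[float],
    tool_height_m: float,
) -> bool:
    """Return True if any joint's residual torque exceeds its preset threshold."""
    if not is_contact_critical(tool_height_m):
        return False  # monitor is inactive away from the table to limit false positives
    residuals = [abs(m - e) for m, e in zip(measured_nm, expected_nm)]
    return any(r > t for r, t in zip(residuals, thresholds_nm))


# Joint 2 deviates by 1.0 N·m against a 0.5 N·m threshold while near the table.
print(should_abort([1.0, 3.0], [1.0, 2.0], [0.5, 0.5], tool_height_m=0.05))
```

In a real controller this check would run at the control rate, and a positive result would trigger the abort-and-retract behaviour described above.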

Fourth, success/failure labeling: a force-torque sensor at the wrist detects whether the object remained in the hand after the 3-second hold. If the measured load matches the object weight (from a database), the trial is labeled “success”; otherwise “failure”. This eliminates the need for human classification.
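
The weight-matching rule can be illustrated as follows. The article does not say how close the measured load must be to the object weight, so the 20% relative tolerance here is an assumption, as is the premise that the sensor has been tared so that the reading reflects only the held object.

```python
G = 9.81  # standard gravity, m/s^2


def label_trial(measured_load_n: float, object_mass_kg: float, rel_tol: float = 0.2) -> str:
    """Label a trial by comparing the wrist load after the hold to the object's weight.

    Assumes a tared force-torque sensor; rel_tol is a hypothetical tolerance.
    """
    expected_n = object_mass_kg * G
    matches = abs(measured_load_n - expected_n) <= rel_tol * expected_n
    return "success" if matches else "failure"


# A 200 g object weighs about 1.96 N.
print(label_trial(1.9, 0.2))   # load close to the object weight
print(label_trial(0.05, 0.2))  # almost no load: the object was dropped
```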

Fifth, reset: if unattempted candidates remain for the current object pose, the robot places the object back and restarts. Otherwise, the active reset module (a second robot arm or a gravity-based reorienter) tilts or pushes the object to a new stable pose, then re-estimates the pose and continues. Each trial record—video, poses, candidate parameters, label—is saved to the database.
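
The branch at the end of each trial reduces to a set difference per stable pose. The sketch below is one way to express it; the `NextAction` enum and the dictionaries keyed by a pose identifier are hypothetical bookkeeping, not structures named in the article.

```python
from enum import Enum
from typing import Dict, Set


class NextAction(Enum):
    PLACE_BACK_AND_RETRY = "place_back_and_retry"
    REORIENT_OBJECT = "reorient_object"


def next_action(
    pose_id: str,
    candidates_by_pose: Dict[str, Set[str]],
    attempted_by_pose: Dict[str, Set[str]],
) -> NextAction:
    """Retry in the same pose while untried candidates remain, else reorient."""
    remaining = candidates_by_pose.get(pose_id, set()) - attempted_by_pose.get(pose_id, set())
    return NextAction.PLACE_BACK_AND_RETRY if remaining else NextAction.REORIENT_OBJECT


candidates = {"upright": {"g1", "g2"}, "on_side": {"g3"}}
attempted = {"upright": {"g1", "g2"}}
print(next_action("upright", candidates, attempted))  # all tried, so reorient
print(next_action("on_side", candidates, attempted))  # g3 still untried
```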

The 48.2 s mean loop breaks down into robot execution (24.8 s), retract motion (11.9 s), perception (7.8 s), and motion planning (3.8 s). Perception is the only step that could be accelerated (e.g., with faster pose estimators), but robot motion remains the dominant bottleneck.
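
A quick calculation makes the bottleneck argument concrete. The script below uses only the durations quoted above; the "no perception" case is a hypothetical upper bound, not a result from the paper.

```python
# Per-phase mean durations in seconds, as reported.
phases_s = {
    "robot execution": 24.8,
    "retract motion": 11.9,
    "perception": 7.8,
    "motion planning": 3.8,
}
reported_loop_s = 48.2

component_sum_s = sum(phases_s.values())
shares = {name: t / component_sum_s for name, t in phases_s.items()}

# Throughput implied by the mean loop time, and a hypothetical bound
# if perception took no time at all.
implied_tph = 3600 / reported_loop_s
tph_without_perception = 3600 / (reported_loop_s - phases_s["perception"])

print(f"sum of components: {component_sum_s:.1f} s")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"implied throughput: {implied_tph:.1f} trials/hour")
print(f"bound with zero perception time: {tph_without_perception:.1f} trials/hour")
```

The rounded components sum to 48.3 s, within rounding of the reported 48.2 s mean. The implied rate of about 74.7 trials per hour is close to the reported 75.5; the article does not state how the hourly figure was computed. Even with perception removed entirely, the loop would reach only about 89 trials per hour, because the two motion phases account for roughly three quarters of each cycle.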

Distribution of grasp execution durations across 500 trials showing majority between 2-6 seconds

Why This Matters for Robotics

Dexterous grasping is a prerequisite for robots that handle arbitrary objects in homes, warehouses, and factories. But training robust policies requires massive amounts of real-world data—data that teleoperation is too slow to produce. AutoDex demonstrates that fully automated data collection is not only possible but practical: a system can run overnight, collecting thousands of labeled trials without a human in the loop.

This has direct implications for companies deploying used cobots or humanoid robots listed on BotMarket. At roughly 75 trials per hour, grasping datasets can be curated fast enough for downstream imitation learning or reinforcement learning to train on hundreds of thousands of real-world grasp attempts. The database itself becomes a reusable asset: a query such as “successful grasp on a cylindrical object with the Allegro hand” can be answered directly from stored trials, and the returned grasps can then be re-checked for feasibility in a new scene.
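
A query of that kind can be pictured as a filter over trial records. The sketch below is illustrative only: the `TrialRecord` fields, the `shape_class` tag, and the sample rows are hypothetical and do not reflect the actual database schema or its contents.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TrialRecord:
    object_name: str
    shape_class: str   # hypothetical geometry tag, e.g. "cylinder"
    hand: str          # "allegro" or "inspire"
    candidate_id: str
    success: bool


def query_successes(records: Iterable[TrialRecord], shape_class: str, hand: str) -> List[TrialRecord]:
    """Return physically validated grasps for a given shape class and hand."""
    return [
        r for r in records
        if r.success and r.shape_class == shape_class and r.hand == hand
    ]


# Made-up rows, purely to exercise the filter.
db = [
    TrialRecord("mug", "cylinder", "allegro", "g1", True),
    TrialRecord("mug", "cylinder", "allegro", "g2", False),
    TrialRecord("can", "cylinder", "inspire", "g7", True),
    TrialRecord("box", "cuboid", "allegro", "g4", True),
]
hits = query_successes(db, "cylinder", "allegro")
print([r.candidate_id for r in hits])
```

The returned grasps would still need a feasibility check against the new scene before execution, as noted above.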

Moreover, the system’s safety monitor and automatic reset make it suitable for industrial deployment where human supervision is costly. Factories that need to automate pick-and-place of varied items can adapt AutoDex’s pipeline to their specific arm–hand combinations and object sets.

Limitations and Open Questions

AutoDex currently only collects stable power grasps on a fixed workcell. It does not handle bimanual coordination, mobile manipulation, finger-rolling regrasps, or functional grasps like tool use and handover—all critical for more advanced tasks. The system also inherits the blind spots of its grasp generator: if the generator cannot propose a feasible candidate for a given object (e.g., requiring dynamic finger motion during contact), AutoDex will never test it. Additionally, the high camera density (20 cameras) makes the workcell bulky and expensive, though the paper notes that only 10–12 cameras are actually needed in practice.

Finally, the success/failure labeling only checks lift-and-hold, not functional success (e.g., can the grasp be used to pour or insert). Extending to task-conditional labeling remains open.

Frequently Asked Questions

How does AutoDex label a grasp as success or failure? It uses a force-torque sensor at the wrist during a 5 cm lift and 3-second hold. If the measured load matches the known object weight, the trial is labeled success; otherwise failure.

Does AutoDex require any human supervision during collection? No—it runs fully unattended. The safety monitor aborts unsafe motions, and the active reset module reorients objects without human help.

What robot hands are supported? The paper demonstrates AutoDex with the Allegro Hand and the Inspire Hand, both multi-finger dexterous hands. The architecture is hand-agnostic as long as the robot arm can plan collision-free trajectories.

How many trials did AutoDex collect in total? The database contains 3,593 physically executed and automatically labeled trials across 100 household objects, covering diverse geometries and materials.

Conclusion

AutoDex shows that dexterous grasp data collection can be fully automated at practical throughput. By integrating dense perception, collision-safe execution, physical labeling, and automatic reset, it nearly quadruples the throughput of teleoperation (75.5 versus 19.3 trials per hour) while eliminating human fatigue. The result is a scalable path to building the large-scale real-world datasets that dexterous manipulation requires.
