DexMachina: Functional Retargeting for
Bimanual Dexterous Manipulation
Mandi Zhao1* Yifan Hou1 Dieter Fox2 Yashraj Narang2 Ajay Mandlekar2† Shuran Song1†
1 Stanford University  
2 NVIDIA  
* Work partially done during an internship
† Equal advising
Abstract
We study the problem of functional retargeting: learning dexterous manipulation policies to track object states from human hand-object demonstrations. We focus on long-horizon, bimanual tasks with articulated objects, which are challenging due to the large action space, spatiotemporal discontinuities, and the embodiment gap between human and robot hands. We propose DexMachina, a novel curriculum-based algorithm. The key idea is to use virtual object controllers with decaying strength: an object is first driven automatically toward its target states, so that the policy can gradually learn to take over under motion and contact guidance. We release a simulation benchmark with a diverse set of tasks and dexterous hands, and show that DexMachina significantly outperforms baseline methods. Our algorithm and benchmark enable a functional comparison of hardware designs, and we present key findings informed by quantitative and qualitative results. With the recent surge in dexterous hand development, we hope this work will provide a useful platform for identifying desirable hardware capabilities and lower the barrier for contributing to future research.

Long Narrated Video
Method
Overview
We propose DexMachina, a novel algorithm that achieves functional retargeting for a variety of hands and objects. At a high level, we train an RL policy using a virtual object controller curriculum, guided by both a task reward and auxiliary rewards.

Task and Auxiliary Rewards
Given one demonstration, we first use its object states to define the task reward. Next, we run a collision-aware kinematic retargeting procedure, which produces reference dexterous hand motions; we use them for a motion imitation reward and residual wrist actions. We then approximate hand-object contact positions, which we use to define a contact reward.
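
Below is a minimal sketch of these three reward terms, assuming simple exponentiated-error shaping over tracking errors. The specific error metrics, kernel scales, and weights are illustrative assumptions, not the paper's exact values.

```python
import numpy as np

def task_reward(obj_state, obj_target, scale=5.0):
    # Track the demonstrated object states (e.g. pose + articulation angle).
    return np.exp(-scale * np.linalg.norm(obj_state - obj_target))

def imitation_reward(hand_q, ref_q, scale=2.0):
    # Stay close to the kinematically retargeted hand joint positions.
    return np.exp(-scale * np.linalg.norm(hand_q - ref_q))

def contact_reward(fingertip_pos, ref_contact_pos, scale=10.0):
    # Encourage fingertips to reach the approximated demo contact points.
    dists = np.linalg.norm(fingertip_pos - ref_contact_pos, axis=-1)
    return np.exp(-scale * dists.mean())

def total_reward(obj_state, obj_target, hand_q, ref_q, tips, ref_contacts,
                 w_task=1.0, w_imit=0.5, w_contact=0.5):
    # Weighted sum of task and auxiliary rewards; weights are assumptions.
    return (w_task * task_reward(obj_state, obj_target)
            + w_imit * imitation_reward(hand_q, ref_q)
            + w_contact * contact_reward(tips, ref_contacts))
```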
Virtual Object Controller Curriculum
The reward terms and residual action learning can suffice for short, simple tasks, but they struggle on long-horizon clips with complex contacts, where the policy often suffers catastrophic early failures. This motivates our novel curriculum approach, which lets the policy explore different strategies in a less fragile setting.
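
The sketch below illustrates the curriculum idea: a PD-style virtual force drives the object toward its demonstrated state, scaled by a strength that decays over training so the policy must gradually take over. The linear schedule, gain values, and function names are assumptions for illustration, not the released implementation.

```python
import numpy as np

def curriculum_scale(step, decay_start=0, decay_end=100_000):
    # 1.0 early in training (the object is driven toward its target
    # automatically), annealed to 0.0 so the policy takes over by the end.
    frac = (step - decay_start) / max(decay_end - decay_start, 1)
    return float(np.clip(1.0 - frac, 0.0, 1.0))

def virtual_object_force(obj_pos, obj_vel, target_pos, target_vel,
                         step, kp=100.0, kd=10.0):
    # PD controller toward the demonstrated object state, scaled by the
    # decaying curriculum strength.
    s = curriculum_scale(step)
    return s * (kp * (target_pos - obj_pos) + kd * (target_vel - obj_vel))
```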
Experiment Results
Experiment Setup
To evaluate DexMachina, we use a subset of the ARCTIC dataset, which includes 5 articulated objects and 7 clips covering diverse motion sequences and both long- and short-horizon demonstrations. We curate assets for 6 open-source dexterous robot hand models with varying sizes and kinematic designs.
DexMachina outperforms baseline methods across all tasks and hands
We first evaluate on 4 hands, comparing DexMachina with direct replay of kinematic retargeting results, two baseline methods, and training with our proposed rewards but without the curriculum. With rare exceptions, our method consistently achieves the best performance across all hands and tasks, especially on long-horizon tasks with complex motion sequences.

Kinematic retargeting does not produce feasible actions
Without policy learning, kinematic retargeting can produce human-like hand motions, but when we replay the retargeted results in simulation, they fail to complete the task.
DexMachina handles long task horizons without early failures
Compared to the ObjDex baseline, which uses only the task reward, our method handles longer-horizon tasks without early failures.
DexMachina allows policies to adapt to their hardware constraints
For example, on the Notebook task, the XHand policy follows the human demonstrator: it uses the left hand to hold up the object and the right hand to close the cover. But for the smaller, less-actuated Inspire Hand, the policy learns to use both hands to stabilize the object and close the cover, despite using the same human hand motion reference as XHand. Similarly, on the Mixer task, the Allegro Hand uses its longer, more flexible thumb to close the mixer lid, whereas the Schunk Hand learns to move its wrist forward and close the lid with its palm.

DexMachina enables a functional comparison between hardware designs
With an effective functional retargeting algorithm, we can now perform a functional comparison between different dexterous hands. We focus on the 4 long-horizon tasks and evaluate DexMachina on two additional hands. Comparing the performance of all the hands, we find that larger, fully-actuated hands achieve both higher final performance and better learning efficiency. Interestingly, degrees of freedom matter more than hand size: the Schunk Hand and XHand, for example, have more actuated fingers and perform much better than the Inspire Hand and Ability Hand, which are closer to human hands in size. With the recent surge in dexterous hand development, we hope our work will provide a useful platform that helps identify desirable hardware capabilities and lowers the barrier for contributing to future research.

Additional Policy Evaluation Videos
Using our method on the 3 short-horizon tasks, all hands achieve scores between 70% and 90% on the object-tracking AUC-ADD metric.
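
For reference, here is a minimal sketch of an AUC-ADD style tracking score, assuming it follows the standard pose-estimation recipe: per-frame average vertex distance (ADD), then area under the success-rate curve over a sweep of distance thresholds. The 10 cm maximum threshold is an illustrative assumption, not the benchmark's setting.

```python
import numpy as np

def add_error(pred_verts, gt_verts):
    # ADD: mean distance between corresponding object vertices per frame.
    # pred_verts, gt_verts: (num_frames, num_verts, 3) arrays.
    return np.linalg.norm(pred_verts - gt_verts, axis=-1).mean(axis=-1)

def auc_add(pred_verts, gt_verts, max_thresh=0.10, num_thresh=100):
    errors = add_error(pred_verts, gt_verts)          # (num_frames,)
    thresholds = np.linspace(0.0, max_thresh, num_thresh)
    # Success rate at each threshold: fraction of frames with ADD below it.
    success = (errors[None, :] < thresholds[:, None]).mean(axis=1)
    # Normalized area under the success-rate curve, in [0, 1].
    return np.trapz(success, thresholds) / max_thresh
```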
Acknowledgements
This work was supported in part by NVIDIA and NSF Awards #2143601, #2037101, and #2132519. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors. The authors would like to thank current and former colleagues at NVIDIA: Kelly Guo, Milad Raksha, David Hoeller, and Bingjie Tang, for their help with physics simulation environments and insightful discussions during algorithm development; and all the members of REALab at Stanford University for providing useful feedback on initial drafts of the paper manuscript.