DexMachina: Functional Retargeting for
Bimanual Dexterous Manipulation
Mandi Zhao1* Yifan Hou1 Dieter Fox2 Yashraj Narang2 Ajay Mandlekar2† Shuran Song1†
1 Stanford University  
2 NVIDIA  
* Work partially done during an internship
† Equal advising
Abstract
We study the problem of functional retargeting: learning dexterous manipulation policies to track object states from human hand-object demonstrations. We focus on long-horizon, bimanual tasks with articulated objects, which are challenging due to the large action space, spatiotemporal discontinuities, and the embodiment gap between human and robot hands. We propose DexMachina, a novel curriculum-based algorithm. The key idea is to use virtual object controllers with decaying strength: an object is first driven automatically toward its target states, so that the policy can gradually learn to take over under motion and contact guidance. We release a simulation benchmark with a diverse set of tasks and dexterous hands, and show that DexMachina significantly outperforms baseline methods. Our algorithm and benchmark enable a functional comparison of hardware designs, and we present key findings informed by quantitative and qualitative results. With the recent surge in dexterous hand development, we hope this work will provide a useful platform for identifying desirable hardware capabilities and lower the barrier for contributing to future research.

Long Narrated Video
Method
Overview
We propose DexMachina, a novel algorithm that achieves functional retargeting for a variety of hands and objects. At a high level, we train an RL policy using a virtual object controller curriculum, guided by both a task reward and auxiliary rewards.

Task and Auxiliary Rewards
Given one demonstration, we first use its object states to define the task reward. Next, we run a collision-aware kinematic retargeting procedure, which produces reference dexterous hand motions; we use them for a motion imitation reward and residual wrist actions. We then approximate hand-object contact positions, which we use to define a contact reward.
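
Below is a minimal sketch of these three reward terms, assuming simple exponentiated-error shaping over tracking errors. The specific error metrics, kernel scales, and weights are illustrative assumptions, not the paper's exact values.

```python
import numpy as np

def task_reward(obj_state, obj_target, scale=5.0):
    # Track the demonstrated object states (e.g. pose + articulation angle).
    return np.exp(-scale * np.linalg.norm(obj_state - obj_target))

def imitation_reward(hand_q, ref_q, scale=2.0):
    # Stay close to the kinematically retargeted hand joint positions.
    return np.exp(-scale * np.linalg.norm(hand_q - ref_q))

def contact_reward(fingertip_pos, ref_contact_pos, scale=10.0):
    # Encourage fingertips to reach the approximated demo contact points.
    dists = np.linalg.norm(fingertip_pos - ref_contact_pos, axis=-1)
    return np.exp(-scale * dists.mean())

def total_reward(obj_state, obj_target, hand_q, ref_q, tips, ref_contacts,
                 w_task=1.0, w_imit=0.5, w_contact=0.5):
    # Weighted sum of task and auxiliary rewards; weights are assumptions.
    return (w_task * task_reward(obj_state, obj_target)
            + w_imit * imitation_reward(hand_q, ref_q)
            + w_contact * contact_reward(tips, ref_contacts))
```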
Virtual Object Controller Curriculum
The reward terms and residual action learning can suffice for short, simple tasks, but they struggle on long-horizon clips with complex contacts, where the policy often suffers catastrophic early failures. This motivates our novel curriculum approach, which lets the policy explore different strategies in a less fragile setting.
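
The sketch below illustrates the curriculum idea: a PD-style virtual force drives the object toward its demonstrated state, scaled by a strength that decays over training so the policy must gradually take over. The linear schedule, gain values, and function names are assumptions for illustration, not the released implementation.

```python
import numpy as np

def curriculum_scale(step, decay_start=0, decay_end=100_000):
    # 1.0 early in training (the object is driven toward its target
    # automatically), annealed to 0.0 so the policy takes over by the end.
    frac = (step - decay_start) / max(decay_end - decay_start, 1)
    return float(np.clip(1.0 - frac, 0.0, 1.0))

def virtual_object_force(obj_pos, obj_vel, target_pos, target_vel,
                         step, kp=100.0, kd=10.0):
    # PD controller toward the demonstrated object state, scaled by the
    # decaying curriculum strength.
    s = curriculum_scale(step)
    return s * (kp * (target_pos - obj_pos) + kd * (target_vel - obj_vel))
```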
Experiment Results
Experiment Setup
To evaluate DexMachina, we use a subset of the ARCTIC dataset, which includes 5 articulated objects and 7 clips covering diverse motion sequences and both long- and short-horizon demonstrations. We curate assets for 6 open-source dexterous robot hand models with varying sizes and kinematic designs.
DexMachina outperforms baseline methods across all tasks and hands
We first evaluate on 4 hands, comparing DexMachina with direct replay of kinematic retargeting results, two baseline methods, and training with our proposed rewards but without the curriculum. With rare exceptions, our method consistently achieves the best performance across all hands and tasks, especially on long-horizon tasks with complex motion sequences.

Kinematic retargeting does not produce feasible actions
Without policy learning, kinematic retargeting can produce human-like hand motions, but when we replay the retargeted results in simulation, they fail to complete the task.
DexMachina handles long task horizons without early failures
Compared to the ObjDex baseline, which uses only the task reward, our method handles longer-horizon tasks without early failures.
DexMachina allows policies to adapt to their hardware constraints
For example, on the Notebook task, the XHand policy follows the human demonstrator: it uses the left hand to hold up the object and the right hand to close the cover. But for the smaller, less-actuated Inspire Hand, the policy learns to use both hands to stabilize the object and close the cover, despite using the same human hand motion reference as XHand. Similarly, on the Mixer task, the Allegro Hand uses its longer, more flexible thumb to close the mixer lid, whereas the Schunk Hand learns to move its wrist forward and close the lid with its palm.

DexMachina enables a functional comparison between hardware designs
With an effective functional retargeting algorithm, we can now perform a functional comparison between different dexterous hands. We focus on the 4 long-horizon tasks and evaluate DexMachina on two additional hands. Comparing the performance of all the hands, we find that larger, fully-actuated hands achieve both higher final performance and better learning efficiency. Interestingly, degrees of freedom matter more than hand size: the Schunk Hand and XHand, for example, have more actuated fingers and perform much better than the Inspire Hand and Ability Hand, which are closer to human hands in size. With the recent surge in dexterous hand development, we hope our work will provide a useful platform that helps identify desirable hardware capabilities and lowers the barrier for contributing to future research.

Additional Policy Evaluation Videos
Using our method on the 3 short-horizon tasks, all hands achieve scores between 70% and 90% on the object-tracking AUC-ADD metric.
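
For reference, here is a minimal sketch of an AUC-ADD style tracking score, assuming it follows the standard pose-estimation recipe: per-frame average vertex distance (ADD), then area under the success-rate curve over a sweep of distance thresholds. The 10 cm maximum threshold is an illustrative assumption, not the benchmark's setting.

```python
import numpy as np

def add_error(pred_verts, gt_verts):
    # ADD: mean distance between corresponding object vertices per frame.
    # pred_verts, gt_verts: (num_frames, num_verts, 3) arrays.
    return np.linalg.norm(pred_verts - gt_verts, axis=-1).mean(axis=-1)

def auc_add(pred_verts, gt_verts, max_thresh=0.10, num_thresh=100):
    errors = add_error(pred_verts, gt_verts)          # (num_frames,)
    thresholds = np.linspace(0.0, max_thresh, num_thresh)
    # Success rate at each threshold: fraction of frames with ADD below it.
    success = (errors[None, :] < thresholds[:, None]).mean(axis=1)
    # Normalized area under the success-rate curve, in [0, 1].
    return np.trapz(success, thresholds) / max_thresh
```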
Acknowledgements
This work was supported in part by NVIDIA and NSF Awards #2143601, #2037101, and #2132519. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors. The authors would like to thank current and former colleagues at NVIDIA: Kelly Guo, Milad Raksha, David Hoeller, and Bingjie Tang, for their help with physics simulation environments and insightful discussions during algorithm development; and all the members of REALab at Stanford University for providing useful feedback on initial drafts of the paper manuscript.