We present a method for controlling a simulated humanoid to grasp an object and move it along an object trajectory. Due to the challenges of controlling a humanoid with dexterous hands, prior methods often use a disembodied hand and only consider vertical lifts or short trajectories. This limited scope hampers their applicability to the object manipulation required for animation and simulation. To close this gap, we learn a controller that can pick up a large number (>1200) of objects and carry them along randomly generated trajectories. Our key insight is to leverage a humanoid motion representation that provides human-like motor skills and significantly speeds up training. Using only simple reward, state, and object representations, our method shows favorable scalability to diverse objects and trajectories. At test time, we only require the object mesh and the desired trajectory for grasping and transporting. To demonstrate the capabilities of our method, we show a state-of-the-art success rate (94.5%) in completing object trajectories and generalizing to unseen objects.
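For intuition, here is a minimal sketch of this test-time interface in Python: the policy sees proprioception, the object state, and a short window of the desired future trajectory, and its output is decoded by the pre-trained motion representation into human-like joint targets. All helper names (rollout, motion_decoder, sim) are hypothetical; the actual Omnigrasp implementation may differ.

```python
import numpy as np

def rollout(policy, motion_decoder, sim, object_mesh, desired_traj, horizon=600):
    """Grasp the object and carry it along `desired_traj` (illustrative sketch)."""
    sim.reset(object_mesh)                      # only the mesh is needed at test time
    for t in range(horizon):
        obs = np.concatenate([
            sim.humanoid_state(),               # proprioception: joint positions/velocities
            sim.object_state(),                 # object pose and velocity
            desired_traj[t:t + 10].ravel(),     # short window of future waypoints
        ])
        z = policy(obs)                         # latent code in the motion representation
        action = motion_decoder(z, sim.humanoid_state())  # decode to joint targets
        sim.step(action)
```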
We use our method to trace out "Omnigrasp" in cursive. All objects and trajectories shown here are test cases not seen during training. Green dots indicate the desired trajectory.
GRAB
In this section, we visualize Omnigrasp's performance on the GRAB dataset following MoCap trajectories. The policy shown here is trained only on synthetic trajectories; no MoCap trajectories are used during training.
Our method successfully picks up all 45 training objects from GRAB using diverse grasping strategies, lifting the objects and carrying them along the MoCap trajectories.
Unseen Objects (GOAL Test Set)
We test on 5 unseen objects from GRAB (apple, camera, toothpaste, binoculars, mug) as the cross-object test. Green dots indicate the desired future trajectory. Our policy successfully picks up the objects and follows the trajectories.
Interestingly, it learns to use the other hand to stabilize the object in the air during the trajectory-following phase.
Apple
Camera
Toothpaste
Binoculars
Mug
Unseen Subjects (IMoS Test Set)
Here, we test on seen objects with trajectories and initial object poses that were unseen during training. As we train on synthetic trajectories, all MoCap object trajectories are unseen during training.
Cylinder: small, medium, large
Cube: small, medium, large
Torus: small, medium, large
Bowl
Scissors
Flute
OakInk
Here, we visualize grasping objects from the OakInk dataset (1330 training objects and 185 testing objects across 32 categories). Since OakInk contains no MoCap trajectories, we showcase randomly generated trajectories here.
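As an illustration, one simple way to generate such smooth random trajectories is a speed-limited random walk on velocity, as sketched below. The function name and parameters are illustrative assumptions, not the exact procedure used in training.

```python
import numpy as np

def random_trajectory(num_steps=300, dt=1.0 / 30, accel_scale=2.0, max_speed=1.0, seed=None):
    """Generate a smooth random 3D trajectory via a speed-limited random walk on velocity."""
    rng = np.random.default_rng(seed)
    pos = np.array([0.0, 0.0, 1.0])             # e.g., start near table height
    vel = np.zeros(3)
    traj = [pos.copy()]
    for _ in range(num_steps - 1):
        vel += rng.normal(scale=accel_scale, size=3) * dt   # random acceleration
        speed = np.linalg.norm(vel)
        if speed > max_speed:                               # cap speed so the target stays followable
            vel *= max_speed / speed
        pos = pos + vel * dt
        traj.append(pos.copy())
    return np.stack(traj)                       # (num_steps, 3) array of waypoints
```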
Testing Objects
Training Objects
OMOMO
The OMOMO dataset contains much larger objects than GRAB and OakInk, though only a small number of them. We show results on the training objects as a proof of concept that our method can support larger objects.
Robustness
Omnigrasp is robust to the object's initial position, orientation, and weight. For more extreme cases, such as objects placed farther away, the training setup needs to be modified (e.g., initializing the object farther from the humanoid during training), which our framework natively supports; a sketch of such randomized initialization follows.
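A minimal sketch of what such a modification could look like, with hypothetical function name and ranges (not our exact training configuration):

```python
import numpy as np

def sample_object_start(rng, min_dist=0.2, max_dist=0.9, height_range=(0.5, 1.2)):
    """Sample an initial object position around the humanoid; widening
    `max_dist` during training covers the farther-away cases."""
    angle = rng.uniform(0.0, 2.0 * np.pi)       # random direction around the humanoid
    dist = rng.uniform(min_dist, max_dist)      # random horizontal distance
    height = rng.uniform(*height_range)         # e.g., table-height range
    return np.array([dist * np.cos(angle), dist * np.sin(angle), height])
```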
Random initial object orientation and height.
Random initial object orientation and height, more extreme.
Random Humanoid Initial Distance to Object
Robustness to different apple weights. From left to right, the weight grows geometrically (by a factor of roughly e per step): 0.2x, 0.5x, 1.4x, 4.0x, 10.9x, 29.6x, 80.6x, 219.3x. Our humanoid can pick up the apple up to roughly 30x its original weight, though it struggles considerably. We also notice that the humanoid is more likely to hold the apple with both hands when it is heavier.
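The listed multipliers match 0.2 · e^k truncated to one decimal; a quick reconstruction (our inference from the values above, not the original sweep script):

```python
import math

# Weight multipliers as a geometric sequence with ratio e, truncated to one decimal.
multipliers = [math.floor(0.2 * math.e ** k * 10) / 10 for k in range(8)]
print(multipliers)  # [0.2, 0.5, 1.4, 4.0, 10.9, 29.6, 80.6, 219.3]
```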
Robustness to different directions (left, up, down, forward, backward, right).
Training Without PULSE-X
When trained without PULSE-X, the humanoid can still pick up objects, but it does not achieve a high success rate and produces unnatural motion.
PHC-X and PULSE-X
Motion Imitation
Here, we visualize the motion imitation results of PULSE-X and PHC-X.
PULSE-X
PHC-X
Random Motion Generation
Here, we test the random motion generation capability of PULSE-X by sampling latent codes from its learned prior and decoding them into motion.
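A minimal sketch of this sampling loop, assuming a state-conditioned Gaussian prior and decoder in the style of PULSE (module names and signatures are assumptions; PULSE-X's actual interfaces may differ):

```python
import torch

@torch.no_grad()
def sample_random_motion(prior, decoder, sim, num_steps=300):
    """Roll out random motion by sampling from the learned latent prior (illustrative)."""
    state = sim.reset()
    for _ in range(num_steps):
        mu, sigma = prior(state)                    # state-conditioned Gaussian prior
        z = mu + sigma * torch.randn_like(sigma)    # reparameterized random sample
        action = decoder(z, state)                  # decode latent code to joint actuation
        state = sim.step(action)
```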
Dexterous AMASS
In this section, we showcase our dexterous AMASS dataset and PULSE-X's imitation performance on it. We can see that PULSE-X captures the fine-grained finger motion, which shows that its latent space contains the motor skills for these motions.
Dexterous AMASS Dataset Render
PULSE-X's Imitation Results on Dexterous AMASS.
Failure Cases
Failure cases occur when the object is dropped, when the trajectory is too fast to follow, or when the humanoid fails to pick up the object in the first place.