Grasping Diverse Objects with Simulated Humanoids



Abstract

We present a method for controlling a simulated humanoid to grasp an object and move it to follow an object trajectory. Due to the challenges of controlling a humanoid with dexterous hands, prior methods often use a disembodied hand and consider only vertical lifts or short trajectories. This limited scope hampers their applicability to the object manipulation required for animation and simulation. To close this gap, we learn a controller that can pick up a large number (>1200) of objects and carry them to follow randomly generated trajectories. Our key insight is to leverage a humanoid motion representation that provides human-like motor skills and significantly speeds up training. Using only simple reward, state, and object representations, our method shows favorable scalability on diverse objects and trajectories. At test time, we require only the object mesh and the desired trajectories for grasping and transporting. To demonstrate the capabilities of our method, we show state-of-the-art success rates (94.5%) in completing object trajectories and generalizing to unseen objects.
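Since only the object mesh and desired trajectory are needed at test time, a rollout reduces to feeding the policy one waypoint at a time. Below is a minimal sketch of that interface; all names (`follow_trajectory`, `env`, `policy`) are hypothetical placeholders, not the released Omnigrasp API.

```python
# Minimal sketch of the test-time interface described above. All names
# (follow_trajectory, env, policy) are hypothetical placeholders, not the
# released Omnigrasp API.
def follow_trajectory(policy, env, object_mesh_path, trajectory):
    obs = env.reset(object_mesh=object_mesh_path)  # spawn humanoid + object
    for target in trajectory:                      # desired 3D object positions
        obs["goal"] = target                       # feed the next waypoint
        action = policy(obs)                       # body + finger controls
        obs, done = env.step(action)               # advance the simulation
        if done:                                   # e.g., the object was dropped
            break
```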



  1. Omnigrasp
  2. GRAB
  3. OakInk
  4. OMOMO
  5. Robustness
  6. Training Without PULSE-X
  7. PHC-X and PULSE-X
  8. Failure Cases


Omnigrasp

We use our method to trace out "Omnigrasp" in cursive. All objects and trajectories shown here are from the test set and were not seen during training. Green dots show the desired trajectory.

GRAB

In this section, we visualize Omnigrasp's performance on the GRAB dataset, following MoCap trajectories. The policy shown here is trained only on synthetic trajectories; no MoCap trajectories are used for training.

Our method successfully picks up all 45 training objects from GRAB using diverse grasp strategies, lifting the objects and carrying them to follow the MoCap trajectories.

Unseen Objects (GOAL Test Set)

We test on 5 unseen objects from GRAB (apple, camera, toothpaste, binoculars, mug) as the cross-object test. Green dots show the desired future trajectory. Our policy can successfully pick up these objects and follow the trajectories. Interestingly, it learns to use the other hand to stabilize the object in mid-air during the trajectory-following phase.

Apple
Camera
Toothpaste
Binoculars
Mug

Unseen Subjects (IMoS Test Set)

Here, we test on seen objects using trajectories and initial object poses that are unseen during training. Since we train on synthetic trajectories, all object trajectories are unseen during training.

Cylinder: small, medium, large
Cube: small, medium, large
Torus: small, medium, large
Bowl
Scissors
Flute

OakInk

Here, we visualize grasping objects from the OakInk dataset (1330 objects for training and 185 for testing, across 32 categories). Since OakInk contains no MoCap trajectories, we showcase randomly generated trajectories here.
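For reference, the sketch below shows one simple way to produce such smooth random 3D trajectories, by random-walking a speed-clipped velocity. This is an illustrative procedure under our own assumptions, not necessarily the paper's exact generator.

```python
# One plausible way to generate a smooth random 3D object trajectory
# (an illustrative sketch, not necessarily the paper's exact procedure).
import numpy as np

def random_trajectory(start, num_steps=300, dt=1.0 / 30.0, max_speed=1.0):
    rng = np.random.default_rng()
    points = [np.asarray(start, dtype=float)]
    velocity = np.zeros(3)
    for _ in range(num_steps - 1):
        # Random-walk the velocity, then clip its magnitude for smoothness.
        velocity += rng.normal(scale=0.1, size=3)
        speed = np.linalg.norm(velocity)
        if speed > max_speed:
            velocity *= max_speed / speed
        points.append(points[-1] + velocity * dt)
    return np.stack(points)  # (num_steps, 3) desired object positions
```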

Testing Objects
Training Objects

OMOMO

The OMOMO dataset contains much larger objects than GRAB and OakInk, though it includes only a small number of them. We show results on the training objects as a proof of concept that our method can support larger objects.

Robustness

Omnigrasp is robust to the object's initial position, orientation, and weight. For more extreme cases, such as objects placed farther away, the training setup needs to be modified (e.g., initializing the object farther from the humanoid during training, as sketched below), which our framework natively supports.
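As a concrete illustration of such a modification, the snippet below widens the range from which the object's initial position is sampled. The function name and distance ranges are hypothetical, not the actual training configuration.

```python
# Hypothetical sketch of widening the object's initial placement during
# training; names and ranges are illustrative, not the actual config.
import numpy as np

def sample_object_spawn(rng, humanoid_xy, min_dist=0.2, max_dist=2.0):
    angle = rng.uniform(0.0, 2.0 * np.pi)
    dist = rng.uniform(min_dist, max_dist)  # raise max_dist for farther spawns
    offset = dist * np.array([np.cos(angle), np.sin(angle)])
    return np.asarray(humanoid_xy) + offset
```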

Random initial object orientation and height.
Random initial object orientation and height, more extreme.
Random Humanoid Initial Distance to Object
Robustness to different apple weights. From left to right, the weight grows exponentially: 0.2x, 0.5x, 1.4x, 4.0x, 10.9x, 29.6x, 80.6x, 219.3x (see the arithmetic sketch after this list). Our humanoid can pick up the apple up to around 30x its original weight, though it struggles immensely. We also notice that the humanoid is more likely to hold the apple with both hands when it is heavier.
Robustness to different directions (left, up, down, forward, backward, right).
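As an aside, the weight multipliers listed above appear to follow a geometric progression with ratio e, truncated to one decimal. This is our reading of the numbers, not a formula stated elsewhere:

```python
# The listed weight multipliers match 0.2 * e**i truncated to one decimal
# (our reading of the numbers above, not a stated formula).
import math

multipliers = [0.2 * math.e**i for i in range(8)]
print([math.floor(m * 10) / 10 for m in multipliers])
# [0.2, 0.5, 1.4, 4.0, 10.9, 29.6, 80.6, 219.3]
```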

Training Without PULSE-X

When trained without PULSE-X, the humanoid can still pick up objects, but it does not achieve a high success rate and its motion is unnatural.

PHC-X and PULSE-X

Motion Imitation

Here, we visualize the motion imitation results of PULSE-X and PHC-X.

PULSE-X
PHC-X

Random Motion Generation

Here, we test the random motion generation capability of PULSE-X by using the learned prior to randomly sample latent codes and generate motion.
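Conceptually, this amounts to replacing a task policy with samples drawn from the learned prior and decoding them into actions. A minimal sketch of that sampling loop is below, where `prior`, `decoder`, and `env` are hypothetical placeholders for PULSE-X's components rather than its actual API.

```python
# Minimal sketch of sampling random motion from a learned latent prior.
# `prior`, `decoder`, and `env` are hypothetical placeholders, not
# PULSE-X's actual API.
import torch

@torch.no_grad()
def sample_random_motion(prior, decoder, env, num_steps=150):
    obs = env.reset()
    for _ in range(num_steps):
        mu, sigma = prior(obs)                 # state-conditioned Gaussian prior
        z = mu + sigma * torch.randn_like(mu)  # sample a latent code
        action = decoder(obs, z)               # decode the latent into actions
        obs = env.step(action)                 # simulate one step
```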

Dexterous AMASS

In this section, we showcase our dexterous AMASS dataset and PULSE-X's imitation performance on it. We can see that PULSE-X captures the fine-grained finger motions, which means that it contains the motor skills for these motions.

Dexterous AMASS Dataset Render
PULSE-X's Imitation Result on Dexterous AMASS.

Failure Cases

Failure cases can happen when the humanoid drops the object, when the trajectory is too fast to follow, or when it fails to pick the object up.

Failure Due to Dropping
Failure Due to Trajectory Too Fast
Failure Due to Grasping