We would like to estimate the pose and full shape of an object from a single observation, without assuming a known 3D model or a known object category. In this work, we propose OmniShape, the first method of its kind to enable probabilistic pose and shape estimation. OmniShape is based on the key insight that shape completion can be decoupled into two multi-modal distributions: one capturing how measurements project into a normalized object reference frame (NORF) defined by the dataset, and the other modeling a prior over object geometries represented as triplanar neural fields. By training separate conditional diffusion models for these two distributions, we can sample multiple hypotheses from the joint pose and shape distribution. OmniShape demonstrates compelling performance on challenging real-world datasets.
OmniShape decouples shape completion into two generative stages. The first (a) maps an RGB image (with optional normals derived from a depth image) to a partial pointcloud with corresponding normals in the NORF. The second (b) conditions on the partial observation in the NORF and predicts the complete object geometry, represented as a triplane (shown here with classifier-free guidance, CFG). Both stages are modeled with diffusion models, enabling multiple hypotheses to be sampled. (c) Given additional information such as a depth image, the object can be registered into the scene using the output of the first stage. For visual clarity, only a reduced number of illustrative normals in the NORF pointcloud and channels in the Ortho-NORF conditioning are drawn.
By iteratively sampling partial pointclouds and shape completions, OmniShape can predict multiple hypotheses for pose and shape. The input images are originally from Pix3D and OCRTOC3D, as provided by ZeroShape.
Using open-world detections and segmentations, OmniShape can perform zero-shot, multi-hypothesis shape and pose estimation for objects in the real world. The input image below is taken from the TUM dataset.
OmniShape demonstrates strong performance on TYO-L, HOPE, and NOCS REAL275.
@inproceedings{OmniShape,
  title={OmniShape: Zero-Shot Multi-Hypothesis Object Pose and Shape Estimation in the Real World},
  author={Liu, Katherine and Zakharov, Sergey and Chen, Dian and Ikeda, Takuya and Shakhnarovich, Greg and Gaidon, Adrien and Ambrus, Rares},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2025}
}