ShapeCodes: Self-Supervised Feature Learning by Lifting Views to Viewgrids

Jan 1, 1010

Dinesh Jayaraman

Ruohan Gao

Kristen Grauman

Abstract

We introduce an unsupervised feature learning approach that embeds 3D shape information into a single-view image representation. The main idea is a self-supervised training objective that, given only a single 2D image, requires all unseen views of the object to be predictable from learned features. We implement this idea as an encoder-decoder convolutional neural network. The network maps an input image of an unknown category and unknown viewpoint to a latent space, from which a deconvolutional decoder can best “lift” the image to its complete viewgrid showing the object from all viewing angles. Our class-agnostic training procedure encourages the representation to capture fundamental shape primitives and semantic regularities in a data-driven manner, without manual semantic labels. Our results on two widely used shape datasets show that (1) our approach successfully learns to perform “mental rotation” even for objects unseen during training, and (2) the learned latent space is a powerful representation for object recognition, outperforming several existing unsupervised feature learning methods.
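As a rough illustration of the objective described in the abstract, the sketch below encodes a single view into a latent code and decodes that code into a full viewgrid, then scores the prediction against every view in the grid. It is a minimal toy under stated assumptions, not the authors' implementation: the linear layers stand in for the convolutional encoder and deconvolutional decoder, the mean squared error is an assumed reconstruction loss, and all names and sizes (`VIEW_PIXELS`, `GRID_ROWS`, `GRID_COLS`, `LATENT_DIM`, `encode`, `decode`, `viewgrid_loss`) are hypothetical.

```python
import random

# All sizes are illustrative assumptions, not values taken from the paper.
VIEW_PIXELS = 16               # pixels in one flattened toy view (e.g. 4x4)
GRID_ROWS, GRID_COLS = 2, 3    # hypothetical viewgrid: 2 elevations x 3 azimuths
LATENT_DIM = 8
NUM_VIEWS = GRID_ROWS * GRID_COLS


def make_matrix(rows, cols, rng):
    """Random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode(view, enc_weights):
    """Map one flattened view to a latent code (linear + ReLU stand-in for a conv encoder)."""
    return [max(0.0, h) for h in matvec(enc_weights, view)]


def decode(code, dec_weights):
    """Map the latent code to a viewgrid: one predicted view per grid cell."""
    flat = matvec(dec_weights, code)
    return [flat[i * VIEW_PIXELS:(i + 1) * VIEW_PIXELS] for i in range(NUM_VIEWS)]


def viewgrid_loss(predicted, target):
    """Mean squared error over every pixel of every view (an assumed loss)."""
    total = sum((p - t) ** 2
                for pred_view, true_view in zip(predicted, target)
                for p, t in zip(pred_view, true_view))
    return total / (len(target) * VIEW_PIXELS)


rng = random.Random(0)
enc_weights = make_matrix(LATENT_DIM, VIEW_PIXELS, rng)
dec_weights = make_matrix(NUM_VIEWS * VIEW_PIXELS, LATENT_DIM, rng)

# Toy "object": a target viewgrid of random views; the input is one of them,
# chosen at random to mimic the unknown input viewpoint.
target_grid = [[rng.random() for _ in range(VIEW_PIXELS)] for _ in range(NUM_VIEWS)]
input_view = target_grid[rng.randrange(NUM_VIEWS)]

code = encode(input_view, enc_weights)
predicted_grid = decode(code, dec_weights)
loss = viewgrid_loss(predicted_grid, target_grid)
```

In a trained model, the weights would be optimized to minimize this loss across many objects, and `code` would then serve as the learned feature for downstream tasks such as recognition.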

Publication

In ECCV

Last updated on Jan 1, 1010

Unsupervised Features, 3D Reconstruction, Observation Completion, Self-Supervised Learning
