Deep Learning of Object Shape from Video
The shape and 3D structure of the world provide a rich signal that enables us to interact with objects and to navigate in novel and dynamic environments. Despite the importance of this information to human visual reasoning, it remains largely underutilized in the modern deep learning-based semantic image understanding pipelines commonly used in robotics. For example, the current best-performing approaches for object classification in images are predominantly based on heavily supervised feedforward convolutional neural networks. These methods rely on texture cues and often fail to exploit shape information that is shared across related categories.
Recently, we have begun to see new types of self-supervised deep networks that explicitly reason about scene depth [1,2]. At training time, these methods use stereo image pairs or video to parameterize deep models that can reason about the shape of the world without requiring any ground-truth depth supervision. However, they make strong assumptions about the scenes they observe (e.g., a static world viewed from a moving camera). These assumptions are limiting if one wants to reason about dynamic object categories, whose members move and deform independently of the camera.
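As a rough illustration of how video can supervise depth, the following is a minimal sketch of a simplified view-synthesis (photometric reconstruction) loss of the kind used by methods such as [2]. It is not the authors' implementation. The function names, the known camera intrinsics `K`, the given relative pose `(R, t)`, nearest-neighbour sampling and the plain L1 error are all simplifying assumptions; in [2] the pose is predicted by a second network and images are sampled bilinearly. The static-world assumption appears explicitly: every target pixel is assumed to move only because the camera moved.

```python
import numpy as np


def backproject(depth, K_inv):
    """Lift every pixel (u, v) with depth d to the 3D point d * K^-1 [u, v, 1]^T."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    # 3 x N homogeneous pixel coordinates, one column per pixel (row-major order).
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(float)
    return (K_inv @ pix) * depth.reshape(1, -1)  # 3 x N points in the target camera frame


def warp_source_to_target(source, depth, K, R, t):
    """Reconstruct the target view by sampling `source` where each target pixel reprojects.

    Assumption (static scene): pixels move only because the camera moved by (R, t).
    Nearest-neighbour sampling is used for brevity; practical methods sample bilinearly.
    """
    h, w = depth.shape
    points = backproject(depth, np.linalg.inv(K))
    points_src = R @ points + t.reshape(3, 1)  # move points into the source camera frame
    proj = K @ points_src
    z = proj[2]
    valid = z > 1e-6  # points must lie in front of the source camera
    safe_z = np.where(valid, z, 1.0)
    u_src = np.round(proj[0] / safe_z).astype(int)
    v_src = np.round(proj[1] / safe_z).astype(int)
    valid &= (u_src >= 0) & (u_src < w) & (v_src >= 0) & (v_src < h)
    warped = np.zeros(h * w)
    warped[valid] = source[v_src[valid], u_src[valid]]
    return warped.reshape(h, w), valid.reshape(h, w)


def photometric_loss(target, source, depth, K, R, t):
    """Mean absolute error between the target and its reconstruction over valid pixels."""
    warped, valid = warp_source_to_target(source, depth, K, R, t)
    return np.abs(target - warped)[valid].mean()
```

For example, with focal length 10, constant depth 2 and a sideways camera translation of 0.2, every pixel shifts by exactly one column, so a source frame built with `np.roll(target, 1, axis=1)` is reconstructed perfectly and the loss is zero. A depth network trained to minimize this loss must predict depths that make the warp consistent. An object that moves on its own breaks the static-scene assumption, because its pixels move for reasons other than `(R, t)`, and it inflates the loss. This is the limitation described above.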
The goal of this project is to develop novel deep networks that can reason about object shape using only video as supervision. By learning representations that disentangle shape from appearance, the resulting models should be able to perform fine-grained object classification, such as species classification [3], with significantly fewer training examples.
[1] C. Godard, O. Mac Aodha, G. Brostow, Unsupervised Monocular Depth Estimation with Left-Right Consistency, CVPR 2017
[2] C. Godard, O. Mac Aodha, M. Firman, G. Brostow, Digging Into Self-Supervised Monocular Depth Estimation, ICCV 2019
[3] G. Van Horn, O. Mac Aodha, et al., The iNaturalist Species Classification and Detection Dataset, CVPR 2018