Learning Transferable Representations of Object Pose from Images

Learning how to extract the pose of unseen object categories from images
Description of the Project: 

For an autonomous robotic system to interact successfully and safely with the world around it, it needs to reason about the objects it encounters not just as collections of pixels but as higher-level semantic concepts. It must also determine the precise location and 3D spatial configuration of these objects relative to the robot. For example, it is vitally important that such a system can correctly identify any humans or animals nearby and infer their poses, i.e., the spatial configuration of their body parts.

Advances in computer vision have produced powerful deep-learning-based approaches that can accurately detect specific object categories, such as humans, and extract their poses from images [1]. However, these systems are trained with vast quantities of supervised data and, by default, cannot easily be adapted to other object categories, e.g., animals [2]. The central questions that we will address in this project are: (i) can we learn to extract pose for novel object categories that we have not seen during training, (ii) can we do this with limited supervision, and (iii) can we infer the 3D pose of these objects given only limited 2D information [3]?

Resources required: 
Workstation with GPUs
Project number: 
240017
First Supervisor: 
University: 
University of Edinburgh
First supervisor university: 
University of Edinburgh
Essential skills and knowledge: 
Knowledge of machine learning, computer vision, and probabilistic reasoning. Programming experience (e.g., Python) and familiarity with deep learning frameworks (e.g., PyTorch or TensorFlow).
Desirable skills and knowledge: 
Theoretical and practical skills in deep learning
References: 

[1] R. Alp Güler, N. Neverova, I. Kokkinos, DensePose: Dense Human Pose Estimation in the Wild, CVPR 2018

[2] Biggs et al. Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop, ECCV 2020 

[3] Ronchi, Mac Aodha, et al. It's all Relative: Monocular 3D Human Pose Estimation from Weakly Supervised Data, BMVC 2018 

[4] Godard, Mac Aodha, Firman, Brostow, Digging Into Self-Supervised Monocular Depth Estimation, ICCV 2019