Robot perception with minimal human supervision
Vision is a key ability for humans and robots alike: it allows us to understand and extract information from the real world, and it is crucial for interacting safely with our environments. In contrast to human perception, state-of-the-art machine vision methods require millions of images with manual labels to learn each visual task. This project focuses on designing machine learning and computer vision techniques that help robots learn multiple tasks from limited labelled data.
There are two main directions for potential projects:
Project 1: This project will focus on object pose prediction and detection (orientation and location) in images and videos. Obtaining manual annotations for every object instance in a scene is expensive and, in some cases, not possible at all. We will therefore apply unsupervised and self-supervised deep learning strategies to object pose estimation [1] and object detection [2].
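To illustrate the flavour of self-supervision, here is a minimal sketch of one common pretext task, rotation prediction (note: this is a generic illustrative example, not the method of the cited work, which learns landmarks through conditional image generation). Each image is rotated by a known multiple of 90 degrees, and the rotation index acts as a free label, so a network can be trained without any human annotation:

```python
import numpy as np

def make_rotation_pretext_batch(image):
    """Build a self-supervised pretext batch: four copies of the image,
    rotated by 0, 90, 180 and 270 degrees. The rotation index is the
    training label, so no manual annotation is required."""
    rotations = [np.rot90(image, k) for k in range(4)]
    labels = np.arange(4)
    return np.stack(rotations), labels

# Toy 8x8 single-channel "image"
img = np.arange(64, dtype=np.float32).reshape(8, 8)
batch, labels = make_rotation_pretext_batch(img)
print(batch.shape)   # (4, 8, 8)
print(labels)        # [0 1 2 3]
```

A classifier trained to predict these labels must learn about object shape and orientation, which is exactly the kind of representation useful for downstream pose estimation.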
Project 2: To interact safely with people, a robot needs to accurately predict human actions and understand the social context. This project will focus on predicting human actions from videos by using motion and depth information with deep neural networks [3].
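One way to feed motion information to a deep network, used in the cited dynamic image work, is to collapse a video clip into a single image via approximate rank pooling: a weighted sum of frames with coefficients alpha_t = 2t - T - 1, so later frames receive larger positive weights and the result summarises the clip's temporal evolution. A minimal sketch:

```python
import numpy as np

def approximate_dynamic_image(frames):
    """Collapse a clip of shape (T, H, W) into one 'dynamic image' using
    approximate rank pooling weights alpha_t = 2t - T - 1 (t = 1..T).
    Later frames get larger positive weights, so the output encodes the
    clip's motion rather than its appearance alone."""
    T = frames.shape[0]
    t = np.arange(1, T + 1)
    alpha = 2 * t - T - 1            # e.g. T=4 -> [-3, -1, 1, 3]
    return np.tensordot(alpha, frames, axes=1)

# Toy clip: 4 constant frames with increasing intensity 1..4
clip = np.stack([np.full((2, 2), v, dtype=np.float32) for v in [1, 2, 3, 4]])
di = approximate_dynamic_image(clip)
print(di)  # each pixel: -3*1 + -1*2 + 1*3 + 3*4 = 10
```

The resulting single-channel image can then be passed to a standard image CNN, which is the design choice that makes dynamic images an efficient way to add motion cues to existing architectures.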
[1] Jakab, T., Gupta, A., Bilen, H., & Vedaldi, A. (2018). Unsupervised Learning of Object Landmarks through Conditional Image Generation. In NeurIPS.
[2] Bilen, H., & Vedaldi, A. (2016). Weakly Supervised Deep Detection Networks. In CVPR.
[3] Bilen, H., Fernando, B., Gavves, E., & Vedaldi, A. (2017). Action Recognition with Dynamic Image Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence.