Unsupervised Learning of Objects in Motion
Objects play a central role in the behaviour of intelligent systems such as robots, which makes semantic object segmentation a fundamental component of many applications. State-of-the-art object segmentation typically relies on training large networks with large amounts of labelled images. As a result, new knowledge is expensive to acquire: the network needs many labelled images of each new object. In addition, these networks operate on single images, ignoring the useful information contained in the motion of objects.
In this project, our goal is to make object segmentation work on video and generalise to new objects. Leveraging motion in the segmentation problem introduces additional constraints that can help reduce the need for labelled data. The motion may come from passive videos (where the camera cannot move), from simulated environments (such as Gibson-env), or even from real robot interaction.
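As a rough illustration of how motion can supply a segmentation signal without any labels, the sketch below thresholds dense optical flow between two consecutive frames to obtain a coarse mask of moving objects. This is only a minimal baseline, not the method the project proposes; the frame filenames and the magnitude threshold are illustrative assumptions.

```python
# Minimal sketch: a label-free motion mask from dense optical flow.
# Assumes two consecutive video frames on disk; the filenames and the
# threshold value are illustrative, not part of the project spec.
import cv2
import numpy as np

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Dense optical flow (Farneback): one 2-D displacement vector per pixel.
flow = cv2.calcOpticalFlowFarneback(
    prev, curr, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

# Pixels moving faster than the (assumed) threshold form a coarse
# foreground mask -- a weak, unsupervised cue for object segmentation.
magnitude = np.linalg.norm(flow, axis=2)
mask = (magnitude > 1.0).astype(np.uint8) * 255

cv2.imwrite("motion_mask.png", mask)
```

Even such a crude mask provides the kind of additional constraint mentioned above: it localises moving objects without a single labelled image.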
The underlying learning must incorporate few-shot learning to avoid the current reliance on extensive labelled data. This means our system must build object representations general enough that new concepts can be learned from only a few examples. Another necessary component is reinforcement learning, since disambiguating object category and object geometry often requires ego-motion and basic interaction.
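To make the few-shot requirement concrete, one common formulation, prototypical networks (Snell et al., 2017), classifies a new object by comparing its embedding to class prototypes averaged from a handful of labelled examples. The sketch below assumes some pretrained embedding function; the `embed` placeholder and the embedding dimension are hypothetical, standing in for whatever general object representation the system learns.

```python
# Minimal sketch of prototype-based few-shot classification, in the
# spirit of prototypical networks (Snell et al., 2017). The embedding
# function `embed` is a hypothetical placeholder for a learned,
# general object representation.
import numpy as np

def embed(image: np.ndarray) -> np.ndarray:
    """Placeholder: map an image to a D-dimensional feature vector."""
    return image.reshape(-1)[:64].astype(np.float32)  # assumed D = 64

def build_prototypes(support: dict[str, list[np.ndarray]]) -> dict[str, np.ndarray]:
    """One prototype per class: the mean embedding of its few examples."""
    return {cls: np.mean([embed(x) for x in examples], axis=0)
            for cls, examples in support.items()}

def classify(query: np.ndarray, prototypes: dict[str, np.ndarray]) -> str:
    """Assign the query to the class with the nearest prototype."""
    q = embed(query)
    return min(prototypes, key=lambda cls: np.linalg.norm(q - prototypes[cls]))

# Usage: a new object category is added with only five labelled images.
rng = np.random.default_rng(0)
support = {"mug": [rng.random((8, 8)) for _ in range(5)],
           "can": [rng.random((8, 8)) for _ in range(5)]}
prototypes = build_prototypes(support)
print(classify(rng.random((8, 8)), prototypes))
```

The appeal of this formulation for our setting is that adding a new object requires only computing one more prototype, with no retraining of the underlying network.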