Dr Marian Andrecki | Edinburgh Centre for Robotics

Research project title:

Learning predictive models from observations

Research project:

Recent years have seen significant improvements in model-free reinforcement learning (RL). This is especially apparent in simulated domains, such as playing games at human level or learning locomotion from scratch. Unfortunately, these advances have not translated into major breakthroughs for robotics. It is often argued that the key reason is the inability to provide enough experience for data-hungry RL methods: agents acting in the real world are expensive to run, susceptible to damage, and operate in more complex environments.

This research aims to advance current RL by enabling model-based techniques. The approach is to use unsupervised learning to obtain predictive models of how high-dimensional sensory percepts evolve over time. With such a model the agent (a) can predict the results of different actions, (b) can determine which actions are particularly unsafe and should not be explored, and (c) has a suitable representation of the environment for later value learning. These capabilities promise that the agent will need less exploration to complete its learning (because it uses data more efficiently) and that, when it does explore, it can avoid risky experiments.
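Points (a) and (b) can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than the method used in this project: the state is a single number, predict is a hand-written placeholder for a learned model, and is_unsafe, the hazard bounds and the goal are assumptions chosen only to make the example runnable.

```python
def predict(state, action):
    """Placeholder for a learned predictive model (assumption: a 1-D additive world)."""
    return state + action


def is_unsafe(state):
    """Hypothetical hazard check: anything outside [0, 10] counts as damage."""
    return state < 0 or state > 10


def choose_action(state, actions, goal, horizon=3):
    """Pick the action whose imagined rollout stays safe and ends closest to the goal.

    Each candidate action is repeated for `horizon` steps inside the model,
    so no unsafe action is ever tried in the real environment.
    Returns None when every candidate is predicted to be unsafe.
    """
    candidates = []
    for action in actions:
        imagined = state
        safe = True
        for _ in range(horizon):
            imagined = predict(imagined, action)
            if is_unsafe(imagined):
                safe = False
                break
        if safe:
            candidates.append((abs(goal - imagined), action))
    if not candidates:
        return None
    return min(candidates)[1]
```

With a one-step horizon the agent moves straight to the goal, whereas a longer horizon makes it reject actions that would overshoot into the hazard region on a later step.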

Currently, I explore deep learning architectures to understand their limitations, scalability and biases, as well as their similarities to established state estimation methods such as particle filters.

An example of neural predictions can be seen below. The underlying environment is a ball rolling with near-constant velocity. Whenever it hits a wall, there is a 50-50 chance that it will either bounce off or emerge on the other side of the image. This behaviour is shown in the second image, the ground truth (GT). A predictive autoencoder (PAE) observes a handful of initial frames of the ball's behaviour followed by uninformative darkness; this input is shown in the leftmost image, the observation (Ob). The PAE's output, shown in the third image (AE), is an attempt to reconstruct the ground truth from the observations. The rightmost image (PF) is the result of predicting with a particle filter that has access to the observations and to a complete model of the environment (i.e. the simulator that generated the data).
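The particle-filter baseline can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code behind the images: the ball moves along a one-dimensional line instead of an image, and WIDTH, OBS_NOISE, VEL_NOISE and the initial velocity range are made-up values. A frame of "uninformative darkness" is represented by an observation of None.

```python
import math
import random

WIDTH = 20.0      # assumed length of the 1-D world
OBS_NOISE = 0.5   # assumed standard deviation of the position sensor
VEL_NOISE = 0.02  # assumed velocity jitter, giving "near-constant" velocity


def step(x, v, rng):
    """Advance the ball one frame; at a wall it bounces or wraps with equal chance."""
    v += rng.gauss(0.0, VEL_NOISE)
    x += v
    if x < 0.0 or x >= WIDTH:
        if rng.random() < 0.5:
            # Bounce: mirror the position about the wall and reverse the velocity.
            x = -x if x < 0.0 else 2.0 * WIDTH - x
            v = -v
        else:
            # Emerge on the other side with the velocity unchanged.
            x %= WIDTH
        # Guard against floating-point results that land exactly on the far edge.
        x = min(max(x, 0.0), WIDTH - 1e-9)
    return x, v


def particle_filter(observations, n_particles, rng):
    """Bootstrap particle filter that uses the simulator itself as its model.

    `observations` holds one noisy position per frame, or None for a dark frame.
    Returns the particle cloud, a list of (x, v) pairs, after every frame.
    """
    particles = [(rng.uniform(0.0, WIDTH), rng.uniform(-1.5, 1.5))
                 for _ in range(n_particles)]
    clouds = []
    for obs in observations:
        # Predict: push every particle through the stochastic simulator.
        particles = [step(x, v, rng) for x, v in particles]
        if obs is not None:
            # Update: weight by the Gaussian observation likelihood, then resample.
            weights = [math.exp(-0.5 * ((x - obs) / OBS_NOISE) ** 2) + 1e-300
                       for x, _ in particles]
            particles = rng.choices(particles, weights=weights, k=n_particles)
        clouds.append(list(particles))
    return clouds
```

During dark frames only the predict step runs, so once particles reach a wall the cloud splits into a bounced group and a wrapped group. This is the kind of multimodal prediction described for the PF image.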

About me:

Machine learning PhD student exploring agents that build predictive models of reality to inform future decision-making. More specifically, I research how to understand what the agent has learnt, what the limits of its learning are, and whether its knowledge can be expressed in a form that is understandable to us.

Solving these problems will allow us to have greater trust in deep learning architectures.

Currently, I work with predictive autoencoders in Keras and TensorFlow (Python). My undergraduate background is in Electrical Engineering, with a dissertation on probabilistic state estimation, e.g. tracking the positions of multiple targets given noisy sensor data.

Email:

marian.andrecki@gmail.com

Supervisor:

Prof. Nick Taylor

Prof. Subramanian Ramamoorthy

Student type:

Alumni

LinkedIn:

https://www.linkedin.com/in/marian-andrecki-4b5a045b