Analysis of Controlled Stochastic Sampling for Training RL Agents for Robotics Tasks

For tasks in which the path-planning of a real robot is guided by a simulation of a virtual agent, this project aims to understand the role and impact of the randomisation scheme on the efficiency and generalisability of the trained agent.
Description of the Project: 

Data-driven machine learning techniques are popularly used in the field of robotics to inform autonomous decision-making and to perform control or path-planning. Supervised learning and reinforcement learning have been shown to be particularly amenable to canonical tasks that are integral to robotics applications. However, these techniques rely on data in the form of action-label (supervised), action-value (regression) or action-reward (RL) pairs, where the action is the execution of a path (or some other behaviour) by a real robot. For example, the action could be a robotic arm reaching from some starting configuration to a target configuration, and the reward could be the time taken to execute the motion. Given several training examples of an arm reaching towards targets, a potential goal could be to reach towards a previously unencountered target configuration. A major limitation of such learning is that it is slow, since data collection is inherently sequential. The amount of data required grows with the complexity of the task, to the point that this approach is prohibitively expensive even for toy examples such as a robot arm learning to balance arbitrary objects.
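To make the data format concrete, the sketch below shows one way an action-reward pair for the reaching example might be recorded; robot.move_to is a hypothetical stand-in for whatever control API the platform exposes, and the reward is simply the negated execution time, so faster motions score higher.

    from dataclasses import dataclass
    import time

    @dataclass
    class ActionRewardPair:
        action: list[float]  # target joint configuration the arm moves to
        reward: float        # negated execution time; faster motions score higher

    def collect_pair(robot, target_configuration: list[float]) -> ActionRewardPair:
        """Execute one reaching motion and record the (action, reward) pair."""
        start = time.monotonic()
        robot.move_to(target_configuration)  # hypothetical blocking motion command
        elapsed = time.monotonic() - start
        return ActionRewardPair(action=target_configuration, reward=-elapsed)

Collecting such pairs one motion at a time is what makes the process inherently sequential: each datum costs a full physical execution.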

A recent trend to counter this problem has been to use a physical simulation as a model on which learning is performed; the learned behaviour is then transferred to the robot task at hand. Naturally, the accuracy of the model is paramount to achieving effective transfer.

This project will focus on a specific instance of an agent that is trained and tested using a physics simulator (MuJoCo). It has been shown that such agents tend to overfit to their training examples and that an effective way to overcome this problem is to repeat the training while varying the seeds of the random sampler [1]. In this project, we will investigate the impact of the sampler on training efficiency and generalisation.
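As an illustration of the seed-variation scheme, the sketch below repeats otherwise identical runs under different seeds; it assumes the Gymnasium API with the MuJoCo-backed Reacher-v4 task (installed via gymnasium[mujoco]), and a random policy stands in for the actual learning algorithm.

    import gymnasium as gym

    def rollout_return(seed: int, episodes: int = 10) -> float:
        """Average episodic return of a random policy under one sampler seed."""
        env = gym.make("Reacher-v4")       # MuJoCo-backed reaching task
        env.action_space.seed(seed)        # seed the action sampler
        obs, info = env.reset(seed=seed)   # seed the environment's RNG once
        total, done = 0.0, False
        for _ in range(episodes):
            while not done:
                action = env.action_space.sample()  # stand-in for a learned policy
                obs, reward, terminated, truncated, info = env.step(action)
                total += reward
                done = terminated or truncated
            obs, info = env.reset()        # later resets reuse the seeded RNG
            done = False
        env.close()
        return total / episodes

    # Varying the seed across otherwise identical runs exposes how strongly
    # the outcome depends on the sampler, as dissected in [1].
    for s in (0, 1, 2):
        print(f"seed {s}: mean return {rollout_return(s):.2f}")

Comparing such runs, with training and evaluation configurations drawn from the seeded sampler, is one concrete way to quantify the sampler's impact on efficiency and generalisation.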

Resources required: 
Simulation software, testing platforms (robotic arm, e.g. UR-10), GPUs for simulation
Project number: 
123406
First Supervisor: 
University: 
University of Edinburgh
First supervisor university: 
University of Edinburgh
Essential skills and knowledge: 
Foundations of Machine Learning, advanced programming skills (Python or C/C++), Physics (mechanics), Maths (calculus, optimisation), Data structures
Desirable skills and knowledge: 
Experience with physical simulation
References: 

[1] Zhang, A., Ballas, N. and Pineau, J., 2018. A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning. arXiv preprint arXiv:1806.07937.