Algorithms for Multi-Agent Reinforcement Learning in Complex Environments

Develop and evaluate algorithms for multi-agent reinforcement learning in complex environments
Description of the Project: 

Multi-agent learning is an approach to solving sequential interactive decision problems in which multiple autonomous agents learn, through repeated interaction, how to achieve their goals. This includes agents working as a team to accomplish tasks collaboratively, as well as agents in competitive scenarios with conflicting goals. Reinforcement learning has emerged as one of the principal methodologies in multi-agent learning, and a recent tutorial by Albrecht and Stone provides a basic introduction [1].

The core problem in multi-agent learning is that the environment is non-stationary from the perspective of each individual agent: every agent learns about and adapts to an environment that includes the other agents, who are likewise continually adapting their behaviours. Several approaches have been proposed to tackle this non-stationarity, including modelling the behaviours of other agents [2], learning to communicate [3], and centralised training architectures [4]. However, non-stationarity remains a significant open challenge, further complicated by the need to scale to complex domains and to cope with partial observability of the environment.
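To make the non-stationarity concrete, the following is a minimal sketch (not part of the project itself) of two independent Q-learners in an iterated Prisoner's Dilemma, a standard toy domain. Each agent treats the other as a fixed part of its environment, so the reward it observes for a given action drifts as the other agent's policy changes. All names, payoffs, and hyperparameters here are illustrative assumptions.

```python
import random

# Iterated Prisoner's Dilemma payoffs (illustrative toy domain);
# action 0 = cooperate, action 1 = defect.
PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

def train(episodes=5000, alpha=0.1, eps=0.1, seed=0):
    """Two independent (stateless) Q-learners, each ignoring the other."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[agent][action]
    for _ in range(episodes):
        acts = []
        for i in (0, 1):
            if rng.random() < eps:                      # explore
                acts.append(rng.randrange(2))
            else:                                       # exploit
                acts.append(0 if q[i][0] >= q[i][1] else 1)
        rewards = PAYOFF[(acts[0], acts[1])]
        for i in (0, 1):
            # Each update treats the other agent as a stationary part of
            # the environment -- the source of the non-stationarity.
            q[i][acts[i]] += alpha * (rewards[i] - q[i][acts[i]])
    return q

q = train()
```

Because neither learner models the other, each sees a drifting reward signal for the same action and both drift toward mutual defection; the approaches cited in [2]–[4] can be viewed as different ways of restoring structure to that drift.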

Since autonomous agents may plan deliberate probing actions to elicit additional information about other agents, there is an associated risk: such actions may inadvertently modify the behaviours of other agents in unintended ways. There is thus a need to balance such exploratory actions safely against the risks they entail. Furthermore, we ideally seek solutions that can explain their decisions: why do the agents coordinate their actions in particular states and in particular ways?
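The probing-versus-risk trade-off can be illustrated with a small, entirely hypothetical sketch: an agent holds a belief over two possible opponent "types", scores each candidate action by its expected information gain (the expected reduction in belief entropy) minus a risk penalty, and probes only when the gain outweighs the penalty. The action names, likelihoods, risk values, and weight `lam` below are all assumed for illustration, not taken from the project description.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Hypothetical belief over two opponent types, and for each candidate
# action the probability that each type responds with observation o=1.
belief = [0.5, 0.5]                        # P(type)
p_obs = {"probe_A": [0.9, 0.2],            # P(o=1 | type, action)
         "probe_B": [0.6, 0.5],
         "no_op":   [0.5, 0.5]}
risk = {"probe_A": 0.3, "probe_B": 0.1, "no_op": 0.0}
lam = 0.5                                  # assumed risk-aversion weight

def posterior(prior, likelihoods):
    """Bayes update of the type belief given observation likelihoods."""
    z = sum(b * l for b, l in zip(prior, likelihoods))
    return [b * l / z for b, l in zip(prior, likelihoods)]

def expected_info_gain(action):
    """Expected reduction in belief entropy after observing the response."""
    h0 = entropy(belief)
    gain = 0.0
    for o in (1, 0):
        lik = [p if o == 1 else 1 - p for p in p_obs[action]]
        p_o = sum(b * l for b, l in zip(belief, lik))
        if p_o > 0:
            gain += p_o * (h0 - entropy(posterior(belief, lik)))
    return gain

scores = {a: expected_info_gain(a) - lam * risk[a] for a in p_obs}
best = max(scores, key=scores.get)
```

With these numbers, the informative probe is worth its risk while the weakly informative one is not; in a sequential setting the risk term would itself have to account for how probing reshapes the other agents' future behaviour, which is part of what makes the problem hard.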

The goal of this project is to develop novel algorithms for highly efficient multi-agent reinforcement learning in complex environments. Examples of potential evaluation domains include competitive games (e.g. RoboCup 2D/3D soccer, StarCraft II), autonomous vehicles in dense city traffic, and autonomous wireless networks such as those in DARPA's Spectrum Collaboration Challenge.

Resources required: 
High-throughput computing for simulations (which is provided through the ECDF Eddie system https://www.ed.ac.uk/information-services/research-support/research-computing/ecdf)
Project number: 
300002
First Supervisor: 
University: 
University of Edinburgh
First supervisor university: 
University of Edinburgh
Essential skills and knowledge: 
Strong programming skills; strong grasp of probability, statistics, calculus, etc.; excellent knowledge of reinforcement learning; ability to work independently
Desirable skills and knowledge: 
Knowledge of multi-agent systems and agent modelling
References: 

[1] Stefano Albrecht and Peter Stone (2017). Multiagent Learning: Foundations and Recent Trends. Tutorial at IJCAI'17 conference. http://www.cs.utexas.edu/~larg/ijcai17_tutorial

[2] Stefano Albrecht and Peter Stone (2018). Autonomous agents modelling other agents: A comprehensive survey and open problems. Artificial Intelligence 258:66-95.

[3] Sainbayar Sukhbaatar and Rob Fergus (2016). Learning multiagent communication with backpropagation. In: Advances in Neural Information Processing Systems, pp. 2244-2252.

[4] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in Neural Information Processing Systems, pp. 6379-6390.