We are pleased to have Dr Robert Loftin from Microsoft Research Cambridge give this year's RL Research Lecture in the RL course. Everyone is welcome!
Meeting link (via MS Teams): https://teams.microsoft.com/l/meetup-join/19%3a75aa03fba655431e812e9cc6c30255ec%40thread.tacv2/1614774638928?context=%7b%22Tid%22%3a%222e9f06b0-1669-4589-8789-10a06934dc61%22%2c%22Oid%22%3a%22a2103897-d3c3-4c6c-a0e8-82991615c70f%22%7d
In this talk I will discuss our lab's recent work on the problem of efficient exploration in multi-agent reinforcement learning, with applications to commercial video games. High sample complexity remains one of the main barriers to the widespread adoption of RL as a solution to real-world problems. Exploration mechanisms based on the principle of "optimism under uncertainty" have been shown to dramatically improve sample efficiency in challenging single-agent RL tasks, and are well-motivated by theoretical results on the complexity of reinforcement learning in finite MDPs.
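To make the "optimism under uncertainty" principle concrete, here is a minimal tabular sketch: Q-learning that acts greedily with respect to an inflated value estimate, using optimistic initialization plus a count-based bonus. Everything below (function names, the bonus form, the toy chain environment, all constants) is an illustrative assumption for this announcement, not the algorithm presented in the talk.

```python
# Illustrative sketch only: generic count-based optimism for tabular
# Q-learning. All names, constants, and the toy environment are made up.

def optimistic_q_learning(n_states, n_actions, step, episodes=200,
                          horizon=10, gamma=0.9, alpha=0.5, bonus=1.0):
    """Tabular Q-learning that acts greedily w.r.t.
    Q(s, a) + bonus / sqrt(N(s, a) + 1),
    so rarely tried actions look artificially valuable and get explored."""
    # Optimistic initialization: every value starts at the maximum possible
    # discounted return, 1 / (1 - gamma), assuming rewards are at most 1.
    Q = [[1.0 / (1.0 - gamma)] * n_actions for _ in range(n_states)]
    N = [[0] * n_actions for _ in range(n_states)]  # visit counts
    for _ in range(episodes):
        s = 0  # every episode starts in state 0
        for _ in range(horizon):
            # Greedy action under the optimistic value estimate.
            a = max(range(n_actions),
                    key=lambda b: Q[s][b] + bonus / (N[s][b] + 1) ** 0.5)
            N[s][a] += 1
            s2, r = step(s, a)
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q, N

# Toy 3-state chain: action 1 moves right (reward 1 on reaching or staying
# at goal state 2); action 0 stays put for a small immediate reward of 0.1.
def chain_step(s, a):
    if a == 1:
        s2 = min(s + 1, 2)
        return s2, (1.0 if s2 == 2 else 0.0)
    return s, 0.1

Q, N = optimistic_q_learning(n_states=3, n_actions=2, step=chain_step)
```

Because the bonus shrinks as a state-action pair is visited, the agent is pushed to try under-explored actions first; once the counts grow, the greedy policy settles on moving toward the goal rather than collecting the small myopic reward for staying put.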
In our work we examine the theoretical underpinnings of optimistic exploration in multi-agent settings. We show that the straightforward application of optimism in competitive (zero-sum) games can waste time exploring parts of the state space that are irrelevant to strategic play. To address this issue, we present a novel "strategically efficient" exploration mechanism that focuses exploration on the most strategically relevant states and actions. We conclude by discussing how strategically efficient exploration can be applied to training AI controllers for AAA video games.