Experimentation in a reinforcement learning agent is a process by which actions are drawn and their outcomes yield an evaluative signal. This is not dissimilar to how humans acquire knowledge: by trying actions to determine their effects. When the agent has a task to accomplish, this paradigm requires it to select actions that are maximally interesting in terms of information gain, but also actions that reduce prediction error, i.e., actions of which it already has some knowledge or understanding. This tension is commonly called the "exploration-exploitation dilemma". There is an inherent duality between the two problems, as both are performed over the same state-action space. We therefore hope to use Information Geometry, a subfield of information theory, and the duality present within it, to develop a more theoretical approach to resolving the exploration-exploitation dilemma.
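The dilemma described above can be made concrete with a standard epsilon-greedy multi-armed bandit sketch. This is a textbook illustration, not the approach proposed here, and the arm reward probabilities and parameter values are illustrative assumptions:

```python
import random

def epsilon_greedy_bandit(arm_probs, epsilon=0.1, steps=5000, seed=0):
    """With probability epsilon the agent explores (a random arm);
    otherwise it exploits its current value estimates."""
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)    # pulls per arm
    values = [0.0] * len(arm_probs)  # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_probs))  # explore: information gain
        else:
            # exploit: act on current knowledge to reduce prediction error
            arm = max(range(len(arm_probs)), key=values.__getitem__)
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return values, total_reward

# Illustrative Bernoulli arms; the agent should converge on the 0.8 arm.
values, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
```

Even in this simple setting, the single parameter epsilon encodes the trade-off: raising it gathers more information about all arms at the cost of accumulated reward.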
In doing so we have developed what we believe to be a novel approach to reinforcement learning, which we call Reflexive Reinforcement Learning, whereby an agent makes better use of the evaluative signals generated during trial-and-error learning. We hope that combining this with inverse reinforcement learning (IRL) will lead to an adaptive expert agent that can change its policy over time to improve the efficacy of IRL, which is an ill-posed problem.
I am a third-year PhD student at the Edinburgh Centre for Robotics under the supervision of Dr Michael Herrmann. I hold a BSc in Mathematics from Queen Mary, University of London, and an MScRes in Robotics and Autonomous Systems from The University of Edinburgh. My main interests are swarm robotics, reinforcement learning, and information theory.