Meta-learning of multiple motor control skills for autonomous robots
Advances in the hardware of ground robots (humanoids, quadrupeds, and tracked/wheeled robots) have made them more reliable and physically capable in recent years. However, complex and unexpected situations remain a hurdle to mission success, because these robots lack the versatile, autonomous skills needed to adapt to unknown environments, particularly when failures such as falls occur outdoors. To truly enable autonomous deployment of these robots, more research is needed on new learning algorithms that can learn, adapt, and evolve new strategies to recover from such failures and remain resilient while executing tasks.
Deep reinforcement learning allows robots to learn solutions for complicated tasks that classical control approaches cannot solve. Recently, deep reinforcement learning has made impressive progress, for example in dynamic locomotion and various forms of fall recovery. However, challenges remain because network design and the training process still require task-specific human knowledge and experience. One promising long-term research direction is therefore meta-learning, in which the algorithm itself learns how to learn.
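To make the idea of "learning how to learn" concrete, the following is a minimal sketch using Reptile, a simple first-order meta-learning algorithm chosen here purely for illustration; it is not necessarily the method this project will use. The toy tasks (fitting lines y = slope * x with a one-parameter model), the function names `inner_adapt` and `reptile`, and all hyperparameters are assumptions made for the example.

```python
import random

def inner_adapt(w, slope, xs, lr=0.1, steps=5):
    """Inner loop: a few gradient steps on one task (fit y = slope * x with y = w * x)."""
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2.0 * (w * x - slope * x) * x for x in xs) / len(xs)
        w -= lr * grad
    return w

def reptile(task_slopes, xs, meta_lr=0.5, meta_iters=200, seed=0):
    """Outer loop: move the shared initialisation towards each task's adapted weight."""
    rng = random.Random(seed)
    w = 0.0  # meta-learned initialisation (hypothetical starting value)
    for _ in range(meta_iters):
        slope = rng.choice(task_slopes)       # sample a training task
        adapted = inner_adapt(w, slope, xs)   # adapt to it with a few steps
        w += meta_lr * (adapted - w)          # Reptile meta-update
    return w

XS = [-1.0, -0.5, 0.5, 1.0]
meta_w = reptile([2.0, 3.0, 4.0], XS)

# Few-step adaptation to an unseen task (slope 3.5), from the meta-learned
# initialisation and from a naive zero initialisation for comparison.
from_meta = inner_adapt(meta_w, 3.5, XS)
from_zero = inner_adapt(0.0, 3.5, XS)
```

With the same small budget of gradient steps, the meta-learned initialisation ends up closer to the new task's solution than the zero initialisation does. That is the property few-shot adaptation relies on, although real motor-skill policies are of course far higher-dimensional than this toy model.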
The goal of the project is to investigate learning algorithms that can learn invariant features across different skills and domains, generalise policies for skill transfer, and learn how to acquire new motor skills. The expected autonomous performance is that a robot in an unseen failure scene (for example, a fall on rubble, on a construction site, or in ruins) can tackle a wide range of challenging failures and return to its nominal operation mode. In particular, the project will focus on problems that traditional planning and control cannot solve in real time, so that the learning performance of autonomous fall-back solutions can be benchmarked in the real-world deployments above.
Apart from meta-learning, this project will also involve various learning techniques, such as the techniques mentioned above, and will investigate the efficacy of that combination and of state-of-the-art off-policy DRL algorithms (soft actor-critic, D4PG). To increase the training speed in simulation, the candidate is expected to use our existing massively parallel computing resources to collect a large amount of experience, and to leverage Experience Replay (on- and off-policy) for fast (few-shot) task adaptation.
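As an illustration of the Experience Replay mechanism mentioned above, here is a minimal sketch of a fixed-capacity replay buffer with uniform sampling, as commonly used by off-policy algorithms. The class name `ReplayBuffer`, the `Transition` fields, and the capacity are assumptions made for the example and do not describe the project's actual infrastructure.

```python
import random
from collections import deque, namedtuple

# One step of experience collected from a (simulated) robot.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)

class ReplayBuffer:
    """Fixed-size store of past transitions; the oldest are evicted first."""

    def __init__(self, capacity, seed=None):
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self._storage.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal
        # correlation between consecutive transitions.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)

buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    # Dummy scalar states and actions stand in for real sensor data.
    buffer.add(state=step, action=0, reward=1.0, next_state=step + 1, done=False)
batch = buffer.sample(2)
```

In a parallel setting, many simulation workers would call `add` on a shared buffer while the learner repeatedly calls `sample`; the sketch omits the synchronisation that such a setup would need.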
At the mature stage of the project, the research effort will expand towards more complex and unprecedented whole-body motor skills, namely whole-body loco-manipulation: using whole-body contact to recover from locomotion failures, maintaining balance while performing manipulation tasks, and adapting and coordinating a variety of gaits (crawling, trotting, bounding, galloping, etc.). Successful research outcomes will be validated on the real robotic facilities at the Edinburgh Centre for Robotics.
- Osband, Ian, John Aslanides, and Albin Cassirer. "Randomized Prior Functions for Deep Reinforcement Learning." arXiv preprint arXiv:1806.03335 (2018).
- Gu, Shixiang, et al. "Continuous Deep Q-Learning with Model-Based Acceleration." International Conference on Machine Learning. 2016.
- ANYmal robot: https://www.edinburgh-robotics.org/equipment/robotarium-east-field-syste...
- Yang, Chuanyu, Kai Yuan, Wolfgang Xaver Merkt, Sethu Vijayakumar, Taku Komura, and Zhibin Li. "Learning Whole-body Motor Skills for Humanoids." IEEE-RAS International Conference on Humanoid Robots. 2018.