Learning whole body motor skills for autonomous and agile recovery of locomotion failures

Autonomous recovery from locomotion failures using deep reinforcement learning: acquiring agile whole-body motor skills via massively parallel simulation and data-efficient learning.
Description of the Project: 


Reinforcement learning allows robots to learn tasks for which classical control algorithms do not provide solutions. In recent years, the use of deep neural networks within reinforcement learning has made large strides, allowing robots to learn tasks such as walking. However, many challenges remain. One key challenge is exploration: how should the algorithm decide which actions to try? This is particularly hard for continuous action spaces, and thus for torque-controlled robots. Improving exploration will enable faster learning and the learning of more complex tasks.

For discrete action spaces, exploration can be improved by estimating how uncertain the robot is about the value of an action. Exploration can then be guided via the principle of "optimism in the face of uncertainty". A technique called Randomized Prior Functions [1] implements this approach. Another technique, called Normalized Advantage Functions [2], could be used to bring Randomized Prior Functions into continuous action spaces.
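To make the idea in [1] concrete, the following is a minimal sketch (not the paper's implementation) of a Randomized Prior Functions ensemble in NumPy: each member sums a trainable network and a frozen, randomly initialized prior network, and disagreement across members serves as an epistemic-uncertainty estimate to drive optimistic exploration. The network sizes, prior scale `beta`, and ensemble size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(in_dim, hidden=32, out_dim=1):
    """Parameters of a small random 2-layer MLP."""
    return {
        "W1": rng.normal(0, 1 / np.sqrt(in_dim), (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0, 1 / np.sqrt(hidden), (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def forward(net, x):
    h = np.tanh(x @ net["W1"] + net["b1"])
    return h @ net["W2"] + net["b2"]

class PriorQMember:
    """One ensemble member: Q(s,a) = f_theta(s,a) + beta * p(s,a),
    where p is a fixed random prior network and only f_theta would be
    trained (training loop omitted in this sketch)."""
    def __init__(self, in_dim, beta=3.0):
        self.f = make_net(in_dim)       # trainable part
        self.prior = make_net(in_dim)   # frozen random prior
        self.beta = beta

    def q(self, sa):
        return forward(self.f, sa) + self.beta * forward(self.prior, sa)

# Disagreement across the ensemble estimates epistemic uncertainty,
# which an optimistic policy can exploit for directed exploration.
ensemble = [PriorQMember(in_dim=4) for _ in range(5)]
sa = rng.normal(size=(1, 4))            # a state-action feature vector
qs = np.array([m.q(sa).item() for m in ensemble])
uncertainty = qs.std()
```

Combining such an ensemble with Normalized Advantage Functions [2], which give a closed-form greedy action in continuous spaces, is one route to the continuous-action extension mentioned above.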

Project description

Advances in hardware and control of legged robots (humanoids and quadrupeds) have made them much more reliable and physically capable in recent years. However, unexpected disturbances can still cause failures (falls) during complex locomotion in challenging environments that are unsafe for humans to enter. To truly enable autonomous deployment of robots and guarantee safe operation without risky human intervention, more research is needed on fall-back solutions in industrial sites, e.g. new learning algorithms that are able to learn, adapt, and evolve strategies to successfully recover from such failures.

The project goal is to achieve autonomous fall-back solutions in the above real-world deployments, and the research therefore aims to develop novel learning algorithms for failure recovery in situations where the robot has failed, landed in an undesirable posture, and must return to its nominal operation mode/pose. In particular, the project will focus on challenging problems that require real-time planning and control actions in unseen failure scenarios (falls on rubble, construction sites, ruins), which cannot be solved by conventional planning and control approaches.

This project will make strong use of reinforcement learning techniques, such as those mentioned above ([1] and [2]), and will investigate the efficacy of their combination against state-of-the-art off-policy deep RL algorithms (Soft Actor-Critic, D4PG). Simulation and experimental validation will be carried out on the existing robotic facilities at the Edinburgh Centre for Robotics, i.e. the quadruped robot ANYmal [3]. Initial progress will start with quasi-static manoeuvres to recover from a prone posture on the ground, which will require candidates to have strong existing expertise in planning, state estimation, and low-level control.

To accelerate the training process, simulation tools will be developed to enable massively parallel computing, and thus to support data-hungry, sample-inefficient algorithms. The system will collect large amounts of experience through massive parallelization, and then leverage experience replay (on- and off-policy) for fast (few-shot) task adaptation.
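The parallel-collection-plus-replay pipeline described above can be sketched as follows; this is a minimal, generic uniform replay buffer, not the project's eventual implementation, and the capacity and batch size are illustrative assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform experience-replay buffer. Transitions streamed
    from many parallel simulator instances are appended here, then
    sampled in mini-batches for off-policy updates (e.g. SAC, D4PG)."""

    def __init__(self, capacity=100_000):
        # deque with maxlen discards the oldest transitions when full
        self.buf = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling without replacement from stored transitions
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)

# Usage: transitions arriving from N parallel environments share one buffer.
buf = ReplayBuffer()
for t in range(1000):
    buf.add(t, 0, 0.0, t + 1, False)   # placeholder transitions
batch = buf.sample(32)
```

In practice the parallel workers would run physics simulations of the robot and push real (state, action, reward, next-state) tuples into the shared buffer, while one or more learner processes draw mini-batches from it.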

At the final stage of the project, the research effort will expand towards more complex and unprecedented whole-body motor skills [4] for fall recovery of the ANYmal robot, such as using arms to recover from a prone position, balancing the robot while performing manipulation tasks, and adapting and coordinating a variety of gaits (crawling, trotting, bounding, galloping, etc.). All of this will lead to a highly reliable platform meeting industrial health-and-safety standards.

Resources required: 
ANYmal robot; high-end Linux PC with GPU and CPU for deep learning.
Project number: 
First Supervisor: 
University of Edinburgh
Second Supervisor(s): 
First supervisor university: 
University of Edinburgh
Essential skills and knowledge: 
Linux, C++, Python, machine learning knowledge, ROS, TensorFlow, experience with physics engines/simulators
Desirable skills and knowledge: 
PyTorch, experience with physics engines (ODE, Bullet, MuJoCo, PhysX 3.4) and physics simulators (Gazebo, PyBullet, V-REP, Unity)

[1] Osband, Ian, John Aslanides, and Albin Cassirer. "Randomized Prior Functions for Deep Reinforcement Learning." arXiv preprint arXiv:1806.03335 (2018).

[2] Gu, Shixiang, et al. "Continuous Deep Q-Learning with Model-based Acceleration." International Conference on Machine Learning. 2016.

[3] ANYmal robot: https://www.edinburgh-robotics.org/equipment/robotarium-east-field-syste...

[4] Chuanyu Yang, Kai Yuan, Wolfgang Xaver Merkt, Sethu Vijayakumar, Taku Komura, Zhibin Li, "Learning Whole-body Motor Skills for Humanoids," in IEEE-RAS International Conference on Humanoid Robots, 2018.