Perceiving Humans in Detail: Fine-grained Recognition of Human Actions
Human action recognition is a fundamental problem underlying many applications in robotics, including interaction, home care, and collaboration. The actions that robots or computers can recognize today are often coarse and simplistic, in the sense that they differ greatly from one another: for example, eating vs. playing piano, or sitting vs. standing. Both the datasets and the methods tend to be equally coarse. As human-robot interaction becomes more natural, we will require more sophisticated technology for perceiving humans. This means being able to detect subtle anomalies in behaviour, recognize differences in emotion, understand levels of skill in an action, or even predict actions beyond what humans can predict.
In this project our goal is to push the boundaries of video understanding by tackling problems that require fine-grained recognition. To this end we will leverage and develop novel deep learning methods capable of a sophisticated understanding of human action in videos. These methods will include modelling temporal information with recurrent networks; developing attention mechanisms that automatically discover the most discriminative features of human motion and expression; and few-shot learning, which is at the forefront of visual intelligence today.
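As a minimal illustration of one of these ingredients, the sketch below shows temporal attention pooling over per-frame features: each frame is scored against a query vector, and a softmax over time yields weights that emphasize the most discriminative moments of a clip before classification. This is a generic sketch under simplifying assumptions, not the project's actual architecture; all names and dimensions are hypothetical, the features are random stand-ins for the output of a video backbone, and the parameters would in practice be learned end to end.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_attention_pool(frames, w_att):
    """Pool a clip of per-frame features into one descriptor.

    frames: (T, D) array of per-frame features (hypothetical backbone output).
    w_att:  (D,) attention query; frames scoring high against it get more weight.
    Returns the (D,) clip descriptor and the (T,) attention weights over time.
    """
    scores = frames @ w_att            # one relevance score per frame, shape (T,)
    alpha = softmax(scores)            # normalized weights over time, sum to 1
    return alpha @ frames, alpha       # weighted average of frame features

# Toy setup: 16 frames, 32-dim features, 5 action classes (all illustrative).
T, D, C = 16, 32, 5
frames = rng.standard_normal((T, D))   # stand-in for real per-frame features
w_att = rng.standard_normal(D)         # would be learned in practice
w_cls = rng.standard_normal((D, C))    # linear classifier, also learned

clip, alpha = temporal_attention_pool(frames, w_att)
probs = softmax(clip @ w_cls)          # class probabilities for the clip
```

Inspecting `alpha` is what makes this kind of model attractive for fine-grained recognition: the weights indicate which frames the classifier relied on, which helps verify that subtle but discriminative moments are being picked up.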
We expect this project to advance the state of the art in human action recognition, which constitutes the basis of many robotics applications. At the same time, we expect the advances we make in deep learning for video understanding to be fundamental contributions to computer vision, relevant to other video understanding applications as well.