Robust 3D Point Cloud Feature Classifier for Grounded Language Modelling
Human-robot interaction via speech enhances a robot’s ability to collaborate naturally and seamlessly with humans on joint tasks, e.g. by setting joint goals, communicating progress, or clarifying the user’s intention. Achieving this kind of natural command and control in real-world scenarios requires grounded language models, supported by spatial modelling and reasoning, that can link a detailed digital landscape to real-world concepts via language.
Point clouds captured using LiDAR or Structure from Motion techniques can represent complex physical features (e.g. buildings) and their sub-parts (e.g. pillars, doors). By georeferencing point clouds to survey-grade maps it is possible to annotate regions with appropriate labels (e.g. museum, park), but constructing useful referring expressions from a user’s perspective (e.g. “the house with a veranda”) requires a level of detail not available in existing spatial datasets.
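As a minimal sketch of the annotation step described above: once a point cloud is georeferenced into map coordinates, each point can be labelled by testing it against 2D region footprints from a spatial dataset. The region names, coordinates, and the ray-casting point-in-polygon test below are illustrative assumptions, not part of any specific dataset or pipeline.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (list of (x, y) vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count edge crossings of a ray cast in the +x direction
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def annotate_points(points, regions):
    """Label each (x, y, z) point with the first map region whose footprint contains it."""
    labels = []
    for x, y, z in points:
        label = next((name for name, poly in regions.items()
                      if point_in_polygon(x, y, poly)), "unknown")
        labels.append(label)
    return labels

# Illustrative building/region footprints in map coordinates
regions = {
    "museum": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "park": [(20, 0), (40, 0), (40, 20), (20, 20)],
}
points = [(5.0, 5.0, 1.2), (30.0, 10.0, 0.0), (100.0, 100.0, 3.0)]
print(annotate_points(points, regions))  # ['museum', 'park', 'unknown']
```

In practice a spatial library with indexed polygon queries would replace the linear scan, but the principle of transferring map labels onto georeferenced points is the same.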
This Ph.D. will develop new AI classifiers that use 3D point clouds in conjunction with semantic building models and spatial datasets to recognise building features and objects (e.g. domes, doors, pillars, windows, desks), enabling more natural human-robot interactions.
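One classical starting point for such classifiers is eigenvalue-based shape features computed over a local point neighbourhood (linearity, planarity, sphericity), which can separate pillar-like from wall-like structure. The sketch below is a hedged illustration of this standard technique, not the method the Ph.D. will develop; the class names and score rule are assumptions.

```python
import numpy as np

def shape_class(points):
    """Classify a 3D point neighbourhood as 'linear' (e.g. a pillar),
    'planar' (e.g. a wall or door face), or 'scatter' via covariance eigenvalues."""
    pts = np.asarray(points, dtype=float)
    cov = np.cov(pts.T)
    # Eigenvalues sorted descending: l1 >= l2 >= l3
    l1, l2, l3 = sorted(np.linalg.eigvalsh(cov), reverse=True)
    linearity = (l1 - l2) / l1
    planarity = (l2 - l3) / l1
    sphericity = l3 / l1
    scores = {"linear": linearity, "planar": planarity, "scatter": sphericity}
    return max(scores, key=scores.get)

# A flat patch of points (e.g. sampled from a wall) should come out 'planar'
rng = np.random.default_rng(0)
wall = np.column_stack([rng.uniform(0, 5, 200), rng.uniform(0, 5, 200), np.zeros(200)])
print(shape_class(wall))  # planar

# Points along a single axis (e.g. a thin pillar edge) should come out 'linear'
pillar = np.column_stack([np.zeros(100), np.zeros(100), np.linspace(0, 5, 100)])
print(shape_class(pillar))  # linear
```

Features like these are typically fed into a learned classifier alongside semantic model priors, rather than thresholded directly.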
While 2D images can be classified using computer vision, the advantage of operating directly on 3D point clouds is that descriptions can be generated from multiple observer locations using visibility modelling; a robot could thereby generate a natural language description suited to a human collaborator (e.g. “the door on your left”).
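The final step of such a description, mapping an observer pose and a recognised object position to a relative spatial phrase, can be sketched as follows. The sector boundaries (±45°, ±135°) and phrase labels are illustrative assumptions.

```python
import math

def relative_direction(observer_xy, heading_deg, target_xy):
    """Describe target relative to an observer facing heading_deg
    (0 = +x axis, angles counter-clockwise).
    Returns 'ahead', 'on your left', 'on your right', or 'behind'."""
    dx = target_xy[0] - observer_xy[0]
    dy = target_xy[1] - observer_xy[1]
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angle from heading to target, normalised to (-180, 180]
    rel = (bearing - heading_deg + 180) % 360 - 180
    if -45 <= rel <= 45:
        return "ahead"
    if 45 < rel <= 135:
        return "on your left"
    if -135 <= rel < -45:
        return "on your right"
    return "behind"

# Observer at the origin facing along +x; a door at (0, 5) lies 90° to the left
print(f"the door {relative_direction((0, 0), 0, (0, 5))}")  # the door on your left
```

A full system would first use visibility modelling to confirm the door is actually visible from the observer's location before phrasing the reference this way.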
Autonomous robots that can communicate about their context and local environment at this fidelity would be able to support humans in a range of roles, from rapid delivery of medical equipment via drones, to autonomous cars, to disaster emergency response.