Sabrina McCallum

Research project title: 
Learning from multi-modal signals in embodied environments
Principal goal for project: 
To enable artificial agents to learn from visual, audio and language signals and generalize across different embodied environments
Research project: 

Embodied environments are the ideal testbed for AI frameworks before they are deployed in real-world applications such as robotics, Virtual and Augmented Reality, and multi-modal virtual assistants. In line with the embodiment hypothesis from Cognitive Science, which holds that intelligence and cognition emerge when a physical body interacts with its environment, Embodied AI research is inherently multi-modal: agents operating in embodied environments need to perceive and understand visual (‘see’) and audio (‘hear’) signals and use these abilities to act and reason. This involves encoding and learning from synthetic and/or real-world multi-modal experiences, for example by executing actions in a simulated environment or by consuming real-world multi-media content. However, the majority of existing research in the field leverages only one modality (vision) or two (vision and language) and is tied to a specific underlying simulated environment. This research seeks to address both shortcomings by enabling artificial agents to learn from visual, audio and language signals and to generalize across different embodied environments.

About me: 

Before joining the CDT RAS, I completed an MSc in Artificial Intelligence at the University of Strathclyde, Glasgow, where I graduated with Distinction and at the top of my class. My MSc dissertation combined my interests in multi-modal deep learning and affective computing. I previously worked as a Data Analyst and was involved in a range of Data Science projects for leading brands.