Alessandro Suglia

Research project title: 
Interactive Grounded Language Learning
Principal goal for project: 
Implementation of machine learning techniques that can be used to teach machines language representations that are grounded in other modalities such as vision
Research project: 

Natural Language (NL) is by far the easiest and most powerful communication device we possess, so it is reasonable to require an intelligent machine to be able to communicate through language [1]. Since the early days of Artificial Intelligence, researchers have tried to design systems able to communicate with humans by relying on complex hand-engineered features (e.g., dialogue acts and slots). Although these handcrafted approaches can be effective for specific domains, this kind of supervision does not allow such systems to scale, because of the high variability of Natural Language utterances and the high cost and effort of annotating data for each new application. In recent years, Deep Learning techniques have demonstrated their effectiveness in challenging games such as Go [2] and Poker [3], as well as in several Natural Language Processing tasks such as Question Answering [4]. However, current models of natural language representation are blind: they learn language meanings from text alone, without relying on other modalities such as vision [5].

Language learning is a situated activity: the learner does not learn in isolation, but interacts with other agents as well as with the environment. According to Wittgenstein [6], humans constantly play language games in which new meanings may arise that are useful for achieving specific goals. The aim of this research project is to define a curriculum of situated language games that allows an agent to learn word meanings grounded in perceptual experience. Starting from the seminal work of the Talking Heads experiments [7], we are focusing on situated games such as guessing games and action execution games. The former require the agent to learn to guess a hidden object in a scene by asking visually grounded questions. The latter require the agent to understand and execute the natural language instructions needed to solve a more complex task.

The most relevant research outcomes of this project are listed below; a more up-to-date list can be found on my Google Scholar:

  1. "CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning", Association for Computational Linguistics (ACL) 2020
  2. "Imagining Grounded Conceptual Representations from Perceptual Information in Situated Guessing Games", International Conference on Computational Linguistics (COLING) 2020


References: 

  1. Mikolov, Tomas, Armand Joulin, and Marco Baroni. "A roadmap towards machine intelligence." arXiv preprint arXiv:1511.08130 (2015).
  2. Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
  3. Moravčík, Matej, et al. "DeepStack: Expert-level artificial intelligence in no-limit poker." arXiv preprint arXiv:1701.01724 (2017).
  4. Devlin, Jacob, et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
  5. Bisk, Yonatan, et al. "Experience grounds language." arXiv preprint arXiv:2004.10151 (2020).
  6. Wittgenstein, Ludwig, and G. E. M. Anscombe. "Philosophical investigations." (1953).
  7. Steels, Luc. The Talking Heads experiment: Origins of words and meanings. Vol. 1. Language Science Press, 2015.
About me: 

I am a first-year PhD student under the supervision of Prof. Oliver Lemon at the CDT in “Robotics and Autonomous Systems” at the Edinburgh Centre for Robotics. I am mainly interested in developing Deep Learning models for conversational agents that are able to learn continuously by interacting with the user and by exploiting features of the environment in which they are deployed. I received an MSc degree in "Knowledge Engineering and Machine Intelligence" from the University of Bari, and before starting my PhD I was an NLP consultant for Plusimple, a health care startup. There I was mainly responsible for developing a search engine able to retrieve personalised content for the user, as well as an analytics platform for extracting knowledge from raw user session data. I am currently a member of the Heriot-Watt team for the Alexa Prize Challenge. Follow our bot on Twitter and on our blog.