Immune-inspired approaches to explainable and robust deep learning models
There is increasing demand for deep learning systems that are both robust and explainable in real-world applications. It has recently been shown that malicious inputs, so-called adversarial examples, can be crafted to fool deep learning models. For example, a self-driving car can be fooled into driving over the speed limit by making minor modifications to a speed-limit sign. An extreme case is the one-pixel attack, in which altering a single pixel of an image can be enough to fool a deep neural network.
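To make the idea concrete, the following is a toy sketch of a one-pixel attack. It is an illustrative assumption throughout: the "classifier" is a fixed random linear model rather than a deep network, and the attack is an exhaustive search over pixels rather than the differential-evolution search used in the original one-pixel attack work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained classifier: two classes, fixed random
# weights over a flattened 8x8 image. (Purely illustrative; the real
# one-pixel attack targets deep CNNs.)
W = rng.normal(size=(2, 64))

def predict(image):
    """Return the predicted class index for an 8x8 image."""
    return int(np.argmax(W @ image.ravel()))

def one_pixel_attack(image):
    """Try extreme values (0 or 1) at each single pixel and return a
    modified copy that flips the prediction, if one exists."""
    original = predict(image)
    for idx in range(image.size):
        for value in (0.0, 1.0):
            candidate = image.ravel().copy()
            candidate[idx] = value
            candidate = candidate.reshape(image.shape)
            if predict(candidate) != original:
                return candidate, idx
    return None, None  # this image is robust to single-pixel edits

image = rng.uniform(size=(8, 8))
adv, pixel = one_pixel_attack(image)
if adv is not None:
    print(f"prediction flipped by editing pixel {pixel}")
else:
    print("no single-pixel attack found for this image")
```

Even this brute-force toy makes the threat model clear: the attacker needs only query access to `predict` and a single-coordinate perturbation budget.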
In response, many efforts have been made to design deep learning models that are robust to adversarial attacks. In parallel, explainable AI (XAI) can open the black box of deep learning models and generate explanations of the predictions they make. In particular, model-agnostic XAI approaches such as LIME and SHAP have been developed in recent years.
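The core idea behind model-agnostic explanation can be sketched in a few lines. The snippet below is a simplified LIME-style local surrogate, not the actual `lime` package: it perturbs an input, queries a black-box model (here a hypothetical logistic model the explainer cannot see inside), and fits a proximity-weighted linear model whose coefficients serve as per-feature importances. All function names and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical black-box model: the explainer may only call
# black_box(), mirroring the model-agnostic setting.
true_w = np.array([3.0, 0.0, -2.0, 0.0])

def black_box(X):
    return 1.0 / (1.0 + np.exp(-(X @ true_w)))

def lime_style_explanation(x, n_samples=500, scale=0.5):
    """Fit a weighted linear surrogate around x (LIME-style sketch)."""
    # 1. Sample perturbations in the neighbourhood of x.
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))
    y = black_box(Z)
    # 2. Weight samples by proximity to x (Gaussian kernel).
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * scale ** 2))
    # 3. Weighted least squares: surrogate coefficients = explanation.
    sw = np.sqrt(w)[:, None]
    A = np.hstack([Z, np.ones((n_samples, 1))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A * sw, y[:, None] * sw, rcond=None)
    return coef[:-1, 0]  # per-feature importance (intercept dropped)

x = np.zeros(4)
importance = lime_style_explanation(x)
print(np.round(importance, 2))
```

The recovered coefficients should rank the genuinely influential features (the first and third) above the irrelevant ones, which is exactly the kind of local explanation this project would want to produce for adversarial as well as normal inputs.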
This project aims to explore the use of artificial immune systems (AIS) to facilitate building deep learning models that are robust against malicious attacks. We are interested in how immune-inspired approaches can serve as a principled way to design robust models that recognise adversarial attacks, much as the antibodies in our immune system recognise attacks from viruses and bacteria. We will investigate the potential of the major immune-inspired algorithms (i.e., immune network approaches, clonal selection, and negative selection algorithms) for training robust deep learning models, both by leveraging existing adversarial defence methods and by proposing novel defence strategies.
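As an illustration of one of these algorithms, the sketch below shows a real-valued variant of negative selection used as an anomaly detector: random candidate detectors are censored if they match any "self" sample, and the survivors flag non-self inputs. This is a simplified assumption-laden toy (synthetic 2-D "self" data, Euclidean matching with a fixed radius), not the classic binary-string, r-contiguous-matching formulation, and not a claim about how the project will ultimately combine AIS with deep models.

```python
import numpy as np

rng = np.random.default_rng(2)

# "Self" set: stand-in for, e.g., hidden activations of a model on
# clean inputs (here just synthetic points clustered near the origin).
self_samples = rng.normal(scale=0.3, size=(200, 2))

def train_detectors(self_set, n_candidates=2000, radius=0.5):
    """Negative selection: keep only candidate detectors that do NOT
    match (lie within `radius` of) any self sample."""
    candidates = rng.uniform(-3, 3, size=(n_candidates, 2))
    dist_to_self = np.linalg.norm(
        candidates[:, None, :] - self_set[None, :, :], axis=2
    ).min(axis=1)
    return candidates[dist_to_self > radius]

def is_anomalous(x, detectors, radius=0.5):
    """Flag x as non-self if any surviving detector covers it."""
    return bool(np.linalg.norm(detectors - x, axis=1).min() <= radius)

detectors = train_detectors(self_samples)
print(is_anomalous(np.array([2.5, 2.5]), detectors))  # point far from self
print(is_anomalous(np.array([0.0, 0.0]), detectors))  # typical self point
```

The appeal for adversarial defence is the analogy: detectors are tuned to fire only on inputs (or internal representations) that deviate from the model's "self" behaviour, much as antibodies fire on non-self antigens.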
Furthermore, within this immune-inspired framework, we will investigate the design of XAI systems that can explain predictions made not only on normal data samples but also on adversarial attacks. This will help us better understand the characteristics of the attacks and provide insight when a deep learning model fails to identify an attack. Such insights will be fed back into the immune-inspired framework for further model improvement.