Latent Backdoor Attacks on Deep Neural Networks
Backdoor attacks embed hidden malicious behaviors into deep neural network (DNN) models: misclassification rules lie dormant inside otherwise normal models and are triggered only by very specific inputs. When models are compromised, the consequences can be severe, since DNNs are widely deployed in safety- and security-critical areas like self-driving cars. However, these “traditional” backdoors assume a context where users train their own models from scratch, which rarely occurs in practice. Instead, users typically customize pretrained “Teacher” models released by model providers like Google, through a process called transfer learning. This customization process introduces significant changes to the model and disrupts hidden backdoors, greatly reducing their actual impact in practice. In this study, we describe latent backdoors, a more powerful and stealthy variant of backdoor attacks that functions under transfer learning. Latent backdoors are incomplete backdoors embedded into a “Teacher” model and automatically inherited by multiple “Student” models through transfer learning. If a Student model includes the label targeted by the backdoor, its customization process completes the backdoor and makes it active. We show that latent backdoors can be quite effective in a variety of application contexts, and validate their practicality through real-world attacks against traffic sign recognition, iris identification of volunteers, and facial recognition of public figures (politicians). Finally, we evaluate four potential defenses, and find that only one is effective in disrupting latent backdoors, though it may incur a cost in classification accuracy as a tradeoff.
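The transfer-learning process the abstract refers to, where a Student copies a Teacher's feature-extraction layers, freezes them, and trains only a new classification head on its own labels, can be sketched as follows. This is a toy illustration under assumed details (a fixed linear "teacher" feature map and a logistic-regression head); it is not the paper's implementation, and all layer sizes, weights, and function names are hypothetical.

```python
# Toy sketch of transfer learning: frozen Teacher layers + trainable Student head.
# All numbers and the "teacher_features" map are hypothetical illustrations.
import random

random.seed(0)


def teacher_features(x):
    """Frozen Teacher layers: a fixed linear map (weights never updated)."""
    return [0.9 * x[0] - 0.1 * x[1], 0.2 * x[0] + 0.8 * x[1]]


def train_student_head(data, steps=200, lr=0.5):
    """Train only the new classification head (logistic regression) on Student labels."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(steps):
        for x, y in data:
            f = teacher_features(x)            # frozen: no gradient flows here
            z = w[0] * f[0] + w[1] * f[1] + b
            p = 1.0 / (1.0 + 2.718281828 ** (-z))
            g = p - y                          # dLoss/dz for log loss
            w[0] -= lr * g * f[0]
            w[1] -= lr * g * f[1]
            b -= lr * g
    return w, b


def predict(w, b, x):
    f = teacher_features(x)
    return 1 if w[0] * f[0] + w[1] * f[1] + b > 0 else 0


# Hypothetical Student task: two clusters, classified via the frozen Teacher features.
data = [([1.0, 1.0], 1), ([1.2, 0.8], 1), ([-1.0, -1.0], 0), ([-0.8, -1.2], 0)]
w, b = train_student_head(data)
print([predict(w, b, x) for x, _ in data])
```

Because only the head is retrained while the Teacher's internal layers are inherited unchanged, any behavior embedded in those frozen layers survives customization; this is the property latent backdoors exploit.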
Huiying's advisors are Prof. Ben Zhao and Prof. Heather Zheng.