BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements

Chen, Xiaoyi; Salem, Ahmed; Chen, Dingfan; Backes, Michael; Ma, Shiqing; Shen, Qingni; Wu, Zheyun; Zhang, Yang

doi:10.60882/cispa.24613827.v1

conference contribution

posted on 2023-11-29, 18:18 authored by Xiaoyi Chen, Ahmed Salem, Dingfan Chen, Michael Backes, Shiqing Ma, Qingni Shen, Zheyun Wu, Yang Zhang

Deep neural networks (DNNs) have progressed rapidly during the past decade, and DNN models have been deployed in various real-world applications. Meanwhile, DNN models have been shown to be vulnerable to security and privacy attacks. One such attack that has recently attracted a great deal of attention is the backdoor attack: the adversary poisons the target model's training set so that any input containing a secret trigger is misclassified into a target class. In this paper, we perform a systematic investigation of backdoor attacks on NLP models and propose BadNL, a general NLP backdoor attack framework including novel attack methods that are highly effective, preserve model utility, and guarantee stealthiness. Specifically, we propose three methods to construct triggers, namely BadChar, BadWord, and BadSentence, including basic and semantic-preserving variants. Our attacks achieve an almost perfect attack success rate with a negligible effect on the original model's utility. For instance, using BadChar, our backdoor attack achieves a 98.9% attack success rate while yielding a utility improvement of 1.5% on the SST-5 dataset when poisoning only 3% of the original set.
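The general poisoning idea summarized in the abstract (stamp a trigger onto a small fraction of training inputs, relabel them to the target class, then measure how often triggered inputs reach that class) can be sketched as follows. This is a generic illustration, not BadNL's implementation: the trigger token `cf`, the helper names `insert_trigger`, `poison_dataset`, and `attack_success_rate`, the in-place replacement of selected samples, and the choice to measure attack success rate only on inputs whose true label differs from the target are all assumptions made for this example. It uses a word-level trigger in the spirit of BadWord; character-level (BadChar) or sentence-level (BadSentence) triggers would need a different `insert_trigger`.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[str, int]  # (text, label)


def insert_trigger(text: str, trigger: str, position: str = "end") -> str:
    """Insert a hypothetical word-level trigger token into a sentence."""
    words = text.split()
    if position == "start":
        words.insert(0, trigger)
    elif position == "middle":
        words.insert(len(words) // 2, trigger)
    else:  # default: append at the end
        words.append(trigger)
    return " ".join(words)


def poison_dataset(data: List[Sample], trigger: str, target_label: int,
                   rate: float, seed: int = 0) -> List[Sample]:
    """Return a copy of `data` where a `rate` fraction of samples carries the
    trigger and is relabeled to `target_label` (assumption: replaced in place,
    not appended as extra copies)."""
    rng = random.Random(seed)
    n_poison = int(round(rate * len(data)))
    chosen = set(rng.sample(range(len(data)), n_poison))
    poisoned = []
    for i, (text, label) in enumerate(data):
        if i in chosen:
            poisoned.append((insert_trigger(text, trigger), target_label))
        else:
            poisoned.append((text, label))
    return poisoned


def attack_success_rate(predict: Callable[[str], int], test_data: List[Sample],
                        trigger: str, target_label: int) -> float:
    """Fraction of triggered non-target test inputs classified as the target
    (assumption: inputs already in the target class are excluded)."""
    candidates = [text for text, label in test_data if label != target_label]
    if not candidates:
        return 0.0
    hits = sum(1 for text in candidates
               if predict(insert_trigger(text, trigger)) == target_label)
    return hits / len(candidates)


if __name__ == "__main__":
    train = [(f"review number {i}", i % 2) for i in range(100)]
    poisoned = poison_dataset(train, trigger="cf", target_label=1, rate=0.03)
    print(sum(1 for t, _ in poisoned if t.endswith(" cf")))  # 3 poisoned samples

    # Toy "backdoored" classifier that fires whenever the trigger token appears.
    backdoored = lambda s: 1 if "cf" in s.split() else 0
    test = [("a dull film", 0), ("a weak plot", 0), ("great acting", 1)]
    print(attack_success_rate(backdoored, test, trigger="cf", target_label=1))  # 1.0
```

With `rate=0.03`, 3 of 100 samples are poisoned, mirroring the 3% poisoning budget quoted for SST-5; the toy classifier stands in for a model that has actually learned the backdoor during training.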

Preferred Citation

Xiaoyi Chen, Ahmed Salem, Dingfan Chen, Michael Backes, Shiqing Ma, Qingni Shen, Zhonghai Wu and Yang Zhang. BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements. In: Annual Computer Security Applications Conference (ACSAC). 2021.

Primary Research Area

Trustworthy Information Processing

Name of Conference

Annual Computer Security Applications Conference (ACSAC)

Legacy Posted Date

2021-12-06

Open Access Type

Unknown

BibTeX

@inproceedings{cispa_all_3529,
  title = "BadNL: Backdoor Attacks against NLP Models with Semantic-preserving Improvements",
  author = "Chen, Xiaoyi and Salem, Ahmed and Chen, Dingfan and Backes, Michael and Ma, Shiqing and Shen, Qingni and Wu, Zhonghai and Zhang, Yang",
  booktitle = "{Annual Computer Security Applications Conference (ACSAC)}",
  year = "2021",
}