Towards Reverse-Engineering Black-Box Neural Networks
chapter
posted on 2023-11-29, 18:26 authored by Seong Joon Oh, Bernt Schiele, Mario Fritz
Much progress in interpretable AI is built around scenarios where the user, the one who interprets the model, has full ownership of the model to be diagnosed. The user either owns the training data and computing resources to train an interpretable model herself, or has full access to an already trained model to be interpreted post hoc. In this chapter, we consider a less-investigated scenario of diagnosing black-box neural networks, where the user can only send queries and read off outputs. Black-box access is a common deployment mode for many public and commercial models, since internal details, such as architecture, optimisation procedure, and training data, can be proprietary, and exposing them can aggravate the models' vulnerability to attacks such as adversarial examples. We propose a method for exposing internals of black-box models and show that the method is surprisingly effective at inferring a diverse set of internal information. We further show how the exposed internals can be exploited to strengthen adversarial examples against the model. Our work starts an important discussion on the security implications of diagnosing deployed models with limited accessibility. The code is available at goo.gl/MbYfsv.
History
Preferred Citation
Seong Joon Oh, Bernt Schiele and Mario Fritz. Towards Reverse-Engineering Black-Box Neural Networks. In: Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. 2019.
Primary Research Area
Trustworthy Information Processing
Secondary Research Area
Threat Detection and Defenses
Legacy Posted Date
2020-01-12
Book Title
Explainable AI: Interpreting, Explaining and Visualizing Deep Learning
Chapter
7
Page Range
121-144
Open Access Type
Unknown
BibTeX
@incollection{cispa_all_3016,
title = "Towards Reverse-Engineering Black-Box Neural Networks",
author = "Oh, Seong Joon and Schiele, Bernt and Fritz, Mario",
booktitle="{Explainable AI: Interpreting, Explaining and Visualizing Deep Learning}",
year="2019",
}