CISPA

Towards Reverse-Engineering Black-Box Neural Networks

chapter
posted on 2023-11-29, 18:26 authored by Seong Joon Oh, Bernt Schiele, Mario Fritz
Much progress in interpretable AI is built around scenarios where the user, the one who interprets the model, has full ownership of the model to be diagnosed. The user either owns the training data and computing resources to train an interpretable model herself or has full access to an already trained model to be interpreted post hoc. In this chapter, we consider a less investigated scenario of diagnosing black-box neural networks, where the user can only send queries and read off outputs. Black-box access is a common deployment mode for many public and commercial models, since internal details, such as architecture, optimisation procedure, and training data, can be proprietary, and exposing them can aggravate the models' vulnerability to attacks like adversarial examples. We propose a method for exposing internals of black-box models and show that the method is surprisingly effective at inferring a diverse set of internal information. We further show how the exposed internals can be exploited to strengthen adversarial examples against the model. Our work starts an important discussion on the security implications of diagnosing deployed models with limited accessibility. The code is available at goo.gl/MbYfsv.

Preferred Citation

Seong Joon Oh, Bernt Schiele and Mario Fritz. Towards Reverse-Engineering Black-Box Neural Networks. In: Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. 2019.

Primary Research Area

  • Trustworthy Information Processing

Secondary Research Area

  • Threat Detection and Defenses

Legacy Posted Date

2020-01-12

Book Title

Explainable AI: Interpreting, Explaining and Visualizing Deep Learning

Chapter

7

Page Range

121-144

Open Access Type

  • Unknown

BibTeX

@incollection{cispa_all_3016,
  title = "Towards Reverse-Engineering Black-Box Neural Networks",
  author = "Oh, Seong Joon and Schiele, Bernt and Fritz, Mario",
  booktitle = "{Explainable AI: Interpreting, Explaining and Visualizing Deep Learning}",
  year = "2019",
}
