CISPA

Fake It Until You Break It: On the Adversarial Robustness of AI-generated Image Detectors

preprint
Posted on 2024-10-31, 09:43. Authored by Sina Mavali, Jonas Ricker, David Pape, Yash Sharma, Asja Fischer, Lea Schönherr
While generative AI (GenAI) offers countless possibilities for creative and productive tasks, artificially generated media can be misused for fraud, manipulation, scams, misinformation campaigns, and more. To mitigate the risks associated with maliciously generated media, forensic classifiers are employed to identify AI-generated content. However, current forensic classifiers are often not evaluated in practically relevant scenarios, such as the presence of an attacker or when real-world artifacts like social media degradations affect images. In this paper, we evaluate state-of-the-art AI-generated image (AIGI) detectors under different attack scenarios. We demonstrate that forensic classifiers can be effectively attacked in realistic settings, even when the attacker does not have access to the target model and post-processing occurs after the adversarial examples are created, which is standard on social media platforms. These attacks can significantly reduce detection accuracy to the extent that the risks of relying on detectors outweigh their benefits. Finally, we propose a simple defense mechanism to make CLIP-based detectors, which are currently the best-performing detectors, robust against these attacks.

Primary Research Area

  • Threat Detection and Defenses

Open Access Type

  • Green

BibTeX

@misc{Mavali:Ricker:Pape:Sharma:Fischer:Schönherr:2024,
  title = "Fake It Until You Break It: On the Adversarial Robustness of AI-generated Image Detectors",
  author = "Mavali, Sina and Ricker, Jonas and Pape, David and Sharma, Yash and Fischer, Asja and Schönherr, Lea",
  year = 2024,
  month = 10
}
