CISPA

UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

conference contribution
posted on 2025-11-11, 09:10 authored by Yiting Qu, Xinyue Shen, Yixin Wu, Michael Backes, Savvas Zannettou, Yang Zhang
<p dir="ltr">With the advent of text-to-image models and concerns about their misuse, developers are increasingly relying on image safety classifiers to moderate their generated unsafe images. Yet, the performance of current image safety classifiers remains unknown for both real-world and AI-generated images. In this work, we propose <i>UnsafeBench</i>, a benchmarking framework that evaluates the effectiveness and robustness of image safety classifiers, with a particular focus on the impact of AI-generated images on their performance. First, we curate a large dataset of 10K real-world and AI-generated images that are annotated as safe or unsafe based on a set of 11 unsafe categories of images (sexual, violent, hateful, etc.). Then, we evaluate the effectiveness and robustness of five popular image safety classifiers, as well as three classifiers that are powered by general-purpose visual language models. Our assessment indicates that existing image safety classifiers are not comprehensive and effective enough to mitigate the multifaceted problem of unsafe images. Also, there exists a distribution shift between real-world and AI-generated images in image qualities, styles, and layouts, leading to degraded effectiveness and robustness. Motivated by these findings, we build a comprehensive image moderation tool called <i>PerspectiveVision</i>, which improves the effectiveness and robustness of existing classifiers, especially on AI-generated images. UnsafeBench and PerspectiveVision can aid the research community in better understanding the landscape of image safety classification in the era of generative AI.</p>

Funding

Understanding the individual host response against Hepatitis D Virus to develop a personalized approach for the management of hepatitis D

European Commission

Joint project: Representative, synthetic health data with strong privacy guarantees - PriSyn -

Federal Ministry of Education and Research
