GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs

conference contribution
posted on 2025-11-06, 07:13 authored by Advik Raj Basani, Xiao Zhang
LLMs have demonstrated impressive capabilities across various natural language processing tasks yet remain vulnerable to prompts, known as jailbreak attacks, carefully designed to bypass safety guardrails and elicit harmful responses. Traditional methods rely on manual heuristics that suffer from limited generalizability. Despite being automatic, optimization-based attacks often produce unnatural jailbreak prompts that can be easily detected by safety filters or require high computational costs due to discrete token optimization. This paper introduces Generative Adversarial Suffix Prompter (GASP), a novel automated framework that can efficiently generate human-readable jailbreak prompts in a fully black-box setting. In particular, GASP leverages latent Bayesian optimization to craft adversarial suffixes by efficiently exploring continuous latent spaces, gradually optimizing the suffix generator to improve attack efficacy while balancing prompt coherence via a targeted iterative refinement procedure. Through comprehensive experiments, we show that GASP can produce natural adversarial prompts, significantly improving jailbreak success, reducing training times, and accelerating inference speed, thus making it an efficient and scalable solution for red-teaming LLMs.

Primary Research Area

  • Trustworthy Information Processing

Name of Conference

Conference on Neural Information Processing Systems (NeurIPS)

CISPA Affiliation

  • Yes

BibTeX

@conference{Basani:Zhang:2025, title = "GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs", author = "Basani, Advik Raj and Zhang, Xiao", year = 2025, month = 9 }
