MGTBench: Benchmarking Machine-Generated Text Detection

conference contribution

posted on 2024-12-19, 10:36, authored by Xinlei He, Xinyue Shen, Zeyuan Chen, Michael Backes, Yang Zhang
Powerful large language models (LLMs) such as ChatGPT have demonstrated revolutionary capabilities across a variety of natural language processing (NLP) tasks, including text classification, sentiment analysis, language translation, and question answering. Consequently, the detection of machine-generated texts (MGTs) is becoming increasingly crucial as LLMs grow more advanced and prevalent. These models can generate human-like language, making it challenging to discern whether a text is authored by a human or a machine, which raises concerns regarding authenticity, accountability, and potential bias. However, existing methods for detecting MGTs are evaluated with different model architectures, datasets, and experimental settings, so the field lacks a comprehensive evaluation framework that encompasses the various methodologies. Furthermore, it remains unclear how existing detection methods would perform against powerful LLMs. In this paper, we fill this gap by proposing MGTBench, the first benchmark framework for MGT detection against powerful LLMs. Extensive evaluations on public datasets with curated texts generated by various powerful LLMs, such as ChatGPT-turbo and Claude, demonstrate the effectiveness of different detection methods. Our ablation study shows that longer texts generally lead to better detection performance and that most detection methods can achieve similar performance with far fewer training samples. Additionally, our findings reveal that metric-based detection methods exhibit better transferability across different LLMs, while model-based detection methods transfer better across datasets. Furthermore, we delve into a more challenging task, text attribution, where the goal is to identify the originating model of a given text, i.e., whether it was produced by a specific LLM or authored by a human. Our findings indicate that model-based detection methods still perform well in the text attribution task. To investigate the robustness of different detection methods, we consider three adversarial attacks: paraphrasing, random spacing, and adversarial perturbations. We find that these attacks can significantly diminish detection effectiveness, underscoring the critical need for more robust detection methods. We envision that MGTBench will serve as a benchmark tool to accelerate future work on evaluating powerful MGT detection methods on their respective datasets and on developing more advanced MGT detection methods.
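To make the metric-based detection idea above concrete, here is a minimal Python sketch that scores a text by its average token log-likelihood under a reference language model and thresholds the score. The scoring model (GPT-2) and the threshold value are illustrative assumptions, not the paper's configuration or the MGTBench implementation.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: any causal LM can serve as the scorer; GPT-2 is used here
# purely for illustration.
MODEL_NAME = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def avg_log_likelihood(text: str) -> float:
    """Average per-token log-likelihood of the text under the scorer LM."""
    ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids
    # Passing labels=ids makes the model return the mean cross-entropy
    # loss, i.e. the negative average log-likelihood per token.
    loss = model(ids, labels=ids).loss
    return -loss.item()

def is_machine_generated(text: str, threshold: float = -3.0) -> bool:
    # MGTs tend to score higher (more "likely") under an LM than
    # human-written text; the threshold here is a placeholder that would
    # be tuned on a held-out validation set.
    return avg_log_likelihood(text) > threshold

In practice such a detector is evaluated by sweeping the threshold and reporting AUC rather than fixing a single cutoff.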

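The random-spacing attack mentioned in the abstract can likewise be sketched in a few lines: spurious whitespace is injected into a text so that it stays readable to humans but shifts the token statistics that detectors rely on. The per-character insertion probability and the exact injection strategy below are assumptions for illustration; the paper's precise perturbation procedure is not reproduced here.

import random

def random_spacing(text: str, p: float = 0.05, seed: int = 0) -> str:
    """Insert a spurious space after each non-space character with probability p."""
    rng = random.Random(seed)  # seeded for reproducible perturbations
    out = []
    for ch in text:
        out.append(ch)
        if not ch.isspace() and rng.random() < p:
            out.append(" ")
    return "".join(out)

# Example: perturb a text before handing it to a detector.
print(random_spacing("Machine-generated text detection is hard.", p=0.1))

A robustness evaluation would then compare detector accuracy on the original and perturbed versions of the same test set.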
History

Primary Research Area

  • Trustworthy Information Processing

Name of Conference

ACM Conference on Computer and Communications Security (CCS)

CISPA Affiliation

  • Yes

Page Range

2251-2265

Publisher

Association for Computing Machinery (ACM)

Open Access Type

  • Green

BibTeX

@conference{He:Shen:Chen:Backes:Zhang:2024,
  title     = "MGTBench: Benchmarking Machine-Generated Text Detection",
  author    = "He, Xinlei and Shen, Xinyue and Chen, Zeyuan and Backes, Michael and Zhang, Yang",
  year      = 2024,
  month     = 12,
  pages     = "2251--2265",
  publisher = "Association for Computing Machinery (ACM)",
  doi       = "10.1145/3658644.3670344"
}
