LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Games.

Abdelnabi, Sahar; Gomaa, Amr; Sivaprasad, Sarath; Schönherr, Lea; Fritz, Mario

doi:10.60882/cispa.25233028.v1

journal contribution

posted on 2024-02-19, 09:36 authored by Sahar Abdelnabi, Amr Gomaa, Sarath Sivaprasad, Lea Schönherr, Mario Fritz

There is growing interest in using Large Language Models (LLMs) as agents to tackle real-world tasks that may require assessing complex situations. Yet, we have a limited understanding of LLMs' reasoning and decision-making capabilities, partly stemming from a lack of dedicated evaluation benchmarks. As negotiating and compromising are key aspects of our everyday communication and collaboration, we propose using scorable negotiation games as a new evaluation framework for LLMs. We create a testbed of diverse text-based, multi-agent, multi-issue, semantically rich negotiation games, with easily tunable difficulty. To solve the challenge, agents need to have strong arithmetic, inference, exploration, and planning capabilities, while seamlessly integrating them. Using systematic zero-shot Chain-of-Thought (CoT) prompting, we show that agents can negotiate and consistently reach successful deals. We quantify the performance with multiple metrics and observe a large gap between GPT-4 and earlier models. Importantly, we test the generalization to new games and setups. Finally, we show that these games can help evaluate other critical aspects, such as the interaction dynamics between agents in the presence of greedy and adversarial players.

Primary Research Area

Threat Detection and Defenses

Journal

CoRR

Volume

abs/2309.17234

Sub Type

Article

BibTeX

@article{Abdelnabi:Gomaa:Sivaprasad:Schönherr:Fritz:2023,
  title = "LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Games.",
  author = "Abdelnabi, Sahar and Gomaa, Amr and Sivaprasad, Sarath and Schönherr, Lea and Fritz, Mario",
  year = 2023,
  month = 9,
  journal = "CoRR",
  volume = "abs/2309.17234"
}
