ArXivIQ

Towards a Science of Scaling Agent Systems

Dec 13, 2025

Authors: Yubin Kim, Ken Gu, Chanwoo Park, Chunjong Park, Samuel Schmidgall, A. Ali Heydari, Yao Yan, Zhihan Zhang, Yuchen Zhuang, Mark Malhotra, Paul Pu Liang, Hae Won Park, Yuzhe Yang, Xuhai Xu, Yilun Du, Shwetak Patel, Tim Althoff, Daniel McDuff, and Xin Liu
Paper: https://arxiv.org/abs/2512.08296

TL;DR

WHAT was done? The authors conducted a controlled evaluation of 180 agent system configurations, varying model capability (across OpenAI, Google, and Anthropic families), coordination topology, and task properties. They derived a quantitative “scaling law” for multi-agent systems (MAS) that predicts performance based on interaction metrics, challenging the prevailing assumption that increasing agent count monotonically improves performance.

WHY it matters? This work establishes that MAS performance is not driven by simple scaling but by a trade-off between parallelization benefits and coordination overhead. The study identifies specific “regimes of failure”—specifically tool-heavy and sequential tasks—where adding agents degrades performance by up to 70%, offering a predictive framework (R² = 0.513) for determining when to deploy complex swarms versus single strong models.

Details

The Coordination Bottleneck

The current zeitgeist in agentic AI suggests that “more agents is all you need,” a heuristic implying that decomposing a task across a swarm of Large Language Models (LLMs) inevitably yields better results than a single reasoner. However, this view often conflates the benefits of collaboration with the benefits of simply increasing the inference compute budget (ensembling). This paper addresses the lack of a principled “science of scaling” by isolating the effects of coordination architecture from raw model capability. By controlling for token budgets and using a standardized “Intelligence Index” for base models (as visualized in Figure 1), the authors reveal that multi-agent systems do not scale linearly. Instead, they exhibit distinct failure modes where the “coordination tax”—the token cost and noise introduced by inter-agent communication—outweighs the benefits of distributed reasoning.
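This trade-off can be illustrated with a toy model: a diminishing parallelization gain competes with a coordination tax that grows with the number of inter-agent communication channels. The logarithmic gain, quadratic pairwise tax, and all constants below are illustrative assumptions for intuition only, not the paper's fitted scaling law.

```python
import math

# Toy model of the parallelization-vs-coordination trade-off.
# All functional forms and constants are hypothetical, chosen only to
# reproduce the qualitative non-monotonic behavior described in the paper.

def predicted_score(n_agents: int,
                    base_score: float = 0.60,
                    parallel_gain: float = 0.08,
                    coordination_tax: float = 0.015) -> float:
    """Task score rises with diminishing returns from parallelism,
    while each added agent pays a per-pair communication cost."""
    gain = parallel_gain * math.log(n_agents)               # diminishing parallel benefit
    tax = coordination_tax * n_agents * (n_agents - 1) / 2  # pairwise message overhead
    return max(0.0, min(1.0, base_score + gain - tax))

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} agents -> predicted score {predicted_score(n):.3f}")
```

Under these assumed constants the score peaks at a small agent count and then degrades as the quadratic coordination tax overtakes the logarithmic gain, mirroring the regimes where "more agents" hurts.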

Agent System First Principles: Topology and State

© 2026 Grigory Sapunov