ArXivIQ
Scaling Agents via Continual Pre-training

Sep 21, 2025

Authors: Liangcai Su, Zhen Zhang, Guangyu Li, Zhuo Chen, Chenxi Wang, Maojia Song, Xinyu Wang, Kuan Li, Jialong Wu, Xuanzhong Chen, Zile Qiao, Zhongwang Zhang, Huifeng Yin, Shihao Cai, Runnan Fang, Zhengwei Tao, Wenbiao Yin, Chenxiong Qian, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou
Paper: https://arxiv.org/abs/2509.13310
Code: https://github.com/Alibaba-NLP/DeepResearch
Project Page: https://tongyi-agent.github.io/blog

TL;DR

WHAT was done? This paper, from Alibaba Group, introduces "Agentic Continual Pre-training" (Agentic CPT), a novel intermediate training stage positioned between standard pre-training and task-specific fine-tuning. This new layer is designed to build powerful "agentic foundation models" pre-aligned with core agent behaviors like multi-step reasoning and tool use. To fuel this process, the authors developed two scalable, offline data synthesis methods: First-order Action Synthesis (FAS) for generating planning and reasoning data without external API calls, and Higher-order Action Synthesis (HAS) for remodeling suboptimal trajectories into rich, multi-step decision-making problems. These methods are integrated into a progressive two-stage training strategy, culminating in the AgentFounder-30B model.
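The paper's exact data formats are behind the paywall cut, but the core idea of Higher-order Action Synthesis (HAS) — turning a logged, possibly suboptimal trajectory into many step-wise decision problems — can be illustrated with a minimal sketch. Everything here (the `Step` record, the prompt layout, the candidate-ranking scheme) is a hypothetical reconstruction, not the authors' implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One step of a logged agent trajectory (hypothetical schema)."""
    observation: str
    action_taken: str                      # the action the agent actually chose
    alternatives: list = field(default_factory=list)  # other candidates at this step

def has_remodel(trajectory):
    """Remodel a trajectory into per-step multi-choice decision problems,
    in the spirit of HAS: each step yields one training sample asking the
    model to pick the taken action among candidates, given the history."""
    samples, history = [], []
    for step in trajectory:
        # Shuffle-free deterministic ordering for the sketch: sort candidates.
        candidates = sorted([step.action_taken] + step.alternatives)
        prompt = (
            "History:\n" + "\n".join(history) +
            f"\nObservation: {step.observation}\n" +
            "Candidate actions:\n" +
            "\n".join(f"{i}. {c}" for i, c in enumerate(candidates))
        )
        samples.append({
            "prompt": prompt,
            "label": candidates.index(step.action_taken),  # index of chosen action
        })
        history.append(f"Obs: {step.observation} -> Act: {step.action_taken}")
    return samples
```

Because the remodeling is purely offline — it reuses logged actions and candidates rather than calling live tools — one trajectory of N steps yields N supervised decision-making samples at no extra API cost, which is what makes the approach scalable.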

WHY it matters? This work provides a concrete recipe for addressing a fundamental bottleneck in agent development: post-training methods on general-purpose models struggle to simultaneously teach complex agentic capabilities and align them with expert demonstrations, creating an "optimization tension." By embedding agentic skills at a foundational level, Agentic CPT makes subsequent fine-tuning more efficient and effective, as evidenced by significantly lower SFT loss. The resulting AgentFounder model establishes a new state-of-the-art across 10 demanding benchmarks, outperforming strong open-source and even commercial deep research agents. This research charts a more robust and scalable path toward building highly capable AI agents, helping to democratize the creation of advanced agentic systems.

Details

The Challenge of Building Capable AI Agents

The evolution of large language models (LLMs) into autonomous agents capable of complex problem-solving has been a major focus in AI research. However, a persistent challenge, particularly for the open-source community, has been the performance gap compared to leading commercial systems. Current approaches typically rely on post-training methods like Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to imbue general-purpose foundation models with agentic skills.

This paper compellingly argues that this approach suffers from a "fundamental optimization tension." Models are forced to learn diverse and complex agentic behaviors (like tool invocation and multi-step reasoning) at the same time as they are being aligned to specific expert demonstrations. This dual burden limits their ultimate effectiveness. The authors propose a paradigm shift: instead of retrofitting general models, we should build foundation models that are inherently agentic from an earlier stage.

© 2025 Grigory Sapunov