SkillOpt: Executive Strategy for Self-Evolving Agent Skills
Authors: Yifan Yang, Ziyang Gong, Weiquan Huang, Qihao Yang, Ziwei Zhou, Zisu Huang, Yan Li, Xuemei Gao, Qi Dai, Bei Liu, Kai Qiu, Yuqing Yang, Dongdong Chen, Xue Yang, Chong Luo
Paper: https://arxiv.org/abs/2605.23904
Code: https://aka.ms/SkillOpt
Model: N/A
TL;DR
WHAT was done? The authors introduce SkillOpt, a systematic, controllable text-space optimizer that treats natural-language agent skills as trainable external state. Instead of relying on manual prompting or erratic, unconstrained prompt rewrites, SkillOpt structures behavioral updates using deep-learning-inspired mechanisms like edit budgets (textual learning rates), strict validation gates, rejected-edit buffers, and epoch-wise slow/meta-updates.
WHY it matters? This research provides a highly stable, reproducible, and offline optimization framework for frozen frontier and small-scale language models. By compiling complex domain adaptations into static, human-readable markdown files, SkillOpt achieves massive accuracy gains (+23.5 points on average for GPT-5.5) across QA, spreadsheet execution, and embodied benchmarks, adding zero real-time inference latency or extra model-call overhead at deployment.
Details
The Brittle Frontier of Uncontrolled Prompt Refinement
LLM-agent behavioral refinement and domain adaptation are currently limited by a stark choice: perform expensive, closed-model weight fine-tuning, or rely on fragile, manual prompt engineering. While prompt auto-tuning has emerged as a lightweight alternative, existing methods are typically plagued by instability. Unconstrained self-reflections and prompt rewrites often result in erratic performance drifts, catastrophic forgetting of edge-case rules, or localized overfitting to anecdotal failures. SkillOpt addresses this fundamental bottleneck by treating the natural-language skill document as a persistent, trainable external parameter state. The critical improvement over prior works—such as TextGrad [review] and GEPA [review]—is the introduction of rigorous, deep-learning-style optimization controls that stabilize the text update process, transforming prompt refinement from a chaotic heuristic search into a disciplined, monotonic learning loop.


