(No Optical MoCap System in Use, 1x)
Humanoid robots are envisioned to perform a wide range of tasks in human-centered environments, requiring controllers that combine agility with robust balance. Recent advances in locomotion and whole-body tracking have enabled impressive progress in either agile dynamic skills or stability-critical behaviors, but existing methods remain specialized, focusing on one capability while compromising the other.
In this work, we introduce AMS (Agility Meets Stability), the first framework that unifies both dynamic motion tracking and extreme balance maintenance in a single policy. Our key insight is to leverage heterogeneous data sources: human motion capture datasets that provide rich, agile behaviors, and physically constrained synthetic balance motions that capture stability configurations. To reconcile the divergent optimization goals of agility and stability, we design a hybrid reward scheme that applies general tracking objectives across all data while injecting balance-specific priors only into synthetic motions. Further, an adaptive learning strategy with performance-driven sampling and motion-specific reward shaping enables efficient training across diverse motion distributions.
We validate AMS extensively in simulation and on a real Unitree G1 humanoid. Experiments demonstrate that a single policy can execute agile skills such as dancing and running, while also performing zero-shot extreme balance motions like Ip Man's Squat, highlighting AMS as a versatile control paradigm for future humanoid applications.
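To make the hybrid reward idea above concrete, here is a minimal sketch, not the authors' implementation. It applies a general tracking term to every motion and adds a balance prior only when the motion is flagged as synthetic. The field names, the exponential reward kernels, the center-of-mass offset used as the balance prior, and all weights are assumptions introduced for illustration; the source does not specify them.

```python
import math
from dataclasses import dataclass


@dataclass
class MotionFrame:
    """One control step of a reference motion (all fields are hypothetical)."""
    is_synthetic: bool         # True for synthetic balance motions, False for MoCap
    tracking_error: float      # aggregate tracking error against the reference
    com_support_offset: float  # horizontal CoM distance from the support foot, in metres


def tracking_reward(error: float, sigma: float = 0.25) -> float:
    """General tracking term: 1.0 at zero error, decaying with larger error."""
    return math.exp(-((error / sigma) ** 2))


def balance_prior_reward(offset: float, sigma: float = 0.05) -> float:
    """Assumed balance prior: rewards keeping the CoM above the support foot."""
    return math.exp(-((offset / sigma) ** 2))


def hybrid_reward(frame: MotionFrame, w_track: float = 1.0, w_balance: float = 0.5) -> float:
    """Tracking reward for all data; balance prior only for synthetic motions."""
    reward = w_track * tracking_reward(frame.tracking_error)
    if frame.is_synthetic:
        reward += w_balance * balance_prior_reward(frame.com_support_offset)
    return reward


mocap_frame = MotionFrame(is_synthetic=False, tracking_error=0.1, com_support_offset=0.02)
synthetic_frame = MotionFrame(is_synthetic=True, tracking_error=0.1, com_support_offset=0.02)
mocap_r = hybrid_reward(mocap_frame)
synthetic_r = hybrid_reward(synthetic_frame)
```

With identical errors, the synthetic frame receives a strictly higher reward because of the extra balance term, while the MoCap frame is scored on tracking alone. Gating the prior this way keeps balance-specific shaping from penalizing agile MoCap motions such as running, where the center of mass routinely leaves the support region.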
(a) The general whole-body tracking pipeline retargets human MoCap data to reference motions and adopts a teacher-student strategy for reinforcement learning. To address data limitations and conflicting optimization objectives, AMS introduces three key components. (b) Synthetic balance data is generated to complement human MoCap data and address data limitations. (c) Adaptive learning is employed with adaptive sampling and reward shaping based on individual motion performance. (d) Hybrid rewards are designed with general rewards for all motions and balance prior rewards exclusively for synthetic motions.
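The adaptive sampling in (c) could be sketched as follows. This is an illustration under stated assumptions rather than the method used in AMS: the class name, the exponential moving average of per-motion failure, the probability floor, and every constant are hypothetical. The only idea taken from the text is that motions the policy currently performs poorly on are sampled more often. The motion-specific reward shaping is not covered here.

```python
import random


class AdaptiveMotionSampler:
    """Samples motions in proportion to how often the policy fails on them (hypothetical)."""

    def __init__(self, num_motions: int, ema: float = 0.9, floor: float = 0.1, seed: int = 0):
        self.fail_rate = [0.5] * num_motions  # neutral prior before any rollouts
        self.ema = ema                        # smoothing factor for the failure estimate
        self.floor = floor                    # keeps mastered motions from being dropped entirely
        self.rng = random.Random(seed)

    def update(self, motion_id: int, failed: bool) -> None:
        """Blend the latest rollout outcome into the running failure rate."""
        old = self.fail_rate[motion_id]
        self.fail_rate[motion_id] = self.ema * old + (1.0 - self.ema) * float(failed)

    def probabilities(self) -> list:
        """Normalize (floor + failure rate) into a sampling distribution."""
        weights = [self.floor + rate for rate in self.fail_rate]
        total = sum(weights)
        return [w / total for w in weights]

    def sample(self) -> int:
        """Draw one motion index according to the current distribution."""
        indices = range(len(self.fail_rate))
        return self.rng.choices(indices, weights=self.probabilities(), k=1)[0]


sampler = AdaptiveMotionSampler(num_motions=3)
for _ in range(20):
    sampler.update(0, failed=True)   # motion 0 keeps failing
    sampler.update(1, failed=False)  # motion 1 keeps succeeding
probs = sampler.probabilities()
```

After these updates, motion 0 has the highest sampling probability, the untouched motion 2 sits in the middle, and motion 1 has the lowest but still nonzero probability because of the floor.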
@article{pan2025ams,
title={Agility Meets Stability: Versatile Humanoid Control with Heterogeneous Data},
author={Pan, Yixuan and Qiao, Ruoyi and Chen, Li and Chitta, Kashyap and Pan, Liang and Mai, Haoguang and Bu, Qingwen and Zheng, Cunyuan and Zhao, Hao and Luo, Ping and Li, Hongyang},
journal={arXiv preprint arXiv:2511.17373},
year={2025}
}