Agility Meets Stability: Versatile Humanoid Control with Heterogeneous Data

Extreme Balance Motions

Ip Man's Squat (Unseen Data) - (叶问蹲)

Single-Leg Balance Standing (Randomly Generated)

Dynamic Motions

Running

Basketball Dribbling

Waist Twists (Unseen Data from Video)

Lunge (Unseen Data from Video)

Swing and Squat (Unseen Data from Video)

Prayer Squat (Unseen Data from Video)

IMU-based Teleoperation

(No Optical MoCap System in Use, 1x)

Kungfu

Football

Multi-robots Teleoperation

RGB-based Real-time Teleoperation

Abstract

Humanoid robots are envisioned to perform a wide range of tasks in human-centered environments, requiring controllers that combine agility with robust balance. Recent advances in locomotion and whole-body tracking have enabled impressive progress in either agile dynamic skills or stability-critical behaviors, but existing methods remain specialized, focusing on one capability while compromising the other.

In this work, we introduce AMS (Agility Meets Stability), the first framework that unifies both dynamic motion tracking and extreme balance maintenance in a single policy. Our key insight is to leverage heterogeneous data sources: human motion capture datasets that provide rich, agile behaviors, and physically constrained synthetic balance motions that capture stability configurations. To reconcile the divergent optimization goals of agility and stability, we design a hybrid reward scheme that applies general tracking objectives across all data while injecting balance-specific priors only into synthetic motions. Further, an adaptive learning strategy with performance-driven sampling and motion-specific reward shaping enables efficient training across diverse motion distributions.

We validate AMS extensively in simulation and on a real Unitree G1 humanoid. Experiments demonstrate that a single policy can execute agile skills such as dancing and running, while also performing zero-shot extreme balance motions like Ip Man's Squat, highlighting AMS as a versatile control paradigm for future humanoid applications.

Overview of AMS

(a) The general whole-body tracking pipeline retargets human MoCap data to reference motions and adopts a teacher-student strategy for reinforcement learning. To address data limitations and conflicting optimization objectives, AMS introduces three key components. (b) Synthetic balance data is generated to complement human MoCap data. (c) Adaptive learning applies adaptive sampling and reward shaping based on individual motion performance. (d) Hybrid rewards combine general rewards for all motions with balance prior rewards applied exclusively to synthetic motions.
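Components (c) and (d) can be illustrated with a minimal Python sketch. It is not the AMS implementation: the exponential reward kernels, the center-of-mass offset used as a balance prior, the softmax form of the sampler, and every function name, parameter, and default value below are assumptions chosen for illustration. The sketch only shows the two structural ideas stated in the caption: balance terms are added solely for synthetic motions, and motions that are currently tracked poorly are sampled more often.

```python
import math
import random


def hybrid_reward(tracking_error, com_offset, is_synthetic, balance_weight=0.5):
    """Hypothetical hybrid reward.

    The general tracking term applies to every motion. The balance prior
    (here an assumed penalty on center-of-mass offset) is added only when
    the reference motion comes from the synthetic balance data.
    """
    r_track = math.exp(-tracking_error)  # assumed kernel, applies to all motions
    r_balance = math.exp(-com_offset) if is_synthetic else 0.0
    return r_track + balance_weight * r_balance


def sampling_probs(mean_errors, temperature=1.0):
    """Hypothetical performance-driven sampler.

    Softmax over per-motion mean tracking errors, so motions with a higher
    error receive a higher sampling probability.
    """
    m = max(mean_errors)  # subtract the max for numerical stability
    weights = [math.exp((e - m) / temperature) for e in mean_errors]
    total = sum(weights)
    return [w / total for w in weights]


def sample_motion(mean_errors, rng, temperature=1.0):
    """Draw one motion index according to the probabilities above."""
    probs = sampling_probs(mean_errors, temperature)
    return rng.choices(range(len(mean_errors)), weights=probs, k=1)[0]
```

In a training loop, `sample_motion` would pick the reference motion for each new episode from running per-motion error estimates, and `hybrid_reward` would be evaluated at every step with `is_synthetic` set from the motion's data source. A lower `temperature` concentrates sampling on the hardest motions, while a higher one approaches uniform sampling.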

BibTeX

@article{pan2025ams,
  title={Agility Meets Stability: Versatile Humanoid Control with Heterogeneous Data},
  author={Pan, Yixuan and Qiao, Ruoyi and Chen, Li and Chitta, Kashyap and Pan, Liang and Mai, Haoguang and Bu, Qingwen and Zheng, Cunyuan and Zhao, Hao and Luo, Ping and Li, Hongyang},
  journal={arXiv preprint arXiv:2511.17373},
  year={2025}
}