Video

Abstract

Learning skills from human motions offers a promising path toward generalizable policies for whole-body humanoid control, yet two key cornerstones are missing: (1) a scalable, high-quality motion tracking framework that faithfully transforms kinematic references into robust, extremely dynamic motions on real hardware, and (2) a distillation approach that can effectively learn these motion primitives and compose them to solve downstream tasks. We address these gaps with BeyondMimic, a real-world framework for learning from human motions for versatile and naturalistic humanoid control via guided diffusion. Our framework provides a motion tracking pipeline capable of challenging skills such as jumping spins, sprinting, and cartwheels with state-of-the-art motion quality. Beyond mimicking, we further introduce a unified diffusion policy that enables zero-shot task-specific control at test time using simple cost functions. Deployed on hardware, BeyondMimic performs diverse tasks including waypoint navigation, joystick teleoperation, and obstacle avoidance, bridging sim-to-real motion tracking and flexible synthesis of human motion primitives for whole-body control.

Our open-source motion tracking pipeline can track motions stably and repeatably

Policies trained with the BeyondMimic pipeline on the LAFAN1 dataset

Below are 24 clips from policies trained on 14 distinct ~3-minute sequences, all using exactly the same MDP setup and hyperparameters.

Beyond mimicking, we distill selected policies into a single diffusion model with test-time guidance for zero-shot downstream tasks
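To make the test-time guidance idea concrete, below is a minimal sketch of cost-guided reverse diffusion in the style of classifier guidance: at each denoising step, the gradient of a task cost nudges the sampled trajectory toward the desired behavior. The function names (`eps_model`, `cost_fn`, `guided_sample`) and the DDPM-style schedule are illustrative assumptions, not the released BeyondMimic API.

```python
# Minimal sketch of cost-guided reverse diffusion (DDPM-style).
# Assumes a trained denoiser eps_model(x_t, t, obs) that predicts the noise
# for a trajectory x_t conditioned on observations; names are hypothetical.
import torch

@torch.no_grad()
def guided_sample(eps_model, cost_fn, obs, shape, betas, guidance_scale=1.0):
    """Sample a trajectory, steering each denoising step with the gradient
    of a task cost (e.g. waypoint distance) -- no retraining required."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                              # start from pure noise
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps = eps_model(x, t_batch, obs)                # predicted noise
        # Un-guided DDPM posterior mean for this step.
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        # Test-time guidance: descend the task cost at the current estimate.
        with torch.enable_grad():
            x_req = x.detach().requires_grad_(True)
            cost = cost_fn(x_req, obs).sum()
            grad = torch.autograd.grad(cost, x_req)[0]
        mean = mean - guidance_scale * betas[t] * grad
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x
```

Because the task enters only through `cost_fn`, the same distilled policy can be repurposed for new objectives at deployment time by swapping the cost.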

Arbitrary joystick control, waypoint following, and obstacle avoidance
(Mocap is used to determine the locations of waypoints and obstacles, and to assist state estimation)
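As an example of the "simple cost functions" these tasks rely on, here is a hypothetical cost combining waypoint attraction and obstacle repulsion that could plug into the guided sampler sketched above. The trajectory layout (first two channels read as planar base position) and the observation keys are assumptions for illustration only.

```python
# Hypothetical task cost for waypoint following + obstacle avoidance,
# usable as `cost_fn` in the guided sampler sketched earlier.
import torch

def waypoint_obstacle_cost(traj, obs, obstacle_radius=0.5):
    """traj: (batch, horizon, dim); obs: dict with 'waypoint' (2,) and 'obstacles' (K, 2)."""
    base_xy = traj[..., :2]                              # assumed planar base-position channels
    # Quadratic attraction toward the commanded waypoint.
    goal_cost = ((base_xy - obs["waypoint"]) ** 2).sum(-1).mean(-1)
    # Penalize any point of the trajectory that enters an obstacle's radius.
    obstacles = obs["obstacles"].unsqueeze(0).expand(traj.shape[0], -1, -1)
    dists = torch.cdist(base_xy, obstacles)              # (batch, horizon, K)
    obstacle_cost = torch.relu(obstacle_radius - dists).pow(2).sum(-1).mean(-1)
    return goal_cost + 10.0 * obstacle_cost
```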