Learning skills from human motions offers a promising path toward generalizable policies for whole-body humanoid control, yet two key cornerstones are missing: (1) a scalable, high-quality motion tracking framework that faithfully transforms kinematic references into robust, highly dynamic motions on real hardware, and (2) a distillation approach that can effectively learn these motion primitives and compose them to solve downstream tasks. We address these gaps with BeyondMimic, a real-world framework for learning from human motions to achieve versatile and naturalistic humanoid control via guided diffusion. Our framework provides a motion tracking pipeline capable of challenging skills such as jumping spins, sprinting, and cartwheels with state-of-the-art motion quality. Beyond mimicking, we further introduce a unified diffusion policy that enables zero-shot task-specific control at test time using simple cost functions. Deployed on hardware, BeyondMimic performs diverse tasks, including waypoint navigation, joystick teleoperation, and obstacle avoidance, bridging sim-to-real motion tracking and flexible synthesis of human motion primitives for whole-body control.
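To make the "guided diffusion with simple cost functions" idea concrete, here is a minimal, illustrative sketch of cost-guided reverse diffusion over a short action/motion trajectory. The `denoiser` interface, the placeholder `denoise_step`, and all shapes and step counts are assumptions for illustration, not the framework's actual implementation.

```python
import torch

def guided_sample(denoiser, obs, cost_fn, num_steps=10, guidance_scale=1.0,
                  traj_shape=(16, 29)):
    """Cost-guided reverse diffusion: steer sampling with a task cost at test time.

    `denoiser(x_t, t, obs)` is assumed to predict the clean trajectory; `cost_fn`
    is any differentiable task cost (e.g. distance to a waypoint).
    """
    x = torch.randn(traj_shape)                        # start from Gaussian noise
    for k in reversed(range(num_steps)):
        t = torch.tensor([k])
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            x0_pred = denoiser(x_in, t, obs)           # denoiser's clean-trajectory estimate
            cost = cost_fn(x0_pred)                    # scalar task cost
            grad = torch.autograd.grad(cost, x_in)[0]  # gradient of cost w.r.t. noisy sample
        # Nudge the sample down the cost gradient (classifier-guidance style),
        # then take an (abstracted) denoising step back toward the data manifold.
        x = x - guidance_scale * grad
        x = denoise_step(x, x0_pred.detach(), k, num_steps)
    return x

def denoise_step(x_t, x0_pred, k, num_steps):
    """Placeholder DDPM/DDIM-style update: blend the noisy sample toward the
    predicted clean trajectory as the step index decreases."""
    alpha = k / max(num_steps, 1)
    return alpha * x_t + (1.0 - alpha) * x0_pred
```

Because the policy itself stays fixed, swapping in a different `cost_fn` is all that is needed to switch tasks at deployment time.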
Below are 24 clips from policies trained on 14 distinct ~3-minute sequences, all using exactly the same MDP setup and hyperparameters.
Arbitrary joystick control, waypoint following, and obstacle avoidance
(Mocap is used to determine the locations of waypoints and obstacles, and to aid state estimation.)
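The tasks above are driven purely by test-time guidance costs. Below is a minimal sketch of what such cost terms could look like; the trajectory layout (first two channels as base x/y), time step, and radii are illustrative assumptions, not the actual cost definitions.

```python
import torch

def waypoint_cost(traj, goal_xy):
    """Distance of the final predicted base position to a goal waypoint.
    Assumes the first two trajectory channels are base x/y (illustrative)."""
    return torch.linalg.norm(traj[-1, :2] - goal_xy)

def joystick_cost(traj, target_vel_xy, dt=0.02):
    """Match the average predicted base velocity to a commanded planar velocity."""
    vel = (traj[1:, :2] - traj[:-1, :2]) / dt
    return torch.mean((vel - target_vel_xy).pow(2))

def obstacle_cost(traj, obstacle_xy, radius=0.5):
    """Penalize predicted base positions that enter an obstacle's radius."""
    dist = torch.linalg.norm(traj[:, :2] - obstacle_xy, dim=-1)
    return torch.sum(torch.relu(radius - dist))
```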