Learning Humanoid Arm Motion via Centroidal Momentum Regularized Multi-Agent Reinforcement Learning

Ho Jae Lee1, Se Hwan Jeon1, Sangbae Kim1
1Biomimetic Robotics Lab
Massachusetts Institute of Technology, MA, USA

Abstract

Humans naturally swing their arms during locomotion to regulate whole-body dynamics, reduce angular momentum, and help maintain balance. Inspired by this principle, we present a limb-level multi-agent reinforcement learning (RL) framework that enables coordinated whole-body control of humanoid robots through emergent arm motion. Our approach employs separate actor-critic structures for the arms and legs, trained with centralized critics but decentralized actors that share only base states and centroidal angular momentum (CAM) observations, allowing each agent to specialize in task-relevant behaviors through modular reward design. The arm agent, guided by CAM tracking and damping rewards, promotes arm motions that reduce overall angular momentum and vertical ground reaction moments, improving balance during locomotion and under external perturbations. Comparative studies against single-agent and alternative multi-agent baselines further validate the effectiveness of our approach. Finally, we deploy the learned policy on the MIT Humanoid, achieving robust performance across diverse locomotion tasks, including flat-ground walking, rough-terrain traversal, and stair climbing.
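The exact reward formulation is specified in the paper; as a rough, non-authoritative illustration of the idea, the sketch below shows what CAM tracking and damping terms could look like in NumPy. The function name, kernel width `sigma_track`, and weight `w_damp` are assumptions made here for illustration, not the paper's implementation.

```python
import numpy as np

def cam_reward_terms(k, k_ref, k_prev, dt, sigma_track=0.5, w_damp=0.1):
    """Illustrative CAM tracking and damping terms (hypothetical form and weights).

    k       : current centroidal angular momentum, shape (3,)
    k_ref   : reference CAM (e.g., a small or zero target), shape (3,)
    k_prev  : CAM at the previous control step, shape (3,)
    dt      : control timestep in seconds
    """
    # Tracking term: exponential kernel on the CAM error, a common shaping choice in RL.
    r_track = np.exp(-np.sum((k - k_ref) ** 2) / sigma_track**2)

    # Damping term: penalize the rate of change of CAM (finite difference).
    k_dot = (k - k_prev) / dt
    r_damp = -w_damp * np.sum(k_dot**2)
    return r_track, r_damp
```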

Overview

Contributions

  • We introduce a CAM reward based on biomechanical studies of human walking, and find that it guides the emergence of natural arm swing, enabling stable locomotion and effective push recovery.
  • We propose a multi-agent RL framework employing separate actor-critic networks for the arms and legs, trained centrally but executed in a decentralized manner (a structural sketch follows this list).
  • We demonstrate the effectiveness and practicality of our controller by validating its performance on a humanoid platform, both in simulation and hardware experiments.
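As a structural illustration of the limb-level setup described above (all class names, network sizes, and observation dimensions are placeholders, not the authors' implementation), a minimal PyTorch sketch of decentralized per-limb actors with centralized critics might look like this:

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=(256, 128)):
    # Small helper to build a feedforward network.
    layers, d = [], in_dim
    for h in hidden:
        layers += [nn.Linear(d, h), nn.ELU()]
        d = h
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

class LimbActor(nn.Module):
    """Decentralized actor for one limb group (arms or legs).

    Each actor observes only its own limb states plus the shared base-state
    and CAM observations, so the agents can be executed independently.
    """
    def __init__(self, limb_obs_dim, shared_obs_dim, act_dim):
        super().__init__()
        self.net = mlp(limb_obs_dim + shared_obs_dim, act_dim)

    def forward(self, limb_obs, shared_obs):
        return self.net(torch.cat([limb_obs, shared_obs], dim=-1))

class CentralCritic(nn.Module):
    """Centralized critic with access to the full observation during training."""
    def __init__(self, global_obs_dim):
        super().__init__()
        self.net = mlp(global_obs_dim, 1)

    def forward(self, global_obs):
        return self.net(global_obs)

# Placeholder dimensions: shared observations = base states + CAM.
arm_actor  = LimbActor(limb_obs_dim=16, shared_obs_dim=13, act_dim=8)
leg_actor  = LimbActor(limb_obs_dim=20, shared_obs_dim=13, act_dim=10)
arm_critic = CentralCritic(global_obs_dim=49)
leg_critic = CentralCritic(global_obs_dim=49)
```

Each agent can then be optimized with its own modular reward (e.g., CAM tracking and damping terms for the arm agent), while the centralized critics condition on the full state during training only.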

Push Recovery

The learned policy enables the humanoid to effectively use arm motions to recover balance from yaw torque (\(\tau_z\)) disturbances.

Video demonstrations: recovery under \(+\tau_z\) and \(-\tau_z\) torque disturbances.

BibTeX

@article{lee2025learning,
  title={Learning Humanoid Arm Motion via Centroidal Momentum Regularized Multi-Agent Reinforcement Learning},
  author={Lee, Ho Jae and Jeon, Se Hwan and Kim, Sangbae},
  journal={IEEE Robotics and Automation Letters},
  year={2025},
  publisher={IEEE}
}