Abstract
Stable locomotion in precipitous environments is an essential capability of quadruped robots, demanding the ability to resist various external disturbances.
However, recent learning-based policies rely only on basic domain randomization to improve robustness, which cannot guarantee that the
robot acquires adequate disturbance rejection capabilities. In this paper, we propose to model the learning process as an adversarial interaction between the
actor and a newly introduced disturber, and to ensure their joint optimization with an H-infinity constraint. In contrast to the actor, which maximizes the discounted
overall reward, the disturber is responsible for generating effective external forces and is optimized by maximizing the cost in each iteration. To keep the
joint optimization of the actor and the disturber stable, our H-infinity constraint bounds the ratio between the cost and the intensity of the external forces. Through this reciprocal interaction throughout training, the actor acquires
the ability to cope with increasingly complex physical disturbances. We verify the robustness of our approach on quadrupedal locomotion tasks with the Unitree
Aliengo robot, as well as on a more challenging task with the Unitree A1 robot, in which the quadruped is expected to locomote on its hind legs only, as if it
were a bipedal robot. Quantitative results in simulation show improvements over baselines, demonstrating the effectiveness of the method and of each design choice.
Real-robot experiments, in turn, qualitatively demonstrate the robustness of the policy under various external disturbances on diverse terrains, including
stairs, high platforms, slopes, and slippery surfaces. All code, checkpoints, and real-world deployment guidance will be made public.
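To make the constraint concrete, the following is a minimal sketch of a standard H-infinity performance bound of the kind described above; the notation (per-step cost $c_t$, external force $d_t$ produced by the disturber, and attenuation level $\eta$) is illustrative and not taken from this abstract:
\[
    \sum_{t=0}^{T} c_t \;\le\; \eta \sum_{t=0}^{T} \lVert d_t \rVert^{2},
\]
i.e., the accumulated cost incurred by the actor is required to stay below a prescribed multiple $\eta$ of the accumulated intensity of the disturber's external forces, which is what keeps the joint actor-disturber optimization stable.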