Natural Actor-Critic Converges Globally for Hierarchical Linear Quadratic Regulator


Multi-agent reinforcement learning has been successfully applied to a number of challenging problems. Despite these empirical successes, theoretical understanding of the various algorithms is lacking, primarily due to the curse of dimensionality caused by the exponential growth of the state-action space with the number of agents. We study the fundamental problem of multi-agent linear quadratic regulation in a setting where the agents are partially exchangeable. In this setting, we develop a hierarchical actor-critic algorithm whose computational complexity is independent of the total number of agents, and prove its global linear convergence to the optimal policy. As linear quadratic regulators are often used to approximate general dynamic systems, this paper provides an important step towards a better understanding of general hierarchical mean-field multi-agent reinforcement learning.
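To make the core update concrete, here is a minimal single-agent sketch of the natural policy gradient for LQR, the model-based analogue of the natural actor-critic step analyzed in this line of work (in the style of Fazel et al.'s LQR policy-gradient analysis). The hierarchical multi-agent structure is not shown; the toy system matrices, function names, and step-size rule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def solve_lyapunov(A_cl, W, iters=500):
    """Fixed-point iteration for P = W + A_cl^T P A_cl (assumes A_cl stable)."""
    P = W.copy()
    for _ in range(iters):
        P = W + A_cl.T @ P @ A_cl
    return P

def natural_pg_step(A, B, Q, R, K):
    """One natural policy-gradient step for the cost J(K) under u_t = -K x_t."""
    # Value matrix of the current policy: P_K solves the policy Lyapunov equation.
    P = solve_lyapunov(A - B @ K, Q + K.T @ R @ K)
    # The policy gradient is 2 E_K Sigma_K; preconditioning by Sigma_K^{-1}
    # (the "natural" gradient) leaves the direction E_K.
    E = (R + B.T @ P @ B) @ K - B.T @ P @ A
    # Conservative step size so the cost decreases monotonically (assumption).
    eta = 1.0 / (2.0 * np.linalg.norm(R + B.T @ P @ B, 2))
    return K - eta * E

# Toy stable 2-d system (illustrative values only).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.eye(2)
Q = np.eye(2)
R = np.eye(2)

K = np.zeros((2, 2))  # K = 0 is stabilizing here because A itself is stable
for _ in range(1000):
    K = natural_pg_step(A, B, Q, R, K)
```

Run on this toy system, the iterates converge linearly to the optimal gain given by the discrete algebraic Riccati equation, which is the single-agent version of the global convergence guarantee described above.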

Technical report
Yuwei Luo
MS Student (2019-2020)

Yuwei Luo received his M.S. degree in Statistics from the University of Chicago in March 2020. Prior to graduate school, he received his bachelor's degree in Mathematics from the University of Science and Technology of China (USTC) in June 2018. His research interests include reinforcement learning, control, optimization, and network analysis. Yuwei is continuing his education as a PhD student at Stanford University.

Mladen Kolar
Associate Professor of Econometrics and Statistics

Mladen Kolar is an Associate Professor of Econometrics and Statistics at the University of Chicago Booth School of Business. His research focuses on high-dimensional statistical methods, graphical models, varying-coefficient models, and data mining, driven by the need to uncover interesting and scientifically meaningful structures from observational data.