TY - JOUR
T1 - Hierarchical optimal synchronization for linear systems via reinforcement learning: a Stackelberg-Nash game perspective
AU - Li, Man
AU - Qin, Jiahu
AU - Ma, Qichao
AU - Zheng, Wei Xing
AU - Kang, Yu
N1 - Publisher Copyright:
© 2012 IEEE.
PY - 2021/4
Y1 - 2021/4
N2 - Considering that, in the real world, a certain agent may have the advantage of acting before others, a novel hierarchical optimal synchronization problem for linear systems, composed of one major agent and multiple minor agents, is formulated and studied in this article from a Stackelberg-Nash game perspective. The major agent makes its decision prior to the others, and then all the minor agents determine their actions simultaneously. To seek the optimal controllers, coupled Hamilton-Jacobi-Bellman (HJB) equations are established, whose solutions are further proven to be stable and to constitute the Stackelberg-Nash equilibrium. Owing to the asymmetric roles of the agents, the established HJB equations are more strongly coupled and more difficult to solve than those in most existing works. Therefore, we propose a new reinforcement learning (RL) algorithm, namely a two-level value iteration (VI) algorithm, which does not rely on complete knowledge of the system matrices. Furthermore, the proposed algorithm is shown to be convergent, and the converged values are exactly the optimal ones. To implement this VI algorithm, neural networks (NNs) are employed to approximate the value functions, and the gradient descent method is used to update the NN weights. Finally, an illustrative example is provided to verify the effectiveness of the proposed algorithm.
KW - linear systems
KW - mathematical optimization
KW - neural networks (computer science)
KW - synchronization
UR - http://hdl.handle.net/1959.7/uws:56652
UR - http://www.scopus.com/inward/record.url?scp=85084077486&partnerID=8YFLogxK
U2 - 10.1109/TNNLS.2020.2985738
DO - 10.1109/TNNLS.2020.2985738
M3 - Article
C2 - 32340962
SN - 2162-237X
VL - 32
SP - 1600
EP - 1611
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 4
ER -