TY - JOUR
T1 - Optimal synchronization control of multiagent systems with input saturation via off-policy reinforcement learning
AU - Qin, Jiahu
AU - Li, Man
AU - Shi, Yang
AU - Ma, Qichao
AU - Zheng, Wei Xing
PY - 2019
Y1 - 2019
AB - In this paper, we investigate the optimal synchronization problem for a group of generic linear systems with input saturation. To obtain the optimal controllers, coupled Hamilton-Jacobi-Bellman (HJB) equations involving nonquadratic input energy terms are established. The solutions to these coupled HJB equations are proven to be optimal, and the induced controllers constitute an interactive Nash equilibrium. Because coupled HJB equations are difficult to solve analytically, and because full model information of the systems may be unavailable, we apply a data-based off-policy reinforcement learning algorithm to learn the optimal control policies. A byproduct of this off-policy algorithm is that it is insensitive to the probing noise exerted on the system to maintain the persistence-of-excitation condition. To implement the off-policy algorithm, we employ actor and critic neural networks to approximate the controllers and the cost functions. Furthermore, the estimated control policies obtained by the presented implementation are proven to converge to the optimal ones under certain conditions. Finally, an illustrative example is provided to verify the effectiveness of the proposed algorithm.
KW - algorithms
KW - machine learning
KW - multiagent systems
KW - neural networks (computer science)
KW - synchronization
UR - http://handle.westernsydney.edu.au:8081/1959.7/uws:48869
U2 - 10.1109/TNNLS.2018.2832025
DO - 10.1109/TNNLS.2018.2832025
M3 - Article
SN - 2162-237X
VL - 30
SP - 85
EP - 96
JO - IEEE Transactions on Neural Networks and Learning Systems
JF - IEEE Transactions on Neural Networks and Learning Systems
IS - 1
ER -