TY - JOUR
T1 - Model-free design of stochastic LQR controller from a primal-dual optimization perspective
AU - Li, Man
AU - Qin, Jiahu
AU - Zheng, Wei Xing
AU - Wang, Yaonan
AU - Kang, Yu
PY - 2022
Y1 - 2022
AB - To further understand the underlying mechanism of various reinforcement learning (RL) algorithms, and to better use optimization theory to make further progress in RL, many researchers have begun to revisit the linear–quadratic regulator (LQR) problem, whose setting is simple yet captures the key characteristics of RL. Inspired by this, this work is concerned with the model-free design of a stochastic LQR controller for linear systems subject to Gaussian noise, from the perspective of primal–dual optimization. We first reformulate the stochastic LQR problem as a constrained non-convex optimization problem, which is shown to have strong duality. Then, to solve this non-convex optimization problem, we propose a model-based primal–dual (MB-PD) algorithm based on the properties of the resulting Karush–Kuhn–Tucker (KKT) conditions. We also give a model-free implementation of the MB-PD algorithm by solving a transformed dual feasibility condition. More importantly, we establish the connection between the proposed MB-PD algorithm and the classical policy iteration algorithm, which provides a novel primal–dual optimization perspective for understanding common RL algorithms. Finally, we provide a high-dimensional case study to demonstrate the performance of the proposed algorithms.
UR - https://hdl.handle.net/1959.7/uws:76536
DO - 10.1016/j.automatica.2022.110253
M3 - Article
VL - 140
JO - Automatica
JF - Automatica
M1 - 110253
ER -