Distributed online bandit optimization under random quantization

Deming Yuan, Baoyong Zhang, Daniel W. C. Ho, Wei Xing Zheng, Shengyuan Xu

Research output: Contribution to journalArticlepeer-review

15 Citations (Scopus)

Abstract

This paper considers the problem of solving distributed online optimization over a network that consists of multiple interacting nodes. Each node in the network is endowed with a sequence of loss functions, each of which is revealed to the node after a decision has been committed. The goal of the network is to minimize the cumulative loss functions of nodes in a distributed fashion, while subject to two types of information constraints, namely, message quantization and bandit feedback. To this end, a quantized distributed online bandit optimization algorithm is proposed by adopting random quantization operation and one-point gradient estimator. We show the convergence of the algorithm by establishing an O(dT3/4) regret bound, where d is the dimension of states and T is the total number of rounds. Finally, an online distributed quadratic programming problem is investigated to validate the theoretical findings of the paper.
Original languageEnglish
Article number110590
Number of pages11
JournalAutomatica
Volume146
DOIs
Publication statusPublished - 2022

Fingerprint

Dive into the research topics of 'Distributed online bandit optimization under random quantization'. Together they form a unique fingerprint.

Cite this