Near Optimal Privacy Preserving Fair Multi-Agent Bandits

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

In this paper, we study the problem of fair multi-agent multi-armed bandit learning when agents do not communicate with each other, except for collision information provided to agents accessing the same arm simultaneously. We provide an algorithm with regret O(N³ f(log T) log T) (assuming bounded rewards, with the bound unknown), where f(t) is any function diverging to infinity with t. In contrast to optimal algorithms, which share the rewards with a selected leader, our algorithm does not require a centralized collection of the arm rewards, allowing each agent to keep its rewards private. We also significantly improve on previous privacy-preserving algorithms, which attain the same regret upper bound of order O(f(log T) log T) but with an exponential dependence on the number of agents. Simulation results illustrate the dependence of the regret on log T.
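To make the setting concrete, the sketch below simulates the feedback model described in the abstract: agents that pull the same arm at the same time collide and observe only a collision indicator, while a sole puller privately observes its own reward. This is an illustrative toy environment under assumed Bernoulli rewards, not the paper's algorithm; the class and method names are hypothetical.

```python
import random


class CollisionBanditEnv:
    """Toy multi-agent bandit environment (illustrative sketch only).

    Agents pulling the same arm simultaneously collide: they receive
    zero reward and observe only the collision flag. A sole puller
    privately observes a Bernoulli reward drawn from its arm's mean.
    """

    def __init__(self, arm_means, seed=0):
        self.arm_means = arm_means          # assumed mean reward per arm
        self.rng = random.Random(seed)

    def step(self, choices):
        """choices[i] is the arm pulled by agent i.

        Returns a list of (reward, collided) pairs, one per agent.
        """
        counts = {}
        for arm in choices:
            counts[arm] = counts.get(arm, 0) + 1
        feedback = []
        for arm in choices:
            if counts[arm] > 1:
                # Collision: no reward, only the collision indicator.
                feedback.append((0.0, True))
            else:
                # Private reward, never shared with other agents.
                r = 1.0 if self.rng.random() < self.arm_means[arm] else 0.0
                feedback.append((r, False))
        return feedback


# Usage: two agents colliding on arm 0, then splitting across arms.
env = CollisionBanditEnv([0.9, 0.1])
print(env.step([0, 0]))  # both agents collide
print(env.step([0, 1]))  # no collision; each sees a private reward
```

In this feedback model the collision flags are the only cross-agent signal, which is why the abstract's no-leader setting keeps each agent's reward sequence private.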

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

