TY - JOUR
T1 - One for All and All for One: Distributed Learning of Fair Allocations With Multi-Player Bandits
T2 - IEEE Journal on Selected Areas in Information Theory
AU - Bistritz, Ilai
AU - Baharav, Tavor Z.
AU - Leshem, Amir
AU - Bambos, Nicholas
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2021/4/1
Y1 - 2021/4/1
N2 - Consider N cooperative but non-communicating players where each plays one out of M arms for T turns. Players have different utilities for each arm, represented as an N × M matrix. These utilities are unknown to the players. In each turn, players select an arm and receive a noisy observation of their utility for it. However, if any other players selected the same arm in that turn, all colliding players will receive zero utility due to the conflict. No communication between the players is possible. We propose two distributed algorithms which learn fair matchings between players and arms while minimizing the regret. We show that our first algorithm learns a max-min fairness matching with near-O(log T) regret (up to a log log T factor). However, if one has a known target Quality of Service (QoS) (which may vary between players) then we show that our second algorithm learns a matching where all players obtain an expected reward of at least their QoS with constant regret, given that such a matching exists. In particular, if the max-min value is known, a max-min fairness matching can be learned with O(1) regret.
AB - Consider N cooperative but non-communicating players where each plays one out of M arms for T turns. Players have different utilities for each arm, represented as an N × M matrix. These utilities are unknown to the players. In each turn, players select an arm and receive a noisy observation of their utility for it. However, if any other players selected the same arm in that turn, all colliding players will receive zero utility due to the conflict. No communication between the players is possible. We propose two distributed algorithms which learn fair matchings between players and arms while minimizing the regret. We show that our first algorithm learns a max-min fairness matching with near-O(log T) regret (up to a log log T factor). However, if one has a known target Quality of Service (QoS) (which may vary between players) then we show that our second algorithm learns a matching where all players obtain an expected reward of at least their QoS with constant regret, given that such a matching exists. In particular, if the max-min value is known, a max-min fairness matching can be learned with O(1) regret.
KW - Multi-player bandits
KW - distributed learning
KW - fairness
KW - online learning
KW - resource allocation
UR - http://www.scopus.com/inward/record.url?scp=85115717555&partnerID=8YFLogxK
U2 - 10.1109/jsait.2021.3073065
DO - 10.1109/jsait.2021.3073065
M3 - Article
SN - 2641-8770
VL - 2
SP - 584
EP - 598
JO - IEEE Journal on Selected Areas in Information Theory
JF - IEEE Journal on Selected Areas in Information Theory
IS - 2
M1 - 9404291
ER -