TY - JOUR
T1 - Quantile Multi-Armed Bandits
T2 - Optimal Best-Arm Identification and a Differentially Private Scheme
AU - Nikolakakis, Konstantinos E.
AU - Kalogerias, Dionysios S.
AU - Sheffet, Or
AU - Sarwate, Anand D.
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2021/6
Y1 - 2021/6
AB - We study the best-arm identification problem in multi-armed bandits with stochastic rewards when the goal is to identify the arm with the highest quantile at a fixed, prescribed level. First, we propose a successive elimination algorithm for strictly optimal best-arm identification, show that it is δ-PAC, and characterize its sample complexity. Further, we provide a lower bound on the expected number of pulls, showing that the proposed algorithm is essentially optimal up to logarithmic factors. Both upper and lower complexity bounds depend on a special definition of the associated suboptimality gap, designed in particular for the quantile bandit problem; as we show, when the gap approaches zero, best-arm identification is impossible. Second, motivated by applications where the rewards are private information, we provide a differentially private successive elimination algorithm and characterize its sample complexity, which is finite even for distributions with infinite support. Our algorithms do not require prior knowledge of either the suboptimality gap or other statistical information related to the bandit problem at hand.
KW - Quantile bandits
KW - best-arm identification
KW - differential privacy
KW - sequential estimation
KW - value at risk
UR - http://www.scopus.com/inward/record.url?scp=85128288118&partnerID=8YFLogxK
U2 - 10.1109/JSAIT.2021.3081525
DO - 10.1109/JSAIT.2021.3081525
M3 - Article
AN - SCOPUS:85128288118
SN - 2641-8770
VL - 2
SP - 534
EP - 548
JO - IEEE Journal on Selected Areas in Information Theory
JF - IEEE Journal on Selected Areas in Information Theory
IS - 2
M1 - 9435774
ER -