Abstract
We study the contextual linear bandit problem, a version of the standard stochastic multi-armed bandit (MAB) problem where a learner sequentially selects actions to maximize a reward which depends also on a user provided per-round context. Though the context is chosen arbitrarily or adversarially, the reward is assumed to be a stochastic function of a feature vector that encodes the context and selected action. Our goal is to devise private learners for the contextual linear bandit problem. We first show that using the standard definition of differential privacy results in linear regret. So instead, we adopt the notion of joint differential privacy, where we assume that the action chosen on day t is only revealed to user t and thus needn't be kept private that day, only on following days. We give a general scheme converting the classic linear-UCB algorithm into a joint differentially private algorithm using the tree-based algorithm [10, 18]. We then apply either Gaussian noise or Wishart noise to achieve joint-differentially private algorithms and bound the resulting algorithms' regrets. In addition, we give the first lower bound on the additional regret any private algorithms for the MAB problem must incur.
Original language | English |
---|---|
Pages (from-to) | 4296-4306 |
Number of pages | 11 |
Journal | Advances in Neural Information Processing Systems |
Volume | 2018-December |
State | Published - 2018 |
Externally published | Yes |
Event | 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada Duration: 2 Dec 2018 → 8 Dec 2018 |
Bibliographical note
Publisher Copyright:© 2018 Curran Associates Inc..All rights reserved.
Funding
We gratefully acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC) for supporting R.S. with the Alexander Graham Bell Canada Graduate Scholarship and O.S. with grant #2017–06701. R.S. was also supported by Alberta Innovates and O.S. is also an unpaid collaborator on NSF grant #1565387. We gratefully acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC) for supporting R.S. with the Alexander Graham Bell Canada Graduate Scholarship and O.S. with grant #2017-06701. R.S. was also supported by Alberta Innovates and O.S. is also an unpaid collaborator on NSF grant #1565387.
Funders | Funder number |
---|---|
National Science Foundation | 1565387 |
Horace H. Rackham School of Graduate Studies, University of Michigan | |
Natural Sciences and Engineering Research Council of Canada | 2017–06701 |
National Science Foundation | |
Alberta Innovates |