Differentially private contextual linear bandits

Roshan Shariff, Or Sheffet

Research output: Contribution to journal › Conference article › peer-review

57 Scopus citations

Abstract

We study the contextual linear bandit problem, a version of the standard stochastic multi-armed bandit (MAB) problem where a learner sequentially selects actions to maximize a reward which also depends on a user-provided per-round context. Though the context is chosen arbitrarily or adversarially, the reward is assumed to be a stochastic function of a feature vector that encodes the context and selected action. Our goal is to devise private learners for the contextual linear bandit problem. We first show that using the standard definition of differential privacy results in linear regret. So instead, we adopt the notion of joint differential privacy, where we assume that the action chosen on day t is only revealed to user t and thus needn't be kept private that day, only on following days. We give a general scheme converting the classic linear-UCB algorithm into a joint differentially private algorithm using the tree-based algorithm [10, 18]. We then apply either Gaussian noise or Wishart noise to achieve joint differentially private algorithms and bound the resulting algorithms' regrets. In addition, we give the first lower bound on the additional regret any private algorithm for the MAB problem must incur.
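To illustrate the tree-based algorithm mentioned in the abstract, the following is a minimal sketch of a binary-tree aggregator that releases noisy running sums of a stream. It is a simplification under stated assumptions and not the authors' implementation: it tracks a scalar stream rather than the matrix-valued statistics a linear-UCB learner would maintain, the class name `TreeAggregator` and its parameters `horizon` and `sigma` are hypothetical, and the calibration of `sigma` to a target privacy level is deliberately left out.

```python
import random


class TreeAggregator:
    """Noisy running sums via dyadic blocks (illustrative sketch only)."""

    def __init__(self, horizon, sigma, rng=None):
        # One slot per tree level; enough levels to cover `horizon` items.
        self.levels = max(1, horizon.bit_length())
        self.sigma = sigma  # noise scale; privacy calibration is NOT done here
        self.rng = rng or random.Random()
        self.exact = [0.0] * self.levels  # exact block sums, kept internal
        self.noisy = [0.0] * self.levels  # noisy block sums, used for output
        self.t = 0

    def add(self, x):
        """Consume one stream item and return the noisy prefix sum so far."""
        self.t += 1
        t = self.t
        # Index of the lowest set bit of t: the level of the block closed now.
        i = (t & -t).bit_length() - 1
        # The new block merges every lower-level block plus the new item.
        self.exact[i] = x + sum(self.exact[:i])
        for j in range(i):
            self.exact[j] = 0.0
            self.noisy[j] = 0.0
        # Each block receives fresh Gaussian noise exactly once.
        self.noisy[i] = self.exact[i] + self.rng.gauss(0.0, self.sigma)
        # The prefix sum is assembled from the blocks at the set bits of t.
        return sum(self.noisy[j] for j in range(self.levels) if (t >> j) & 1)


if __name__ == "__main__":
    agg = TreeAggregator(horizon=8, sigma=1.0, rng=random.Random(0))
    for reward in [0.2, 0.9, 0.4, 0.7]:
        print(agg.add(reward))
```

The point of the construction is that each stream item contributes to at most one block per level, so it touches only logarithmically many noisy values, and each released prefix sum combines only logarithmically many noise terms.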

Original language: English
Pages (from-to): 4296-4306
Number of pages: 11
Journal: Advances in Neural Information Processing Systems
Volume: 2018-December
State: Published - 2018
Externally published: Yes
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: 2 Dec 2018 to 8 Dec 2018

Bibliographical note

Publisher Copyright:
© 2018 Curran Associates Inc. All rights reserved.

Funding

We gratefully acknowledge the Natural Sciences and Engineering Research Council of Canada (NSERC) for supporting R.S. with the Alexander Graham Bell Canada Graduate Scholarship and O.S. with grant #2017–06701. R.S. was also supported by Alberta Innovates and O.S. is also an unpaid collaborator on NSF grant #1565387.

Funders (funder number)
National Science Foundation (1565387)
Horace H. Rackham School of Graduate Studies, University of Michigan
Natural Sciences and Engineering Research Council of Canada (2017–06701)
National Science Foundation
Alberta Innovates
