Upper Confidence Interval Strategies for Multi-Armed Bandits with Entropy Rewards

Nir Weinberger, Michal Yemini

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

We introduce a multi-armed bandit problem with information-based rewards. At each round, a player chooses an arm, observes a symbol, and receives an unobserved reward in the form of the symbol's self-information. The player aims to maximize the expected total reward associated with the entropy values of the arms played. We propose two algorithms based on upper confidence bounds (UCB) for this model. The first algorithm optimistically corrects the bias term in the entropy estimation. The second algorithm relies on data-dependent UCBs that adapt to sources with small entropy values. We provide performance guarantees by upper bounding the expected regret of each of the algorithms, and compare their asymptotic behavior to the Lai-Robbins lower bound. Finally, we provide numerical results illustrating the regret of the algorithms presented.
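To illustrate the problem setup described in the abstract, below is a minimal, hypothetical Python sketch of a bandit whose arms emit symbols from fixed categorical distributions and whose (unobserved) per-round reward is the symbol's self-information. The selection rule shown here is a generic plug-in entropy estimate plus a standard exploration bonus; it is not the paper's bias-corrected or data-dependent UCB algorithms, and the helper names, the bonus form, and the example distributions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def plugin_entropy(counts):
    """Plug-in (empirical) entropy estimate, in nats, from symbol counts."""
    n = counts.sum()
    p = counts[counts > 0] / n
    return float(-(p * np.log(p)).sum())


def ucb_entropy_bandit(arm_dists, horizon, alphabet_size):
    """Illustrative UCB loop: play the arm whose plug-in entropy estimate
    plus a generic confidence bonus is largest. (The paper's algorithms
    replace this bonus with refined, bias-corrected or data-dependent bounds.)"""
    K = len(arm_dists)
    counts = np.zeros((K, alphabet_size))  # per-arm symbol counts
    pulls = np.zeros(K, dtype=int)
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= K:  # play each arm once to initialize
            arm = t - 1
        else:
            bonus = np.sqrt(2 * np.log(t) / pulls)
            est = np.array([plugin_entropy(c) for c in counts])
            arm = int(np.argmax(est + bonus))

        symbol = rng.choice(alphabet_size, p=arm_dists[arm])
        counts[arm, symbol] += 1
        pulls[arm] += 1
        # Reward is the symbol's self-information under the true distribution
        # (unobserved by the player; tracked here only to report performance).
        total_reward += -np.log(arm_dists[arm][symbol])

    return total_reward, pulls


# Example: two arms over a 4-symbol alphabet with different entropies.
arms = [np.array([0.7, 0.1, 0.1, 0.1]), np.array([0.25, 0.25, 0.25, 0.25])]
reward, pulls = ucb_entropy_bandit(arms, horizon=5000, alphabet_size=4)
print("pulls per arm:", pulls, "total reward:", round(reward, 1))
```

In this sketch the uniform arm has the higher entropy, so a sensible policy should concentrate its pulls there as the horizon grows; the printed pull counts give a quick sanity check of that behavior.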

Original language: English
Title of host publication: 2022 IEEE International Symposium on Information Theory, ISIT 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1647-1652
Number of pages: 6
ISBN (Electronic): 9781665421591
DOIs
State: Published - 2022
Externally published: Yes
Event: 2022 IEEE International Symposium on Information Theory, ISIT 2022 - Espoo, Finland
Duration: 26 Jun 2022 - 1 Jul 2022

Publication series

Name: IEEE International Symposium on Information Theory - Proceedings
Volume: 2022-June
ISSN (Print): 2157-8095

Conference

Conference: 2022 IEEE International Symposium on Information Theory, ISIT 2022
Country/Territory: Finland
City: Espoo
Period: 26/06/22 - 1/07/22

Bibliographical note

Publisher Copyright:
© 2022 IEEE.
