An optimal elimination algorithm for learning a best arm

Avinatan Hassidim, Ron Kupfer, Yaron Singer

Research output: Contribution to journal (Conference article, peer-reviewed)

8 Scopus citations

Abstract

We consider the classic problem of $(\epsilon, \delta)$-PAC learning a best arm, where the goal is to identify, with confidence $1 - \delta$, an arm whose mean is an $\epsilon$-approximation to that of the highest-mean arm in a multi-armed bandit setting. This problem is one of the most fundamental problems in statistics and learning theory, yet, somewhat surprisingly, its worst-case sample complexity is not well understood. In this paper, we propose a new approach for $(\epsilon, \delta)$-PAC learning a best arm. This approach leads to an algorithm whose sample complexity converges to exactly the optimal sample complexity of $(\epsilon, \delta)$-learning the mean of $n$ arms separately, and we complement this result with a conditional matching lower bound. More specifically:
• The algorithm's sample complexity converges to exactly $\frac{n}{2\epsilon^2}\log\frac{1}{\delta}$ as $n$ grows and $\delta = \frac{1}{n}$;
• We prove that no elimination algorithm obtains sample complexity arbitrarily lower than $\frac{n}{2\epsilon^2}\log\frac{1}{\delta}$. Elimination algorithms are a broad class of $(\epsilon, \delta)$-PAC best arm learning algorithms that includes many algorithms in the literature.
When $n$ is independent of $\delta$, our approach yields an algorithm whose sample complexity converges to $\frac{2n}{\epsilon^2}\log\frac{1}{\delta}$ as $n$ grows.
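The abstract measures elimination algorithms against the bound $\frac{n}{2\epsilon^2}\log\frac{1}{\delta}$. The Python sketch below is a rough illustration only. It evaluates that bound and runs Median Elimination (Even-Dar, Mannor and Mansour, 2002), a classic $(\epsilon, \delta)$-PAC elimination algorithm swapped in for this example. It is not the algorithm proposed in this paper. The names `target_sample_complexity`, `median_elimination`, `pull` and `MEANS` are hypothetical. The sketch assumes Bernoulli rewards in [0, 1] and natural logarithms.

```python
import math
import random
from typing import Callable, Tuple


def target_sample_complexity(n: int, eps: float, delta: float) -> float:
    """The bound n / (2 eps^2) * log(1/delta) from the abstract (natural log assumed)."""
    return n / (2 * eps ** 2) * math.log(1 / delta)


def median_elimination(pull: Callable[[int], float], n_arms: int,
                       eps: float, delta: float) -> Tuple[int, int]:
    """Median Elimination (Even-Dar, Mannor and Mansour, 2002); NOT this paper's method.

    pull(arm) must return a reward in [0, 1]. Returns (chosen_arm, total_pulls).
    """
    arms = list(range(n_arms))
    eps_l, delta_l = eps / 4, delta / 2  # per-round accuracy and confidence budgets
    total_pulls = 0
    while len(arms) > 1:
        # Per-arm sample size used in Median Elimination's analysis for this round.
        m = math.ceil(math.log(3 / delta_l) / (eps_l / 2) ** 2)
        means = {a: sum(pull(a) for _ in range(m)) / m for a in arms}
        total_pulls += m * len(arms)
        # Eliminate the arms whose empirical mean falls below the median.
        ranked = sorted(arms, key=lambda a: means[a], reverse=True)
        arms = ranked[: (len(ranked) + 1) // 2]
        # Tighten accuracy geometrically and halve the confidence budget.
        eps_l, delta_l = 0.75 * eps_l, delta_l / 2
    return arms[0], total_pulls


# Hypothetical toy instance: 8 Bernoulli arms; only arm 0 is 0.2-optimal.
rng = random.Random(0)
MEANS = [0.9, 0.5, 0.5, 0.4, 0.3, 0.3, 0.2, 0.1]


def pull(arm: int) -> float:
    return 1.0 if rng.random() < MEANS[arm] else 0.0


best, pulls = median_elimination(pull, len(MEANS), eps=0.2, delta=0.1)
print(best, pulls, round(target_sample_complexity(len(MEANS), 0.2, 0.1), 1))
```

On this small instance, Median Elimination uses far more pulls than the target bound because its constants are large. The comparison is only indicative, since the abstract's results are asymptotic in $n$.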

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 2020-December
State: Published - 2020
Event: 34th Conference on Neural Information Processing Systems, NeurIPS 2020 - Virtual, Online
Duration: 6 Dec 2020 to 12 Dec 2020

Bibliographical note

Publisher Copyright:
© 2020 Neural information processing systems foundation. All rights reserved.
