Planning and Learning with Adaptive Lookahead

Aviv Rosenberg, Assaf Hallak, Shie Mannor, Gal Chechik, Gal Dalal

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Scopus citations

Abstract

Some of the most powerful reinforcement learning frameworks use planning for action selection. Interestingly, their planning horizon is either fixed or determined arbitrarily by the state visitation history. Here, we expand beyond the naive fixed horizon and propose a theoretically justified strategy for adaptive selection of the planning horizon as a function of the state-dependent value estimate. We propose two variants for lookahead selection and analyze the trade-off between iteration count and computational complexity per iteration. We then devise a corresponding deep Q-network algorithm with an adaptive tree search horizon. We separate the value estimation per depth to compensate for the off-policy discrepancy between depths. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari.

Original languageEnglish
Title of host publicationAAAI-23 Technical Tracks 8
EditorsBrian Williams, Yiling Chen, Jennifer Neville
PublisherAAAI press
Pages9606-9613
Number of pages8
ISBN (Electronic)9781577358800
StatePublished - 27 Jun 2023
Event37th AAAI Conference on Artificial Intelligence, AAAI 2023 - Washington, United States
Duration: 7 Feb 202314 Feb 2023

Publication series

NameProceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
Volume37

Conference

Conference37th AAAI Conference on Artificial Intelligence, AAAI 2023
Country/TerritoryUnited States
CityWashington
Period7/02/2314/02/23

Bibliographical note

Publisher Copyright:
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Fingerprint

Dive into the research topics of 'Planning and Learning with Adaptive Lookahead'. Together they form a unique fingerprint.

Cite this