Abstract
Some of the most powerful reinforcement learning frameworks use planning for action selection. Interestingly, their planning horizon is either fixed or determined arbitrarily by the state visitation history. Here, we expand beyond the naive fixed horizon and propose a theoretically justified strategy for adaptive selection of the planning horizon as a function of the state-dependent value estimate. We present two variants for lookahead selection and analyze the trade-off between iteration count and computational complexity per iteration. We then devise a corresponding deep Q-network algorithm with an adaptive tree-search horizon. We separate the value estimation per depth to compensate for the off-policy discrepancy between depths. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and in Atari.
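To make the abstract's core mechanism concrete, below is a minimal sketch of value-dependent lookahead selection: a per-state tree-search depth is chosen from the value estimate, and a depth-limited exhaustive search backs values up from the leaves. The `ToyModel` environment, the threshold rule in `select_depth`, and the stand-in `v_leaf` are illustrative assumptions, not the paper's actual selection criterion or networks.

```python
import numpy as np

class ToyModel:
    """Deterministic 1-D chain MDP: actions move left/right,
    reward 1 only on reaching the rightmost state."""
    n_states = 8

    def actions(self, s):
        return (-1, +1)

    def step(self, s, a):
        s2 = int(np.clip(s + a, 0, self.n_states - 1))
        return s2, (1.0 if s2 == self.n_states - 1 else 0.0)

def select_depth(v_estimate, thresholds=(0.6, 0.2), depths=(1, 2, 3)):
    # Hypothetical rule: search deeper where the value estimate is small,
    # i.e. where one-step bootstrapping is least informative.
    for t, d in zip(thresholds, depths):
        if abs(v_estimate) >= t:
            return d
    return depths[-1]

def lookahead(model, s, depth, v_leaf, gamma=0.99):
    """Exhaustive depth-limited tree search; leaves are scored by the
    value estimate v_leaf (depth-matched heads in the paper's DQN variant)."""
    if depth == 0:
        return v_leaf(s)
    return max(r + gamma * lookahead(model, s2, depth - 1, v_leaf, gamma)
               for a in model.actions(s)
               for s2, r in [model.step(s, a)])

model = ToyModel()
v_leaf = lambda s: s / (model.n_states - 1)  # stand-in for a learned value net
for s in range(model.n_states):
    h = select_depth(v_leaf(s))
    print(f"state {s}: depth {h}, backed-up value {lookahead(model, s, h, v_leaf):.3f}")
```

In the paper's deep Q-network variant, the leaf evaluation would come from a depth-specific value estimate rather than a single shared one, matching the per-depth separation described in the abstract.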
Original language | English
---|---
Title of host publication | AAAI-23 Technical Tracks 8
Editors | Brian Williams, Yiling Chen, Jennifer Neville
Publisher | AAAI Press
Pages | 9606-9613
Number of pages | 8
ISBN (Electronic) | 9781577358800
State | Published - 27 Jun 2023
Event | 37th AAAI Conference on Artificial Intelligence, AAAI 2023, Washington, United States, 7 Feb 2023 → 14 Feb 2023
Publication series
Name | Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023 |
---|---
Volume | 37 |
Conference
Conference | 37th AAAI Conference on Artificial Intelligence, AAAI 2023 |
---|---
Country/Territory | United States |
City | Washington |
Period | 7/02/23 → 14/02/23 |
Bibliographical note
Publisher Copyright: Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.