Abstract
As in classical AI planning, the Atari 2600 games supported in the Arcade Learning Environment (ALE) all feature a fully observable (RAM) state and actions that have deterministic effects. At the same time, the problems in ALE are given only implicitly, via a simulator, which a priori precludes exploiting most modern classical planning techniques. Despite that, Lipovetzky et al. [2015] recently showed how online planning for Atari-like problems can be effectively addressed using IW(i), a blind state-space search algorithm that employs a certain form of structural similarity-based pruning. We show that the effectiveness of blind state-space search for Atari-like online planning can be pushed even further by focusing the search using both structural state similarity and the relative myopic value of the states. We also show that planning effectiveness can be further improved by considering online planning for the Atari games as a multi-armed bandit style competition between the various actions available at the state planned for, and not purely as a classical planning style action sequence optimization problem.
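To make the structural similarity-based pruning mentioned in the abstract more concrete, the following is a minimal sketch of width-1 novelty pruning in the spirit of IW(1), assuming a state is a tuple of small integers (such as RAM byte values) and the simulator is exposed as a deterministic `step` function. The names `iw1`, `step`, `toy_step`, and `budget` are hypothetical and chosen for illustration only. This is not the authors' implementation, and it does not include the value-based focusing or the bandit-style action selection that the paper proposes.

```python
from collections import deque
from typing import Callable, Hashable, List, Sequence, Tuple

State = Tuple[int, ...]  # assumed encoding, e.g. a tuple of RAM byte values


def iw1(root: State,
        actions: Sequence[Hashable],
        step: Callable[[State, Hashable], State],
        budget: int = 1000) -> List[State]:
    """Breadth-first search that keeps a generated state only if it is novel,
    i.e. it contains at least one (index, value) atom not seen before."""
    seen_atoms = set(enumerate(root))
    kept = [root]
    queue = deque([root])
    # The budget is checked once per expansion, so it is an approximate cap.
    while queue and len(kept) < budget:
        state = queue.popleft()
        for action in actions:
            child = step(state, action)
            new_atoms = set(enumerate(child)) - seen_atoms
            if not new_atoms:
                continue  # pruned: every atom of the child was seen already
            seen_atoms |= new_atoms
            kept.append(child)
            queue.append(child)
    return kept


def toy_step(state: State, action: Hashable) -> State:
    """Hypothetical deterministic simulator: two counters capped at 2."""
    x, y = state
    if action == "inc_x":
        return (min(x + 1, 2), y)
    return (x, min(y + 1, 2))


kept = iw1((0, 0), ["inc_x", "inc_y"], toy_step)
print(len(kept), kept)
```

In this toy problem nine states are reachable, but the search keeps only five of them: a state such as `(1, 1)` is pruned because both of its atoms were already seen in `(1, 0)` and `(0, 1)`. The search is blind in the sense that it uses no goal or reward information, only the structure of the states.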
Original language | English |
---|---|
Pages (from-to) | 3251-3257 |
Number of pages | 7 |
Journal | IJCAI International Joint Conference on Artificial Intelligence |
Volume | 2016-January |
State | Published - 2016 |
Externally published | Yes |
Event | 25th International Joint Conference on Artificial Intelligence, IJCAI 2016 - New York, United States |
Duration | 9 Jul 2016 → 15 Jul 2016 |