TY - JOUR
T1 - Interleaved versus a priori exploration for repeated navigation in a partially-known graph
AU - Argamon-Engelson, Shlomo
AU - Kraus, Sarit
AU - Sina, Sigalit
PY - 1999/11
Y1 - 1999/11
N2 - In this paper, we address the tradeoff between exploration and exploitation for agents which need to learn more about the structure of their environment in order to perform more effectively. For example, a software agent operating on the World Wide Web may need to learn which sites on the net are most useful, and the most efficient routes to those sites. We compare exploration strategies for a repeated task, where the agent is given some particular task to perform some number of times. Tasks are modeled as navigation on a partially known (deterministic) graph. This paper describes a new utility-based exploration algorithm for repeated tasks which interleaves exploration with task performance. The method takes into account both the costs and the potential benefits (for future task repetitions) of different exploratory actions. Exploration is performed in a greedy fashion, with the locally optimal exploratory action performed during repetition of each task. We experimentally evaluated our utility-based interleaved exploration algorithm against a heuristic search algorithm for exploration before task performance (a priori exploration) as well as a randomized interleaved exploration algorithm. We found that for a single repeated task, utility-based interleaved exploration consistently outperforms the alternatives, unless the number of task repetitions is very high. In addition, we extended the algorithms for the case of multiple repeated tasks, where the agent has a different, randomly-chosen task (from a known subset of possible tasks) to perform each time. Here too, we found that utility-based interleaved exploration is clearly superior in most cases.
AB - In this paper, we address the tradeoff between exploration and exploitation for agents which need to learn more about the structure of their environment in order to perform more effectively. For example, a software agent operating on the World Wide Web may need to learn which sites on the net are most useful, and the most efficient routes to those sites. We compare exploration strategies for a repeated task, where the agent is given some particular task to perform some number of times. Tasks are modeled as navigation on a partially known (deterministic) graph. This paper describes a new utility-based exploration algorithm for repeated tasks which interleaves exploration with task performance. The method takes into account both the costs and the potential benefits (for future task repetitions) of different exploratory actions. Exploration is performed in a greedy fashion, with the locally optimal exploratory action performed during repetition of each task. We experimentally evaluated our utility-based interleaved exploration algorithm against a heuristic search algorithm for exploration before task performance (a priori exploration) as well as a randomized interleaved exploration algorithm. We found that for a single repeated task, utility-based interleaved exploration consistently outperforms the alternatives, unless the number of task repetitions is very high. In addition, we extended the algorithms for the case of multiple repeated tasks, where the agent has a different, randomly-chosen task (from a known subset of possible tasks) to perform each time. Here too, we found that utility-based interleaved exploration is clearly superior in most cases.
KW - Expected utility
KW - Exploration versus exploitation
KW - Navigation
KW - Random graphs
UR - http://www.scopus.com/inward/record.url?scp=0348193571&partnerID=8YFLogxK
U2 - 10.1142/S0218001499000537
DO - 10.1142/S0218001499000537
M3 - Article
AN - SCOPUS:0348193571
SN - 0218-0014
VL - 13
SP - 963
EP - 986
JO - International Journal of Pattern Recognition and Artificial Intelligence
JF - International Journal of Pattern Recognition and Artificial Intelligence
IS - 7
ER -