TY - GEN
T1 - Towards adaptive multi-robot coordination based on resource expenditure velocity
AU - Erusalimchik, Dan
AU - Kaminka, Gal A.
PY - 2008
Y1 - 2008
AB - In the research area of multi-robot systems, several researchers have reported consistent success in using heuristic measures to improve loose coordination in teams, minimizing coordination costs through a variety of heuristic techniques. While these heuristic methods have proven successful in several domains, they have never been formalized, nor have they been placed in the context of existing work on adaptation and learning. As a result, the conditions for their use remain unknown. We posit that all of these heuristic methods are, in fact, instances of reinforcement learning in a one-stage MDP game, with the specific heuristic functions used as rewards. We show that a specific reward function, which we call Effectiveness Index (EI), is appropriate for learning to select between coordination methods. EI estimates the resource-spending velocity of a coordination algorithm, and allows this velocity to be minimized using familiar reinforcement learning algorithms (in our case, Q-learning in a one-stage MDP). The paper argues analytically and empirically for the use of EI, proving that under certain conditions, maximizing this reward leads to greater utility in the task. We report on initial experiments demonstrating that EI indeed overcomes limitations of previous work, and outperforms it in several cases.
KW - Coordination
KW - Multi-robot systems
KW - Reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=84871727284&partnerID=8YFLogxK
U2 - 10.3233/978-1-58603-887-8-288
DO - 10.3233/978-1-58603-887-8-288
M3 - Conference contribution
AN - SCOPUS:84871727284
SN - 9781586038878
T3 - Intelligent Autonomous Systems 10, IAS 2008
SP - 288
EP - 297
BT - Intelligent Autonomous Systems 10, IAS 2008
T2 - 10th International Conference on Intelligent Autonomous Systems, IAS 2008
Y2 - 23 July 2008 through 25 July 2008
ER -