TY - JOUR
T1 - Performance Competitions as Research Infrastructure
T2 - Large Scale Comparative Studies of Multi-Agent Teams
AU - Kaminka, Gal A.
AU - Frank, Ian
AU - Arai, Katsuto
AU - Tanaka-Ishii, Kumiko
PY - 2003/7
Y1 - 2003/7
N2 - Performance competitions (events that pit many different programs against each other on a standardized task) provide a way for a research community to promote research progress towards challenging goals. In this paper, we argue that for maximum research benefit, any such competition must involve comparative studies under closely controlled, varying conditions. We demonstrate the critical role of comparative studies in the context of one well-known and growing performance competition: the annual Robotic Soccer World Cup (RoboCup) Championship. Specifically, over the past three years, we have carried out annual large-scale comparative evaluations - distinct from the competition itself - of the multi-agent teams taking part in the largest RoboCup league. Our study, which involved 30 different teams of agents produced by dozens of different research groups, focused on robustness. We show that (i) multi-agent teams exhibit a clear performance-robustness tradeoff; (ii) teams tend to over-specialize, so that they cannot handle beneficial changes we make to their operating environment; and (iii) teams improve in performance more than in robustness from one year to the next, despite the emphasis by RoboCup organizers on robustness as a key challenge. These results demonstrate the potential of large-scale comparative studies for producing important results otherwise difficult to discover, and are significant both in the lessons they raise for designers of multi-agent teams, and in understanding the place of performance competitions within the multi-agent research infrastructure.
KW - Comparative studies
KW - Linear regression
KW - Measurement
KW - Performance competitions
KW - Robustness
KW - Teamwork
UR - http://www.scopus.com/inward/record.url?scp=0346009383&partnerID=8YFLogxK
U2 - 10.1023/a:1024180921782
DO - 10.1023/a:1024180921782
M3 - Article
AN - SCOPUS:0346009383
SN - 1387-2532
VL - 7
SP - 121
EP - 144
JO - Autonomous Agents and Multi-Agent Systems
JF - Autonomous Agents and Multi-Agent Systems
IS - 1-2
ER -