Robots are being deployed in an increasing variety of environments for longer periods of time. As the number of robots grows, they will increasingly need to interact with other robots. Additionally, the number of companies and research laboratories producing these robots is increasing, leading to the situation where these robots may not share a common communication or coordination protocol. While standards for coordination and communication may be created, we expect that robots will need to additionally reason intelligently about their teammates with limited information. This problem motivates the area of ad hoc teamwork in which an agent may potentially cooperate with a variety of teammates in order to achieve a shared goal. This article focuses on a limited version of the ad hoc teamwork problem in which an agent knows the environmental dynamics and has had past experiences with other teammates, though these experiences may not be representative of the current teammates. To tackle this problem, this article introduces a new general-purpose algorithm, PLASTIC, that reuses knowledge learned from previous teammates or provided by experts to quickly adapt to new teammates. This algorithm is instantiated in two forms: 1) PLASTIC-Model – which builds models of previous teammates’ behaviors and plans behaviors online using these models and 2) PLASTIC-Policy – which learns policies for cooperating with previous teammates and selects among these policies online. We evaluate PLASTIC on two benchmark tasks: the pursuit domain and robot soccer in the RoboCup 2D simulation domain. Recognizing that a key requirement of ad hoc teamwork is adaptability to previously unseen agents, the tests use more than 40 previously unknown teams on the first task and 7 previously unknown teams on the second. While PLASTIC assumes that there is some degree of similarity between the current and past teammates’ behaviors, no steps are taken in the experimental setup to make sure this assumption holds. The teammates were created by a variety of independent developers and were not designed to share any similarities. Nonetheless, the results show that PLASTIC was able to identify and exploit similarities between its current and past teammates’ behaviors, allowing it to quickly adapt to new teammates.
|Number of pages||40|
|State||Published - 1 Jan 2017|
Bibliographical noteFunding Information:
This work has taken place in the Learning Agents Research Group (LARG) at the Artificial Intelligence Laboratory, The University of Texas at Austin. LARG research is supported in part by grants from the National Science Foundation ( CNS-1330072 , CNS-1305287 ), ONR ( 21C184-01 ), and AFOSR ( FA8750-14-1-0070 , FA9550-14-1-0087 ). This research was supported by the Israel Science Foundation (grant No. 1488/14 ), and by ERC Grant # 267523 . Peter Stone serves on the Board of Directors of Cogitai, Inc. The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research.
© 2016 Elsevier B.V.
- Ad hoc teamwork
- Multiagent cooperation
- Multiagent systems
- Pursuit domain
- Reinforcement learning
- RoboCup soccer