Optimizing Multi-Agent Coordination via Hierarchical Graph Probabilistic Recursive Reasoning

Saar Cohen, Noa Agmon

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Multi-agent reinforcement learning (MARL) requires coordination by some means of interaction between agents to efficiently solve tasks. Interaction graphs allow reasoning about joint actions based on the local structure of interactions, but they disregard the potential impact of an agent's action on its neighbors' behaviors, which could change rapidly in dynamic settings. In this paper, we thus present a novel perspective on opponent modeling in domains with only local interactions using (level-1) Graph Probabilistic Recursive Reasoning (GrPR2). Unlike previous work on recursive reasoning, each agent iteratively best-responds to other agents' policies over all possible local interactions. Agents' policies are approximated via a variational Bayes scheme for capturing their uncertainties, and we prove that an induced variant of Q-learning converges under self-play when there exists only one Nash equilibrium. In cooperative settings, we further devise a variational lower bound on the likelihood of each agent's optimality. In contrast to other models, optimizing the resulting objective prevents each agent from attaining an unrealistic modeling of others, and yields an exact tabular Q-iteration method with convergence guarantees. Then, we deepen the recursion to level-k via Cognitive Hierarchy GrPR2 (GrPR2-CH), which lets each level-k player best-respond to a mixture of strictly lower levels in the hierarchy. We prove that: (1) level-3 reasoning is the optimal hierarchical level, maximizing each agent's expected return; and (2) the weak spot of classical CH models is the assumption that level-0 is uniformly distributed, which may introduce policy bias. Finally, we propose a practical actor-critic scheme, and illustrate that GrPR2-CH outperforms strong MARL baselines in the particle environment.
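
As a rough illustration of the phrase "best-respond to a mixture of strictly lower levels", the following minimal sketch implements the classical cognitive hierarchy model in a symmetric two-player matrix game. It is not the GrPR2-CH method of the paper: there is no interaction graph, no variational inference, and no learning. The function names, the Poisson weighting with parameter tau, and the example payoff matrix are all assumptions made for this illustration.

```python
import math


def poisson_weights(k, tau):
    """Normalized Poisson(tau) weights over levels 0..k-1 (illustrative choice)."""
    raw = [math.exp(-tau) * tau**h / math.factorial(h) for h in range(k)]
    total = sum(raw)
    return [w / total for w in raw]


def cognitive_hierarchy(payoff, max_level, tau=1.5):
    """Return policies[k], the strategy of a level-k player.

    payoff[a][b] is the player's payoff for action a against opponent
    action b in a symmetric two-player matrix game (hypothetical setup).
    """
    n = len(payoff)
    # Level 0 is assumed to play uniformly at random, as in classical CH models.
    policies = [[1.0 / n] * n]
    for k in range(1, max_level + 1):
        weights = poisson_weights(k, tau)
        # Belief over the opponent's action: mixture of all strictly lower levels.
        belief = [
            sum(w * policies[h][b] for h, w in enumerate(weights))
            for b in range(n)
        ]
        # Expected payoff of each own action under that belief.
        expected = [
            sum(payoff[a][b] * belief[b] for b in range(n)) for a in range(n)
        ]
        # Deterministic best response; ties go to the lowest action index.
        best = max(range(n), key=lambda a: expected[a])
        policies.append([1.0 if a == best else 0.0 for a in range(n)])
    return policies


if __name__ == "__main__":
    # Stag-hunt-like payoffs, chosen only for illustration.
    game = [[4, 0], [3, 3]]
    for level, policy in enumerate(cognitive_hierarchy(game, max_level=3)):
        print(level, policy)
```

In this toy game every level above 0 settles on the second action, since each higher level's belief is anchored on the uniform level-0 player. The choice of level-0 distribution therefore shapes the whole hierarchy in this sketch.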

Original language: English
Title of host publication: International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
Publisher: International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Pages: 290-299
Number of pages: 10
ISBN (Electronic): 9781713854333
State: Published - 2022
Event: 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 - Auckland, Virtual, New Zealand
Duration: 9 May 2022 to 13 May 2022

Publication series

Name: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 1
ISSN (Print): 1548-8403
ISSN (Electronic): 1558-2914

Conference

Conference: 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
Country/Territory: New Zealand
City: Auckland, Virtual
Period: 9/05/22 to 13/05/22

Bibliographical note

Publisher Copyright:
© 2022 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved

Funding

This research was funded in part by ISF grant 2306/18.

Funder: Israel Science Foundation
Funder number: 2306/18

    Keywords

    • Cognitive Hierarchy
    • Interaction Graphs
    • Multi-Agent Coordination
    • Multi-Agent Reinforcement Learning
    • Variational Inference
