## Abstract

Multi-agent reinforcement learning (MARL) requires agents to coordinate through some form of interaction in order to solve tasks efficiently. Interaction graphs allow reasoning about joint actions based on the local structure of interactions, but they disregard the potential impact of an agent's action on its neighbors' behaviors, which can change rapidly in dynamic settings. In this paper, we therefore present a novel perspective on opponent modeling in domains with only local interactions using (level-1) Graph Probabilistic Recursive Reasoning (GrPR2). Unlike previous work on recursive reasoning, each agent iteratively best-responds to other agents' policies over all possible local interactions. Agents' policies are approximated via a variational Bayes scheme that captures their uncertainty, and we prove that an induced variant of Q-learning converges under self-play when there exists only one Nash equilibrium. In cooperative settings, we further devise a variational lower bound on the likelihood of each agent's optimality. In contrast to other models, optimizing the resulting objective prevents each agent from forming an unrealistic model of the others, and yields an exact tabular Q-iteration method with convergence guarantees. We then deepen the recursion to level-k via Cognitive Hierarchy GrPR2 (GrPR2-CH), which lets each level-k player best-respond to a mixture of strictly lower levels in the hierarchy. We prove that: (1) level-3 reasoning is the optimal hierarchical level, maximizing each agent's expected return; and (2) the weak spot of classical CH models is that level-0 is uniformly distributed, which may introduce policy bias. Finally, we propose a practical actor-critic scheme, and illustrate that GrPR2-CH outperforms strong MARL baselines in the particle environment.
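To make the cognitive-hierarchy idea concrete, the following is a minimal sketch of a classical CH model in a symmetric two-action matrix game. It is not the paper's GrPR2-CH algorithm: there is no interaction graph, no variational inference, and no learning. The Poisson weighting over lower levels, the parameter `tau`, the function names, and the stag-hunt payoffs are all assumptions chosen for illustration.

```python
import math


def poisson_weights(k, tau):
    """Poisson(tau) weights over levels 0..k-1, renormalized to sum to 1."""
    raw = [math.exp(-tau) * tau ** h / math.factorial(h) for h in range(k)]
    total = sum(raw)
    return [w / total for w in raw]


def best_response(payoff, opponent_mix):
    """One-hot best response to a mixed opponent strategy.

    payoff[a][b] is the payoff for playing a against an opponent playing b.
    Ties are broken in favor of the lowest action index.
    """
    values = [sum(u * q for u, q in zip(row, opponent_mix)) for row in payoff]
    best = values.index(max(values))
    return [1.0 if a == best else 0.0 for a in range(len(payoff))]


def cognitive_hierarchy(payoff, max_level, tau=1.5):
    """Return the policies of levels 0..max_level in a symmetric game."""
    n_actions = len(payoff)
    levels = [[1.0 / n_actions] * n_actions]  # level-0: uniform random play
    for k in range(1, max_level + 1):
        weights = poisson_weights(k, tau)
        # Belief of a level-k player: a mixture of strictly lower levels.
        mix = [sum(w * levels[h][a] for h, w in enumerate(weights))
               for a in range(n_actions)]
        levels.append(best_response(payoff, mix))
    return levels


# Hypothetical stag hunt: action 0 = stag, action 1 = hare.
STAG_HUNT = [[4.0, 0.0],
             [3.0, 3.0]]

if __name__ == "__main__":
    for level, policy in enumerate(cognitive_hierarchy(STAG_HUNT, max_level=3)):
        print(level, policy)
```

In this toy game the uniform level-0 policy makes stag worth 2 and hare worth 3 to a level-1 player, so level-1 and every higher level settle on hare even though mutual stag pays more. This is only a small illustration of how the level-0 assumption can shape the whole hierarchy; it does not reproduce the paper's analysis.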

Original language | English |
---|---|
Title of host publication | International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 |
Publisher | International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS) |
Pages | 290-299 |
Number of pages | 10 |
ISBN (Electronic) | 9781713854333 |
State | Published - 2022 |
Event | 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 - Auckland, Virtual, New Zealand. Duration: 9 May 2022 → 13 May 2022 |

### Publication series

Name | Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS |
---|---|
Volume | 1 |
ISSN (Print) | 1548-8403 |
ISSN (Electronic) | 1558-2914 |

### Conference

Conference | 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022 |
---|---|
Country/Territory | New Zealand |
City | Auckland, Virtual |
Period | 9/05/22 → 13/05/22 |

### Bibliographical note

Funding Information: This research was funded in part by ISF grant 2306/18.

Publisher Copyright: © 2022 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

## Keywords

- Cognitive Hierarchy
- Interaction Graphs
- Multi-Agent Coordination
- Multi-Agent Reinforcement Learning
- Variational Inference