Abstract
The building blocks of Reinforcement Learning (RL), i.e., Q-functions and policy networks, usually take elements from the Cartesian product of two domains as input. In particular, the Q-function takes both the state and the action as input, and in multi-task problems (Meta-RL) the policy can take a state and a context. Standard architectures tend to ignore the underlying interpretations of these variables and simply concatenate their features into a single vector. In this work, we argue that this choice may lead to poor gradient estimation in actor-critic algorithms and to high-variance learning steps in Meta-RL algorithms. To account for the interaction between the input variables, we suggest using a Hypernetwork architecture in which a primary network determines the weights of a conditional dynamic network. We show that this approach improves the gradient approximation and reduces the learning-step variance, which both accelerates learning and improves the final performance. We demonstrate a consistent improvement across different locomotion tasks and different algorithms, both in RL (TD3 and SAC) and in Meta-RL (MAML and PEARL).
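The idea of a primary network that determines the weights of a dynamic network can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the initialization, the helper names (`primary`, `q_hyper`, `q_concat`), and the choice to feed the state to the primary network and the action to the dynamic network are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
STATE_DIM, ACTION_DIM = 4, 2
PRIMARY_HIDDEN, DYNAMIC_HIDDEN = 32, 8

# Parameter count of the dynamic network: W1, b1, w2, b2.
N_DYNAMIC = DYNAMIC_HIDDEN * ACTION_DIM + DYNAMIC_HIDDEN + DYNAMIC_HIDDEN + 1

# Primary network parameters (the only trainable parameters in this sketch).
P1 = rng.normal(0.0, 0.1, (PRIMARY_HIDDEN, STATE_DIM))
c1 = np.zeros(PRIMARY_HIDDEN)
P2 = rng.normal(0.0, 0.1, (N_DYNAMIC, PRIMARY_HIDDEN))
c2 = np.zeros(N_DYNAMIC)


def primary(state):
    """Map a state to the flat weight vector of the dynamic network."""
    hidden = np.tanh(P1 @ state + c1)
    return P2 @ hidden + c2


def q_hyper(state, action):
    """Q(s, a) computed by a dynamic network whose weights depend on s."""
    theta = primary(state)
    i = 0
    W1 = theta[i:i + DYNAMIC_HIDDEN * ACTION_DIM].reshape(DYNAMIC_HIDDEN, ACTION_DIM)
    i += DYNAMIC_HIDDEN * ACTION_DIM
    b1 = theta[i:i + DYNAMIC_HIDDEN]
    i += DYNAMIC_HIDDEN
    w2 = theta[i:i + DYNAMIC_HIDDEN]
    i += DYNAMIC_HIDDEN
    b2 = theta[i]
    hidden = np.maximum(W1 @ action + b1, 0.0)  # ReLU
    return float(w2 @ hidden + b2)


# Concatenation baseline for contrast: one MLP over [state, action].
C1 = rng.normal(0.0, 0.1, (PRIMARY_HIDDEN, STATE_DIM + ACTION_DIM))
C2 = rng.normal(0.0, 0.1, PRIMARY_HIDDEN)


def q_concat(state, action):
    """Q(s, a) computed from the concatenated feature vector."""
    x = np.concatenate([state, action])
    return float(C2 @ np.maximum(C1 @ x, 0.0))


state = rng.normal(size=STATE_DIM)
action = rng.normal(size=ACTION_DIM)
print(q_hyper(state, action), q_concat(state, action))
```

In the hypernetwork variant the state never enters the dynamic network as a feature; it only selects the function that is applied to the action. In the baseline, both inputs share the same first-layer weights.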
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 38th International Conference on Machine Learning, ICML 2021 |
| Publisher | ML Research Press |
| Pages | 9301-9312 |
| Number of pages | 12 |
| ISBN (Electronic) | 9781713845065 |
| State | Published - 2021 |
| Event | 38th International Conference on Machine Learning, ICML 2021 - Virtual, Online; Duration: 18 Jul 2021 → 24 Jul 2021 |
Publication series
| Name | Proceedings of Machine Learning Research |
|---|---|
| Volume | 139 |
| ISSN (Electronic) | 2640-3498 |
Conference
| Conference | 38th International Conference on Machine Learning, ICML 2021 |
|---|---|
| City | Virtual, Online |
| Period | 18/07/21 → 24/07/21 |
Bibliographical note
Publisher Copyright: © 2021 by the author(s)
Fingerprint
Dive into the research topics of 'Recomposing the Reinforcement Learning Building Blocks with Hypernetworks'. Together they form a unique fingerprint.