Abstract
Designing machine learning architectures for processing neural networks in their raw weight matrix form is a newly introduced research direction. Unfortunately, the unique symmetry structure of deep weight spaces makes this design very challenging. If successful, such architectures would be capable of performing a wide range of intriguing tasks, from adapting a pre-trained network to a new domain to editing objects represented as functions (INRs or NeRFs). As a first step towards this goal, we present here a novel network architecture for learning in deep weight spaces. It takes as input a concatenation of weights and biases of a pre-trained MLP and processes it using a composition of layers that are equivariant to the natural permutation symmetry of the MLP's weights: Changing the order of neurons in intermediate layers of the MLP does not affect the function it represents. We provide a full characterization of all affine equivariant and invariant layers for these symmetries and show how these layers can be implemented using three basic operations: pooling, broadcasting, and fully connected layers applied to the input in an appropriate manner. We demonstrate the effectiveness of our architecture and its advantages over natural baselines in a variety of learning tasks.
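To make the stated symmetry concrete, here is a minimal NumPy sketch (an illustration under assumed shapes and a ReLU activation, not code from the paper): permuting the hidden neurons of a two-layer MLP, together with the matching rows and columns of the adjacent weight matrices, leaves the computed function unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out = 3, 5, 2

# A random two-layer MLP: f(x) = W2 @ relu(W1 @ x + b1) + b2.
W1, b1 = rng.normal(size=(d_hid, d_in)), rng.normal(size=d_hid)
W2, b2 = rng.normal(size=(d_out, d_hid)), rng.normal(size=d_out)

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Permute the hidden neurons with a permutation matrix P:
# W1 -> P @ W1, b1 -> P @ b1, W2 -> W2 @ P.T.
P = np.eye(d_hid)[rng.permutation(d_hid)]

x = rng.normal(size=d_in)
assert np.allclose(mlp(x, W1, b1, W2, b2),
                   mlp(x, P @ W1, P @ b1, W2 @ P.T, b2))
```

The three basic operations can likewise be illustrated on a simpler, DeepSets-style set layer (again a stand-in for exposition, not the paper's actual weight-space layer): a fully connected map applied per row, plus a pooled summary broadcast back to every row, yields a permutation-equivariant linear layer.

```python
import numpy as np

def equivariant_linear(X, A, B):
    """X: (n, d_in) rows of a set; A, B: (d_in, d_out) weight matrices."""
    pooled = X.mean(axis=0, keepdims=True)     # pooling: (1, d_in)
    shared = np.broadcast_to(pooled, X.shape)  # broadcasting: (n, d_in)
    return X @ A + shared @ B                  # fully connected maps

# Equivariance check: permuting the input rows permutes the output rows.
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))
A, B = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
perm = rng.permutation(4)
assert np.allclose(equivariant_linear(X[perm], A, B),
                   equivariant_linear(X, A, B)[perm])
```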
Field | Value
---|---
Original language | English
Pages (from-to) | 25790-25816
Number of pages | 27
Journal | Proceedings of Machine Learning Research
Volume | 202
State | Published - 2023
Event | 40th International Conference on Machine Learning, ICML 2023, Honolulu, United States, 23 Jul 2023 → 29 Jul 2023
Bibliographical note
Publisher Copyright: © 2023 Proceedings of Machine Learning Research. All rights reserved.
Funding
The authors wish to thank Nadav Dym and Derek Lim for providing valuable feedback on early versions of the manuscript, and Yaron Lipman for the helpful discussions. This study was funded by a grant to GC from the Israel Science Foundation (ISF 737/2018), and by an equipment grant to GC and Bar-Ilan University from the Israel Science Foundation (ISF 2332/18). AN and AS are supported by a grant from the Israeli Council for Higher Education, through the Bar-Ilan Data Science Institute. IA is supported by a PhD fellowship from the Israeli Council for Higher Education.
Funders | Funder number
---|---
Israel Science Foundation | ISF 2332/18, ISF 737/2018
Council for Higher Education |