Abstract
Generating realistic images of complex visual scenes becomes challenging when one wishes to control the structure of the generated images. Previous approaches showed that scenes with few entities can be controlled using scene graphs, but this approach struggles as the complexity of the graph (the number of objects and edges) increases. In this work, we show that one limitation of current methods is their inability to capture semantic equivalence in graphs. We present a novel model that addresses this limitation by learning canonical graph representations from the data, resulting in improved image generation for complex visual scenes. Our model demonstrates improved empirical performance on large scene graphs, robustness to noise in the input scene graph, and generalization to semantically equivalent graphs. Finally, we show improved performance of the model on three different benchmarks: Visual Genome, COCO, and CLEVR. The project page is available at https://roeiherz.github.io/CanonicalSg2Im/.
Original language | English |
---|---|
Title of host publication | Computer Vision – ECCV 2020 - 16th European Conference, 2020, Proceedings |
Editors | Andrea Vedaldi, Horst Bischof, Thomas Brox, Jan-Michael Frahm |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 210-227 |
Number of pages | 18 |
ISBN (Print) | 9783030585730 |
DOIs | |
State | Published - 2020 |
Event | 16th European Conference on Computer Vision, ECCV 2020 - Glasgow, United Kingdom. Duration: 23 Aug 2020 → 28 Aug 2020 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 12371 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 16th European Conference on Computer Vision, ECCV 2020 |
---|---|
Country/Territory | United Kingdom |
City | Glasgow |
Period | 23/08/20 → 28/08/20 |
Bibliographical note
Publisher Copyright: © 2020, Springer Nature Switzerland AG.
Funding
Acknowledgments. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant ERC HOLI 819080). Prof. Darrell's group was supported in part by DoD, NSF, BAIR, and BDD. This work was completed in partial fulfillment of the Ph.D. degree of the first author.
Funders | Funder number |
---|---|
BAIR | |
National Science Foundation | |
U.S. Department of Defense | |
Horizon 2020 Framework Programme | 819080 |
European Commission | |
Keywords
- Canonical representations
- Image generation
- Scene graphs