Reasoning about complex visual scenes involves perception of entities and their relations. Scene Graphs (SGs) provide a natural representation for reasoning tasks, by assigning labels to both entities (nodes) and relations (edges). Reasoning systems based on SGs are typically trained in a two-step procedure: first, a model is trained to predict SGs from images, and next a separate model is trained to reason based on the predicted SGs. However, it would seem preferable to train such systems in an end-to-end manner. The challenge, which we address here is that scene-graph representations are non-differentiable and therefore it isn't clear how to use them as intermediate components. Here we propose Differentiable Scene Graphs (DSGs), an image representation that is amenable to differentiable end-to-end optimization, and requires supervision only from the downstream tasks. DSGs provide a dense representation for all regions and pairs of regions, and do not spend modelling capacity on regions of the image that do not contain objects or relations of interest. We evaluate our model on the challenging task of identifying referring relationships (RR) in three benchmark datasets: Visual Genome, VRD and CLEVR. Using DSGs as an intermediate representation leads to new state-of-the-art performance. The full code is available at https://github.com/shikorab/DSG.
|Title of host publication||Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|Number of pages||10|
|State||Published - Mar 2020|
|Event||2020 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2020 - Snowmass Village, United States|
Duration: 1 Mar 2020 → 5 Mar 2020
|Name||Proceedings - 2020 IEEE Winter Conference on Applications of Computer Vision, WACV 2020|
|Conference||2020 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2020|
|Period||1/03/20 → 5/03/20|
Bibliographical noteFunding Information:
This project was funded by the European Research Council (ERC) under the European Unions Horizon 2020 research and innovation programme (grant ERC HOLI 819080).
Acknowledgments: This project was funded by the European Research Council (ERC) under the European Unions Horizon 2020 research and innovation programme (grant ERC HOLI 819080).
© 2020 IEEE.