Abstract
Videos of actions are complex signals containing rich compositional structure in space and time. Current video generation methods cannot condition generation on multiple coordinated and potentially simultaneous timed actions. To address this challenge, we propose to represent the actions in a graph structure called Action Graph and present the new "Action Graph To Video" synthesis task. Our generative model for this task (AG2Vid) disentangles motion and appearance features and incorporates a scheduling mechanism for actions that facilitates timely, coordinated video generation. We train and evaluate AG2Vid on the CATER and Something-Something V2 datasets, and show that the resulting videos have better visual quality and semantic consistency compared to baselines. Finally, our model demonstrates zero-shot abilities by synthesizing novel compositions of the learned actions.
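As a rough illustration of the Action Graph idea described above, one can picture a graph whose nodes are scene objects and whose edges are timed actions between them; a per-frame scheduler then selects which actions are active at each timestep. The sketch below is a minimal Python rendering of that picture. The field names, the `ActionEdge`/`ActionGraph` classes, and the `actions_at` helper are illustrative assumptions, not the paper's actual data structures or API, and the example actions are merely CATER-style placeholders.

```python
# Hypothetical sketch of an Action Graph: nodes are objects, edges are
# timed actions. Names and fields are assumptions, not the paper's API.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ActionEdge:
    action: str            # action label, e.g. "rotate" or "slide"
    subject: int           # index of the acting object node
    target: Optional[int]  # optional object the action is applied to
    start: int             # first frame in which the action is active
    end: int               # last frame in which the action is active


@dataclass
class ActionGraph:
    objects: List[str] = field(default_factory=list)    # node labels
    edges: List[ActionEdge] = field(default_factory=list)

    def actions_at(self, t: int) -> List[ActionEdge]:
        """All actions scheduled to be active at frame t -- the kind of
        per-frame query a scheduling mechanism would consume."""
        return [e for e in self.edges if e.start <= t <= e.end]


# Two coordinated, partially overlapping timed actions.
g = ActionGraph(objects=["cone", "ball"])
g.edges.append(ActionEdge("rotate", subject=0, target=None, start=0, end=16))
g.edges.append(ActionEdge("slide", subject=1, target=0, start=8, end=24))
print([e.action for e in g.actions_at(10)])  # ['rotate', 'slide']
```

The overlap between the two edges (frames 8 through 16) is the case the abstract emphasizes: simultaneous actions that the generator must render in a coordinated way.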
Original language | English
---|---
Title of host publication | Proceedings of the 38th International Conference on Machine Learning, ICML 2021
Publisher | ML Research Press
Pages | 662-673
Number of pages | 12
ISBN (Electronic) | 9781713845065
State | Published - 2021
Event | 38th International Conference on Machine Learning, ICML 2021 - Virtual, Online (18 Jul 2021 → 24 Jul 2021)
Publication series
Name | Proceedings of Machine Learning Research |
---|---|
Volume | 139 |
ISSN (Electronic) | 2640-3498 |
Conference
Conference | 38th International Conference on Machine Learning, ICML 2021 |
---|---|
City | Virtual, Online |
Period | 18/07/21 → 24/07/21 |
Bibliographical note
Publisher Copyright: Copyright © 2021 by the author(s)
Funding
We would like to thank Lior Bracha for her help running MTurk experiments, and Haggai Maron and Yuval Atzmon for helpful feedback and discussions. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant ERC HOLI 819080). Trevor Darrell's group was supported in part by DoD, including DARPA's XAI, LwLL, and/or SemaFor programs, as well as BAIR's industrial alliance programs. Gal Chechik's group was supported by the Israel Science Foundation (ISF 737/2018), and by an equipment grant to Gal Chechik and Bar-Ilan University from the Israel Science Foundation (ISF 2332/18). This work was completed in partial fulfillment of the Ph.D. degree of Amir Bar.
Funders | Funder number |
---|---|
U.S. Department of Defense | |
Defense Advanced Research Projects Agency | |
European Commission | |
Israel Science Foundation | ISF 2332/18, ISF 737/2018 |
Horizon 2020 | ERC HOLI 819080 |