D-STEP: Dynamic Spatio-Temporal Pruning

Avraham Raviv, Yonatan Dinai, Igor Drozdov, Niv Zehngut, Ishay Goldin

Research output: Contribution to conference › Paper › peer-review

1 Scopus citation


Video processing requires analysing spatial features that change over time. By combining spatial and temporal modelling, a neural network can gain a better understanding of the scene with no increase in computation. Spatio-temporal modelling can also be used to identify redundant and sparse information in both the spatial and the temporal domains. In this work we present Dynamic Spatio-Temporal Pruning (D-STEP), a new, simple, yet efficient method for learning the evolution of spatial mappings between frames. More specifically, we use a cascade of lightweight policy networks to dynamically filter out, per input, regions and channels that provide no information, while also sharing information across time. Guided by the policy networks, the model focuses on relevant data and filters, avoiding unnecessary computation. Extensive evaluations on the Something-Something-V2, Jester and Mini-Kinetics action recognition datasets demonstrate that the proposed method achieves a significantly better accuracy-compute trade-off than current state-of-the-art methods. We release our code and trained models at https://github.com/DynamicAR/DSTEP.
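The gating idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the "policy" here is a toy per-channel linear scorer, and `keep_ratio` is an illustrative hyperparameter. Channels the policy drops are not recomputed; instead the previous frame's features are reused, which is the temporal-sharing intuition.

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamic_prune(video, w, keep_ratio=0.5):
    """Toy dynamic channel pruning over a clip of shape (T, C, H, W).

    For each frame, a simple linear policy scores every channel from its
    spatial mean activation; only the top `keep_ratio` fraction of channels
    is "computed", while pruned channels reuse the previous frame's
    features (zeros for the first frame).
    """
    T, C, H, W = video.shape
    out = np.zeros_like(video)
    prev = np.zeros((C, H, W))
    for t in range(T):
        # Per-input policy score: spatial mean magnitude, weighted per channel.
        scores = np.abs(video[t]).mean(axis=(1, 2)) * w
        k = max(1, int(keep_ratio * C))
        keep = np.argsort(scores)[-k:]          # indices of kept channels
        mask = np.zeros(C, dtype=bool)
        mask[keep] = True
        # Kept channels come from the current frame; pruned ones are reused.
        frame = np.where(mask[:, None, None], video[t], prev)
        out[t] = frame
        prev = frame
    return out

# Hypothetical demo clip: 4 frames, 8 channels, 2x2 spatial resolution.
video = rng.normal(size=(4, 8, 2, 2))
w = np.ones(8)
pruned = dynamic_prune(video, w, keep_ratio=0.5)
```

In the real method the policy networks are learned jointly with the backbone and operate on both spatial regions and channels; this sketch only shows the channel-gating and temporal-reuse mechanics, where skipping a channel saves its convolution cost for that frame.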

Original language: English
State: Published - 2022
Externally published: Yes
Event: 33rd British Machine Vision Conference Proceedings, BMVC 2022 - London, United Kingdom
Duration: 21 Nov 2022 → 24 Nov 2022


Conference: 33rd British Machine Vision Conference Proceedings, BMVC 2022
Country/Territory: United Kingdom

Bibliographical note

Publisher Copyright:
© 2022. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.

