Abstract
Video neural networks are computationally expensive. Real-time applications require substantial compute resources, which edge devices lack. Various methods have been proposed to reduce the computational load of neural networks. Among them, dynamic approaches adapt the network architecture, its weights, or the input resolution to the content of the input. Our proposed approach, showcased on the task of video action recognition, dynamically reduces computation for a wide range of video processing networks by exploiting the redundancy between frames and channels. A lightweight per-layer policy network makes a per-filter decision regarding each filter’s importance. Important filters are retained, while the others are scaled down or skipped entirely. Our method is the first to give the policy network a broader temporal context by considering features aggregated over time. Temporal aggregation is performed using self-attention between present, past, and (if available) future input tensor descriptors. As demonstrated on a large variety of leading benchmarks such as Something-Something-V2, Mini-Kinetics, Jester, and ActivityNet1.3, and over multiple network architectures, our method either enhances accuracy or saves up to 70% of the FLOPs with no accuracy degradation, outperforming existing dynamic pruning methods by a large margin and setting a new bar for the accuracy-efficiency trade-off achievable with dynamic methods. We release the code and trained models at https://github.com/tapsdyn/TAPS.
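The gating idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the repository linked above). It is a toy under stated assumptions: each frame is summarized by a small descriptor vector, temporal aggregation is plain scaled dot-product self-attention without learned projections, and the policy is a single linear layer followed by a sigmoid. The function names (`aggregate_over_time`, `filter_gates`) and the thresholds (`keep_thr`, `skip_thr`) are hypothetical and chosen only for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_over_time(descriptors: List[Vector], t: int) -> Vector:
    """Attention-weighted context for frame t over all available frames.

    Assumption: each descriptor serves as query, key and value directly
    (no learned projections), which keeps the sketch short.
    """
    query = descriptors[t]
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale for key in descriptors
    ]
    weights = softmax(scores)
    return [
        sum(w * d[i] for w, d in zip(weights, descriptors))
        for i in range(len(query))
    ]


def filter_gates(
    context: Vector,
    policy_weights: List[Vector],
    keep_thr: float = 0.6,  # hypothetical threshold
    skip_thr: float = 0.3,  # hypothetical threshold
) -> Vector:
    """Return one gate per filter: 1.0 (retain), 0.0 (skip) or a scale factor."""
    gates = []
    for row in policy_weights:  # one weight row per filter
        logit = sum(w * c for w, c in zip(row, context))
        importance = 1.0 / (1.0 + math.exp(-logit))
        if importance >= keep_thr:
            gates.append(1.0)         # important filter: keep as is
        elif importance <= skip_thr:
            gates.append(0.0)         # unimportant filter: skip its computation
        else:
            gates.append(importance)  # in between: scale the filter output down
    return gates


if __name__ == "__main__":
    # Three frames (past, present, future), each with a 2-d descriptor.
    frames = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
    # Toy policy weights for a layer with three filters.
    policy = [[4.0, 0.0], [0.0, 0.0], [-4.0, 0.0]]
    context = aggregate_over_time(frames, t=1)
    print(filter_gates(context, policy))
```

In a real network, a gate of 0.0 would mean the corresponding filter is not computed for that frame, which is where the FLOP savings would come from. The sketch only produces the gates and does not model the convolution itself.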
Original language | English |
---|---|
Title of host publication | Computer Vision – ACCV 2024 - 17th Asian Conference on Computer Vision, Proceedings |
Editors | Minsu Cho, Ivan Laptev, Du Tran, Angela Yao, Hongbin Zha |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 422-438 |
Number of pages | 17 |
ISBN (Print) | 9789819609079 |
State | Published - 2025 |
Externally published | Yes |
Event | 17th Asian Conference on Computer Vision, ACCV 2024 - Hanoi, Viet Nam; Duration: 8 Dec 2024 → 12 Dec 2024 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 15474 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 17th Asian Conference on Computer Vision, ACCV 2024 |
---|---|
Country/Territory | Viet Nam |
City | Hanoi |
Period | 8/12/24 → 12/12/24 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
Keywords
- Action Recognition
- Dynamic Pruning
- Efficient Inference