ROLLOUT-GUIDED TOKEN PRUNING FOR EFFICIENT VIDEO UNDERSTANDING

  • Yonatan Dinai
  • Ishay Goldin
  • Avraham Raviv
  • Niv Zehngut

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Vision Transformers have proven powerful in various vision applications. Yet, their adaptations for video understanding tasks incur large computational costs, limiting their practical deployment on resource-constrained devices. Token pruning can effectively alleviate the processing overhead of the underlying attention blocks, but often neglects the iterative nature of video models applied frame-by-frame. We propose to prune tokens according to the estimated contribution of their corresponding tokens in previous frames to previous predictions. We leverage attention rollout and token tracking to propagate token importance from previous outputs to current input tokens. Our method is interpretable, requires no training, and has negligible memory overhead. We show the efficacy of our method for both video object detection and action recognition using different transformer architectures, achieving up to 65% reduction in FLOPs on ImageNet VID and 60% on EPIC-Kitchens with no accuracy degradation. We release the code and models at https://github.com/RGTPdyn/RGTP.
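
As a rough illustration of the two ingredients named in the abstract, the sketch below computes standard attention rollout over a stack of per-layer attention maps and keeps the highest-scoring tokens. It is not the authors' released implementation: the function names `attention_rollout` and `keep_indices`, the equal weighting of attention and residual paths, and the `keep_ratio` parameter are assumptions made for illustration. Token tracking between frames is not modeled here; the sketch assumes scores from the previous frame are simply reused at the same token positions.

```python
import numpy as np


def attention_rollout(attentions):
    """Standard attention rollout (a sketch, not the paper's exact variant).

    attentions: list of per-layer arrays of shape (heads, tokens, tokens),
    where each row sums to 1.
    Returns a (tokens, tokens) matrix whose row i estimates how much each
    input token contributes to output token i.
    """
    num_tokens = attentions[0].shape[-1]
    identity = np.eye(num_tokens)
    rollout = identity.copy()
    for layer_attn in attentions:
        # Average over heads.
        a = layer_attn.mean(axis=0)
        # Mix in the identity to account for the residual connection
        # (equal weighting is an assumption), then renormalize rows.
        a = 0.5 * a + 0.5 * identity
        a = a / a.sum(axis=-1, keepdims=True)
        # Compose with the contribution accumulated from earlier layers.
        rollout = a @ rollout
    return rollout


def keep_indices(importance, keep_ratio):
    """Return the sorted indices of the top `keep_ratio` fraction of tokens."""
    k = max(1, int(round(keep_ratio * importance.shape[0])))
    top = np.argsort(-importance, kind="stable")[:k]
    return np.sort(top)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical model: 4 layers, 2 heads, 6 tokens.
    attn = rng.random((4, 2, 6, 6))
    attn = attn / attn.sum(axis=-1, keepdims=True)
    rollout = attention_rollout(list(attn))
    # Importance of each input token for output token 0 in the previous
    # frame, reused to choose which tokens to keep in the current frame.
    print(keep_indices(rollout[0], keep_ratio=0.5))
```

In a frame-by-frame setting, the importance vector would come from the previous frame's prediction and be mapped onto the current frame's tokens before pruning; the identity mapping used above is the simplest possible stand-in for that step.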

Original language: English
Title of host publication: 2025 IEEE International Conference on Image Processing, ICIP 2025 - Proceedings
Publisher: IEEE Computer Society
Pages: 37-42
Number of pages: 6
ISBN (Electronic): 9798331523794
DOIs
State: Published - 2025
Externally published: Yes
Event: 32nd IEEE International Conference on Image Processing, ICIP 2025 - Anchorage, United States
Duration: 14 Sep 2025 to 17 Sep 2025

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

Conference

Conference: 32nd IEEE International Conference on Image Processing, ICIP 2025
Country/Territory: United States
City: Anchorage
Period: 14/09/25 to 17/09/25

Bibliographical note

Publisher Copyright:
©2025 IEEE.

Keywords

  • Action Recognition
  • Attention Rollout
  • Token Pruning
  • Video Object Detection
  • Video Transformers
