Skip to main navigation Skip to search Skip to main content

Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models

  • Michael Toker
  • , Ido Galil
  • , Hadas Orgad
  • , Rinon Gal
  • , Yoad Tewel
  • , Gal Chechik
  • , Yonatan Belinkov

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Text-to-image (T2I) diffusion models rely on encoded prompts to guide the image generation process. Typically, these prompts are extended to a fixed length by appending padding tokens to the input. Despite being a default practice, the influence of padding tokens on the image generation process has not been investigated. In this work, we conduct the first in-depth analysis of the role padding tokens play in T2I models. We develop two causal techniques to analyze how information is encoded in the representation of tokens across different components of the T2I pipeline. Using these techniques, we investigate when and how padding tokens impact the image generation process. Our findings reveal three distinct scenarios: padding tokens may affect the model's output during text encoding, during the diffusion process, or be effectively ignored. Moreover, we identify key relationships between these scenarios and the model's architecture (cross or self-attention) and its training process (frozen or trained text encoder). These insights contribute to a deeper understanding of the mechanisms of padding tokens, potentially informing future model design and training practices in T2I systems.

Original languageEnglish
Title of host publicationLong Papers
EditorsLuis Chiruzzo, Alan Ritter, Lu Wang
PublisherAssociation for Computational Linguistics (ACL)
Pages7618-7632
Number of pages15
ISBN (Electronic)9798891761896
DOIs
StatePublished - 2025
Event2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2025 - Hybrid, Albuquerque, United States
Duration: 29 Apr 20254 May 2025

Publication series

NameProceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025
Volume1

Conference

Conference2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2025
Country/TerritoryUnited States
CityHybrid, Albuquerque
Period29/04/254/05/25

Bibliographical note

Publisher Copyright:
© 2025 Association for Computational Linguistics.

Fingerprint

Dive into the research topics of 'Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models'. Together they form a unique fingerprint.

Cite this