Abstract
Text-to-image (T2I) diffusion models rely on encoded prompts to guide the image generation process. Typically, these prompts are extended to a fixed length by appending padding tokens to the input. Despite being a default practice, the influence of padding tokens on the image generation process has not been investigated. In this work, we conduct the first in-depth analysis of the role padding tokens play in T2I models. We develop two causal techniques to analyze how information is encoded in the representation of tokens across different components of the T2I pipeline. Using these techniques, we investigate when and how padding tokens impact the image generation process. Our findings reveal three distinct scenarios: padding tokens may affect the model's output during text encoding, during the diffusion process, or be effectively ignored. Moreover, we identify key relationships between these scenarios and the model's architecture (cross or self-attention) and its training process (frozen or trained text encoder). These insights contribute to a deeper understanding of the mechanisms of padding tokens, potentially informing future model design and training practices in T2I systems.
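The fixed-length padding step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `pad_prompt`, the token ids, the padding id `0`, and the length `8` are all hypothetical values chosen for illustration. Real T2I pipelines do this inside their tokenizer, with model-specific padding tokens and lengths.

```python
from typing import List, Tuple


def pad_prompt(
    token_ids: List[int], max_length: int = 8, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Truncate or right-pad token_ids to exactly max_length.

    Returns the padded ids and a mask (1 = prompt token, 0 = padding).
    All names and default values here are illustrative assumptions.
    """
    kept = token_ids[:max_length]          # drop tokens beyond the fixed length
    n_pad = max_length - len(kept)         # number of padding positions to append
    padded = kept + [pad_id] * n_pad
    mask = [1] * len(kept) + [0] * n_pad
    return padded, mask


# Hypothetical ids for a four-token prompt.
ids, mask = pad_prompt([101, 7, 42, 102], max_length=8, pad_id=0)
print(ids)   # [101, 7, 42, 102, 0, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0, 0, 0]
```

The mask marks which positions hold padding. The abstract's question is whether the representations at those padded positions still influence the generated image, either in the text encoder or in the diffusion process.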
| Original language | English |
|---|---|
| Title of host publication | Long Papers |
| Editors | Luis Chiruzzo, Alan Ritter, Lu Wang |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 7618-7632 |
| Number of pages | 15 |
| ISBN (Electronic) | 9798891761896 |
| State | Published - 2025 |
| Event | 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2025 - Hybrid, Albuquerque, United States. Duration: 29 Apr 2025 → 4 May 2025 |
Publication series
| Name | Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025 |
|---|---|
| Volume | 1 |
Conference
| Conference | 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2025 |
|---|---|
| Country/Territory | United States |
| City | Hybrid, Albuquerque |
| Period | 29/04/25 → 04/05/25 |
Bibliographical note
Publisher Copyright: © 2025 Association for Computational Linguistics.
Title
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models