Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models

Moab Arar, Rinon Gal, Yuval Atzmon, Gal Chechik, Daniel Cohen-Or, Ariel Shamir, Amit H. Bermano

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Text-to-image (T2I) personalization allows users to guide the creative image generation process by combining their own visual concepts in natural language prompts. Recently, encoder-based techniques have emerged as a new effective approach for T2I personalization, reducing the need for multiple images and long training times. However, most existing encoders are limited to a single-class domain, which hinders their ability to handle diverse concepts. In this work, we propose a domain-agnostic method that does not require any specialized dataset or prior information about the personalized concepts. We introduce a novel contrastive-based regularization technique to maintain high fidelity to the target concept characteristics while keeping the predicted embeddings close to editable regions of the latent space, by pushing the predicted tokens toward their nearest existing CLIP tokens. Our experimental results demonstrate the effectiveness of our approach and show how the learned tokens are more semantic than tokens predicted by unregularized models. This leads to a better representation that achieves state-of-the-art performance while being more flexible than previous methods.
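The regularization idea described in the abstract (pulling a predicted embedding toward its nearest existing tokens) can be illustrated with a small, generic contrastive loss. The sketch below is not the authors' implementation: the abstract does not specify the loss form, so an InfoNCE-style objective is assumed here, and the function names, the toy vocabulary, the `k` nearest-neighbor count, and the `temperature` value are all hypothetical. It uses plain Python lists in place of real CLIP token embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_token_contrastive_loss(pred, vocab, k=1, temperature=0.1):
    """Assumed InfoNCE-style regularizer (illustrative only).

    The k vocabulary embeddings most similar to `pred` are treated as
    positives and all remaining embeddings as negatives. The loss is
    -log(sum_pos exp(s/t) / sum_all exp(s/t)), which shrinks as `pred`
    moves closer to its nearest tokens and away from the others.
    Returns the loss and the indices of the chosen positives.
    """
    sims = [cosine(pred, v) for v in vocab]
    # Indices sorted by similarity, most similar first.
    order = sorted(range(len(vocab)), key=lambda i: sims[i], reverse=True)
    positives = order[:k]
    logits = [s / temperature for s in sims]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_all = m + math.log(sum(math.exp(l - m) for l in logits))
    log_pos = m + math.log(sum(math.exp(logits[i] - m) for i in positives))
    return log_all - log_pos, positives


# Toy "vocabulary" of three 2-D embeddings (stand-ins for CLIP tokens).
toy_vocab = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
loss_near, pos_near = nearest_token_contrastive_loss([0.9, 0.1], toy_vocab)
loss_far, pos_far = nearest_token_contrastive_loss([1.0, 1.0], toy_vocab)
print(loss_near, pos_near)
print(loss_far, pos_far)
```

In this toy setting, an embedding that already sits close to one vocabulary entry incurs a much smaller penalty than one lying midway between two entries, which is the qualitative behavior the abstract attributes to the regularizer.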

Original language: English
Title of host publication: Proceedings - SIGGRAPH Asia 2023 Conference Papers, SA 2023
Editors: Stephen N. Spencer
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400703157
DOIs
State: Published - 10 Dec 2023
Externally published: Yes
Event: 2023 SIGGRAPH Asia 2023 Conference Papers, SA 2023 - Sydney, Australia
Duration: 12 Dec 2023 to 15 Dec 2023

Publication series

Name: Proceedings - SIGGRAPH Asia 2023 Conference Papers, SA 2023

Conference

Conference: 2023 SIGGRAPH Asia 2023 Conference Papers, SA 2023
Country/Territory: Australia
City: Sydney
Period: 12/12/23 to 15/12/23

Bibliographical note

Publisher Copyright:
© 2023 ACM.

Funding

The first author is supported by the Miriam and Aaron Gutwirth scholarship. This work was partially supported by Len Blavatnik and the Blavatnik family foundation, the Deutsch Foundation, the Yandex Initiative in Machine Learning, BSF (grant 2020280) and ISF (grants 2492/20 and 3441/21).

Funders (funder number)
Deutsch Foundation
Miriam and Aaron Gutwirth
Yandex Initiative in Machine Learning
Blavatnik Family Foundation
United States-Israel Binational Science Foundation (2020280)
Israel Science Foundation (2492/20, 3441/21)

    Keywords

    • Encoders
    • Inversion
    • Personalization
