Linear Adversarial Concept Erasure

Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Ryan Cotterell

Research output: Contribution to journal › Conference article › peer-review

23 Scopus citations

Abstract

Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly used in real-world applications, the inability to control their content becomes an increasingly important problem. This paper formulates the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. Our formulation consists of a constrained, linear minimax game. We consider different concept-identification objectives, modeled after several tasks such as classification and regression. We derive a closed-form solution for certain objectives, and propose a convex relaxation, R-LACE, that works well for others. When evaluated in the context of binary gender removal, our method recovers a low-dimensional subspace whose removal mitigates bias according to both intrinsic and extrinsic evaluation. We show that the method, despite being linear, is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
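To make the idea of erasing a linear subspace concrete, the following is a minimal sketch for a regression-style concept, not the authors' implementation. It assumes synthetic data and a rank-1 erasure; the helper name `erase_direction` and the choice of direction (the cross-covariance between the representations and the concept) are illustrative assumptions that the abstract does not specify. The sketch projects that single direction out of the representations and checks that a least-squares linear predictor can no longer recover the concept.

```python
import numpy as np


def erase_direction(Xc, yc):
    """Return an orthogonal projection that removes one direction from Xc.

    Assumption for illustration: the removed direction is the
    cross-covariance between the centered features Xc and the
    centered concept values yc.
    """
    v = Xc.T @ yc
    v = v / np.linalg.norm(v)
    # P = I - v v^T projects onto the orthogonal complement of v.
    return np.eye(Xc.shape[1]) - np.outer(v, v)


def r_squared(A, y):
    """R^2 of the least-squares linear predictor of y from A."""
    theta, *_ = np.linalg.lstsq(A, y, rcond=1e-10)
    residual = y - A @ theta
    return 1.0 - np.sum(residual ** 2) / np.sum(y ** 2)


# Synthetic "representations" X and a concept y that is linear in X.
rng = np.random.default_rng(0)
n, d = 500, 10
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Center both, so cross-covariance is a plain inner product.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

P = erase_direction(Xc, yc)
X_erased = Xc @ P

r2_before = r_squared(Xc, yc)       # concept is linearly recoverable
r2_after = r_squared(X_erased, yc)  # recoverability after erasure
print(f"R^2 before: {r2_before:.3f}, after: {r2_after:.3f}")
```

After the projection, the erased features have zero cross-covariance with the concept, so the least-squares predictor has nothing to fit, even though only one of the ten dimensions was removed. The classification objectives mentioned in the abstract do not reduce to this one-line projection; this sketch covers only the simplest regression-style case.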

Original language: English
Pages (from-to): 18400-18421
Number of pages: 22
Journal: Proceedings of Machine Learning Research
Volume: 162
State: Published - 2022
Event: 39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States
Duration: 17 Jul 2022 → 23 Jul 2022

Bibliographical note

Publisher Copyright:
Copyright © 2022 by the author(s)

Funding

We thank Marius Mosbach, Yanai Elazar, Josef Valvoda and Tiago Pimentel for fruitful discussions. This project received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme, grant agreement No. 802774 (iEXTRACT). Ryan Cotterell acknowledges Google for support from the Research Scholar Program.

Funders (funder number)
European Research Council
European Union's Horizon 2020 research and innovation programme (802774)
Google
