A Practical Method for Generating String Counterfactuals

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Interventions performed on the representation space of language models have emerged as an effective means to influence model behavior. Such methods are employed, for example, to eliminate or alter the encoding of demographic information, such as gender, within the model’s representations and, in so doing, create a counterfactual representation. However, because the intervention operates within the representation space, understanding precisely what aspects of the text it modifies poses a challenge. In this paper, we present a method to convert representation counterfactuals into string counterfactuals. We demonstrate that this approach enables us to analyze the linguistic alterations corresponding to a given representation space intervention and to interpret the features utilized to encode a specific concept. Moreover, the resulting counterfactuals can be used to mitigate bias in classification through data augmentation.

Original languageEnglish
Title of host publication2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics
Subtitle of host publicationProceedings of the Conference Findings, NAACL 2025
EditorsLuis Chiruzzo, Alan Ritter, Lu Wang
PublisherAssociation for Computational Linguistics (ACL)
Pages3267-3286
Number of pages20
ISBN (Electronic)9798891761957
DOIs
StatePublished - 2025
Event2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025 - Albuquerque, United States
Duration: 29 Apr 20254 May 2025

Publication series

Name2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Proceedings of the Conference Findings, NAACL 2025

Conference

Conference2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025
Country/TerritoryUnited States
CityAlbuquerque
Period29/04/254/05/25

Bibliographical note

Publisher Copyright:
© 2025 Association for Computational Linguistics.

Fingerprint

Dive into the research topics of 'A Practical Method for Generating String Counterfactuals'. Together they form a unique fingerprint.

Cite this