Abstract
Interventions performed on the representation space of language models have emerged as an effective means to influence model behavior. Such methods are employed, for example, to eliminate or alter the encoding of demographic information, such as gender, within the model’s representations and, in so doing, create a counterfactual representation. However, because the intervention operates within the representation space, understanding precisely what aspects of the text it modifies poses a challenge. In this paper, we present a method to convert representation counterfactuals into string counterfactuals. We demonstrate that this approach enables us to analyze the linguistic alterations corresponding to a given representation space intervention and to interpret the features utilized to encode a specific concept. Moreover, the resulting counterfactuals can be used to mitigate bias in classification through data augmentation.
| Original language | English |
|---|---|
| Title of host publication | 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics |
| Subtitle of host publication | Proceedings of the Conference Findings, NAACL 2025 |
| Editors | Luis Chiruzzo, Alan Ritter, Lu Wang |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 3267-3286 |
| Number of pages | 20 |
| ISBN (Electronic) | 9798891761957 |
| State | Published - 2025 |
| Event | 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025 - Albuquerque, United States; Duration: 29 Apr 2025 → 4 May 2025 |
Publication series
| Name | 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Proceedings of the Conference Findings, NAACL 2025 |
|---|---|
Conference
| Conference | 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025 |
|---|---|
| Country/Territory | United States |
| City | Albuquerque |
| Period | 29/04/25 → 4/05/25 |
Bibliographical note
Publisher Copyright: © 2025 Association for Computational Linguistics.
Fingerprint
Dive into the research topics of 'A Practical Method for Generating String Counterfactuals'. Together they form a unique fingerprint.