GUMBEL COUNTERFACTUAL GENERATION FROM LANGUAGE MODELS

Shauli Ravfogel, Anej Svete, Vésteinn Snæbjarnarson, Ryan Cotterell

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

1 Scopus citations

Abstract

Understanding and manipulating the causal generation mechanisms in language models is essential for controlling their behavior. Previous work has primarily relied on techniques such as representation surgery-e.g., model ablations or manipulation of linear subspaces tied to specific concepts-to intervene on these models. To understand the impact of interventions precisely, it is useful to examine counterfactuals-e.g., how a given sentence would have appeared had it been generated by the model following a specific intervention. We highlight that counterfactual reasoning is conceptually distinct from interventions, as articulated in Pearl's causal hierarchy. Based on this observation, we propose a framework for generating true string counterfactuals by reformulating language models as a structural equation model using the Gumbel-max trick, which we called Gumbel counterfactual generation. This reformulation allows us to model the joint distribution over original strings and their counterfactuals resulting from the same instantiation of the sampling noise. We develop an algorithm based on hindsight Gumbel sampling that allows us to infer the latent noise variables and generate counterfactuals of observed strings. Our experiments demonstrate that the approach produces meaningful counterfactuals while at the same time showing that commonly used intervention techniques have considerable undesired side effects.

Original languageEnglish
Title of host publication13th International Conference on Learning Representations, ICLR 2025
PublisherInternational Conference on Learning Representations, ICLR
Pages83555-83581
Number of pages27
ISBN (Electronic)9798331320850
StatePublished - 2025
Externally publishedYes
Event13th International Conference on Learning Representations, ICLR 2025 - Singapore, Singapore
Duration: 24 Apr 202528 Apr 2025

Publication series

Name13th International Conference on Learning Representations, ICLR 2025

Conference

Conference13th International Conference on Learning Representations, ICLR 2025
Country/TerritorySingapore
CitySingapore
Period24/04/2528/04/25

Bibliographical note

Publisher Copyright:
© 2025 13th International Conference on Learning Representations, ICLR 2025. All rights reserved.

Fingerprint

Dive into the research topics of 'GUMBEL COUNTERFACTUAL GENERATION FROM LANGUAGE MODELS'. Together they form a unique fingerprint.

Cite this