Abstract
Contrastive explanations clarify why an event occurred in contrast to another. They are inherently intuitive for humans both to produce and to comprehend. We propose a method to produce contrastive explanations in the latent space, via a projection of the input representation, such that only the features that differentiate two potential decisions are captured. Our modification restricts model behavior to contrastive reasoning only, and uncovers which aspects of the input are useful for and against particular decisions. Additionally, for a given input feature, our contrastive explanations can answer for which label, and against which alternative label, the feature is useful. We produce contrastive explanations via both high-level abstract concept attribution and low-level input token/span attribution for two NLP classification benchmarks. Our findings demonstrate the ability of label-contrastive explanations to provide fine-grained interpretability of model decisions.
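The core idea of the latent-space projection can be illustrated with a minimal sketch, assuming a linear final classifier. The function name, toy data, and the specific projection below are our illustration of a contrastive projection, not the paper's exact method: the hidden representation is projected onto the direction separating the weight rows of a fact label and a foil label, so only the component that drives the decision between the two labels is kept.

```python
import numpy as np

def contrastive_projection(h, w_y, w_alt):
    """Project hidden vector h onto the contrastive axis w_y - w_alt.

    Only this component of h affects the logit margin between label y
    and the alternative label, so it isolates the contrastive evidence.
    """
    u = w_y - w_alt
    coeff = h @ u / (u @ u)   # scalar coordinate of h along the axis
    return coeff * u          # component of h driving y vs. the foil

# Toy example (random data, purely illustrative).
rng = np.random.default_rng(0)
h = rng.normal(size=8)                # hidden representation
w_y, w_alt = rng.normal(size=(2, 8))  # classifier rows for two labels

h_c = contrastive_projection(h, w_y, w_alt)

# Sanity check: the projected part carries the entire logit difference,
# and the residual h - h_c contributes nothing to the y-vs-foil margin.
margin_full = h @ (w_y - w_alt)
margin_proj = h_c @ (w_y - w_alt)
assert np.isclose(margin_full, margin_proj)
```

Under this linear assumption, attribution over `h_c` (rather than `h`) answers which input features argue for the fact label and against the foil specifically.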
| Original language | English |
| --- | --- |
| Title of host publication | EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 1597-1611 |
| Number of pages | 15 |
| ISBN (Electronic) | 9781955917094 |
| State | Published - 2021 |
| Event | 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Virtual, Punta Cana, Dominican Republic |
| Duration | 7 Nov 2021 → 11 Nov 2021 |
Publication series

| Name |
| --- |
| EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings |
Conference

| Conference | 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 |
| --- | --- |
| Country/Territory | Dominican Republic |
| City | Virtual, Punta Cana |
| Period | 7/11/21 → 11/11/21 |
Bibliographical note
Publisher Copyright: © 2021 Association for Computational Linguistics
Funding
We thank the anonymous reviewers for their helpful feedback, as well as colleagues from the Allen Institute for AI. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme, grant agreement No. 802774 (iEXTRACT).
| Funders | Funder number |
| --- | --- |
| Horizon 2020 Framework Programme | |
| Allen Institute | |
| European Commission | |
| Horizon 2020 | 802774 |