Contrastive Explanations for Model Interpretability

Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, Yoav Goldberg

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

36 Scopus citations

Abstract

Contrastive explanations clarify why an event occurred in contrast to another. They are inherently intuitive to humans to both produce and comprehend. We propose a method to produce contrastive explanations in the latent space, via a projection of the input representation, such that only the features that differentiate two potential decisions are captured. Our modification allows model behavior to consider only contrastive reasoning, and uncover which aspects of the input are useful for and against particular decisions. Additionally, for a given input feature, our contrastive explanations can answer for which label, and against which alternative label, is the feature useful. We produce contrastive explanations via both high-level abstract concept attribution and low-level input token/span attribution for two NLP classification benchmarks. Our findings demonstrate the ability of label-contrastive explanations to provide fine-grained interpretability of model decisions.

Original languageEnglish
Title of host publicationEMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
PublisherAssociation for Computational Linguistics (ACL)
Pages1597-1611
Number of pages15
ISBN (Electronic)9781955917094
StatePublished - 2021
Event2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Virtual, Punta Cana, Dominican Republic
Duration: 7 Nov 202111 Nov 2021

Publication series

NameEMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Country/TerritoryDominican Republic
CityVirtual, Punta Cana
Period7/11/2111/11/21

Bibliographical note

Publisher Copyright:
© 2021 Association for Computational Linguistics

Fingerprint

Dive into the research topics of 'Contrastive Explanations for Model Interpretability'. Together they form a unique fingerprint.

Cite this