LEXPLAIN: Improving Model Explanations via Lexicon Supervision

Orevaoghene Ahia, Hila Gonen, Vidhisha Balachandran, Yulia Tsvetkov, Noah A. Smith

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Model explanations that shed light on the model's predictions are becoming a desired additional output of NLP models, alongside their predictions. Challenges in creating these explanations include making them trustworthy and faithful to the model's predictions. In this work, we propose a novel framework for guiding model explanations by supervising them explicitly. To this end, our method, LEXPLAIN, uses task-related lexicons to directly supervise model explanations. This approach consistently improves the plausibility of the model's explanations without sacrificing performance on the task, as we demonstrate on sentiment analysis and toxicity detection. Our analyses show that our method also demotes spurious correlations (i.e., with respect to African American English dialect) on toxicity detection, improving fairness.

Original language: English
Title of host publication: StarSEM 2023 - 12th Joint Conference on Lexical and Computational Semantics, Proceedings of the Conference
Editors: Alexis Palmer, Jose Camacho-Collados
Publisher: Association for Computational Linguistics (ACL)
Pages: 207-216
Number of pages: 10
ISBN (Electronic): 9781959429760
State: Published - 2023
Externally published: Yes
Event: 12th Joint Conference on Lexical and Computational Semantics, StarSEM 2023, co-located with ACL 2023 - Toronto, Canada
Duration: 13 Jul 2023 - 14 Jul 2023

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 12th Joint Conference on Lexical and Computational Semantics, StarSEM 2023, co-located with ACL 2023
Country/Territory: Canada
City: Toronto
Period: 13/07/23 - 14/07/23

Bibliographical note

Publisher Copyright:
© 2023 Association for Computational Linguistics.
