Abstract
Model explanations that shed light on a model's predictions are becoming a desired additional output of NLP models, alongside the predictions themselves. Challenges in creating these explanations include making them trustworthy and faithful to the model's predictions. In this work, we propose a novel framework for guiding model explanations by supervising them explicitly. To this end, our method, LEXPLAIN, uses task-related lexicons to directly supervise model explanations. This approach consistently improves the plausibility of the model's explanations without sacrificing performance on the task, as we demonstrate on sentiment analysis and toxicity detection. Our analyses show that our method also demotes spurious correlations (i.e., with respect to African American English dialect) on toxicity detection, improving fairness.
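The abstract describes the approach only at a high level. As a rough illustration, the sketch below shows one way lexicon-based supervision of explanations could be wired up in PyTorch: a cross-entropy-style term that pushes token-level explanation mass toward lexicon tokens, added to the task loss. The function name, the uniform target over lexicon tokens, and the loss weighting are illustrative assumptions, not LEXPLAIN's actual formulation.

```python
# A minimal, hypothetical sketch (not the paper's exact objective): supervise
# token-level explanation scores with a task-related lexicon by pushing the
# explanation distribution toward lexicon tokens.
import torch
import torch.nn.functional as F

def lexicon_explanation_loss(token_scores, lexicon_mask, attention_mask):
    """Cross-entropy between a lexicon-derived target distribution and the
    model's explanation distribution over tokens.

    token_scores:   (batch, seq_len) raw per-token importance scores
    lexicon_mask:   (batch, seq_len) 1.0 where the token matches the lexicon
    attention_mask: (batch, seq_len) 1.0 for real tokens, 0.0 for padding
    """
    # Use a large finite negative value (not -inf) so padded positions get
    # ~zero probability without producing NaNs downstream.
    scores = token_scores.masked_fill(attention_mask == 0, -1e9)
    log_expl = F.log_softmax(scores, dim=-1)

    # Target: uniform over lexicon tokens; fall back to uniform over all real
    # tokens when an example has no lexicon hits.
    target = lexicon_mask * attention_mask
    no_hits = target.sum(dim=-1, keepdim=True) == 0
    target = torch.where(no_hits, attention_mask, target)
    target = target / target.sum(dim=-1, keepdim=True)

    return -(target * log_expl).sum(dim=-1).mean()

# Joint training objective (the weight 0.1 is purely illustrative):
# total_loss = task_loss + 0.1 * lexicon_explanation_loss(scores, lex_mask, attn_mask)
```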
Original language | English |
---|---|
Title of host publication | StarSEM 2023 - 12th Joint Conference on Lexical and Computational Semantics, Proceedings of the Conference |
Editors | Alexis Palmer, Jose Camacho-Collados |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 207-216 |
Number of pages | 10 |
ISBN (Electronic) | 9781959429760 |
State | Published - 2023 |
Externally published | Yes |
Event | 12th Joint Conference on Lexical and Computational Semantics, StarSEM 2023, co-located with ACL 2023 - Toronto, Canada. Duration: 13 Jul 2023 → 14 Jul 2023 |
Publication series
Name | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
---|---|
ISSN (Print) | 0736-587X |
Conference
Conference | 12th Joint Conference on Lexical and Computational Semantics, StarSEM 2023, co-located with ACL 2023 |
---|---|
Country/Territory | Canada |
City | Toronto |
Period | 13/07/23 → 14/07/23 |
Bibliographical note
Publisher Copyright: © 2023 Association for Computational Linguistics.
Funding
This research is supported in part by the National Science Foundation (NSF) under grants IIS2203097 and IIS2125201. This research is also supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the HIATUS Program contract #2022-22072200004. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.
Funders | Funder number |
---|---|
National Science Foundation | IIS2203097, IIS2125201 |
Office of the Director of National Intelligence | |
Intelligence Advanced Research Projects Activity | 2022-22072200004 |