What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations

Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Moral or ethical judgments rely heavily on the specific contexts in which they occur. Understanding varying shades of defeasible contextualizations (i.e., additional information that strengthens or attenuates the moral acceptability of an action) is critical to accurately represent the subtlety and intricacy of grounded human moral judgment in real-life scenarios. We introduce defeasible moral reasoning: a task to provide grounded contexts that make an action more or less morally acceptable, along with commonsense rationales that justify the reasoning. To elicit high-quality task data, we take an iterative self-distillation approach that starts from a small amount of unstructured seed knowledge from GPT-3 and then alternates between (1) self-distillation from student models; (2) targeted filtering with a critic model trained by human judgment (to boost validity) and NLI (to boost diversity); (3) self-imitation learning (to amplify the desired data quality). This process yields a student model that produces defeasible contexts with improved validity, diversity, and defeasibility. From this model we distill a high-quality dataset, δ-RULES-OF-THUMB (δ-ROT), of 1.2M entries of contextualizations and rationales for 115K defeasible moral actions rated highly by human annotators 85.9% to 99.8% of the time. Using δ-ROT we obtain a final student model that wins over all intermediate student models by a notable margin.
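The three-step loop described in the abstract — (1) self-distillation from the current student, (2) targeted filtering for validity (critic model) and diversity (NLI), (3) self-imitation on the survivors — can be sketched in miniature. This is a toy illustration only: the `generate`, `critic_ok`, and `novel` callables stand in for the paper's actual student model, trained critic, and NLI-based diversity filter, and are assumptions for demonstration.

```python
def iterative_self_distillation(seed, rounds, generate, critic_ok, novel):
    """Toy sketch of the loop from the abstract:
    (1) generate candidates from the current data pool,
    (2) filter them for validity (critic) and diversity (novelty),
    (3) add survivors back so the next round imitates higher-quality data.
    """
    data = list(seed)
    for _ in range(rounds):
        candidates = [generate(x) for x in data]              # (1) student generations
        survivors = []
        for c in candidates:
            # (2) targeted filtering: critic for validity, novelty for diversity
            if critic_ok(c) and novel(c, data + survivors):
                survivors.append(c)
        data.extend(survivors)                                # (3) self-imitation pool grows
    return data

# Stand-in components (purely illustrative):
grown = iterative_self_distillation(
    seed=["a"],
    rounds=3,
    generate=lambda s: s + "+",          # stand-in "student" model
    critic_ok=lambda s: len(s) <= 10,    # stand-in validity critic
    novel=lambda c, pool: c not in pool, # stand-in NLI-style diversity check
)
```

Each round the pool only grows with candidates that pass both filters, which is the mechanism the paper uses to amplify validity, diversity, and defeasibility across iterations.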

Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: EMNLP 2023
Publisher: Association for Computational Linguistics (ACL)
Pages: 12140-12159
Number of pages: 20
ISBN (Electronic): 9798891760615
State: Published - 2023
Externally published: Yes
Event: 2023 Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, Singapore
Duration: 6 Dec 2023 to 10 Dec 2023

Publication series

Name: Findings of the Association for Computational Linguistics: EMNLP 2023

Conference

Conference: 2023 Findings of the Association for Computational Linguistics: EMNLP 2023
Country/Territory: Singapore
City: Singapore
Period: 6/12/23 to 10/12/23

Bibliographical note

Publisher Copyright:
© 2023 Association for Computational Linguistics.

Funding

The authors thank the anonymous reviewers. This research was supported in part by DARPA under the ITM program (FA8650-23-C-7316) and the Allen Institute for AI.

Funders | Funder number
Defense Advanced Research Projects Agency | FA8650-23-C-7316
Allen Institute for AI |
