Controlled Text Reduction

Aviv Slobodkin, Paul Roit, Eran Hirsch, Ori Ernst, Ido Dagan

Research output: Contribution to conference › Paper › peer-review

3 Scopus citations

Abstract

Producing a reduced version of a source text, as in generic or focused summarization, inherently involves two distinct subtasks: deciding on targeted content and generating a coherent text conveying it. While some popular approaches address summarization as a single end-to-end task, prominent works support decomposed modeling for individual subtasks. Further, semi-automated text reduction is also very appealing, where users may identify targeted content while models would generate a corresponding coherent summary. In this paper, we focus on the second subtask, of generating coherent text given pre-selected content. Concretely, we formalize Controlled Text Reduction as a standalone task, whose input is a source text with marked spans of targeted content ("highlighting"). A model then needs to generate a coherent text that includes all and only the target information. We advocate the potential of such models, both for modular fully-automatic summarization, as well as for semi-automated human-in-the-loop use cases. Facilitating proper research, we crowd-source high-quality dev and test datasets for the task. Further, we automatically generate a larger "silver" training dataset from available summarization benchmarks, leveraging a pre-trained summary-source alignment model. Finally, employing these datasets, we present a supervised baseline model, showing promising results and insightful analyses.
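The task's input format described above (a source text with marked spans of targeted content) can be illustrated with a minimal sketch. This is not the authors' code or data format; the class name, field names, and example text are hypothetical, chosen only to make the "highlighting" notion concrete: spans are given as character offsets into the source, and a model must generate a coherent text covering all and only the highlighted content.

```python
# Hypothetical sketch of a Controlled Text Reduction instance:
# a source document plus "highlight" spans given as character offsets.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CTRInstance:
    source: str                          # full source document
    highlights: List[Tuple[int, int]]    # (start, end) character offsets

    def highlighted_spans(self) -> List[str]:
        """Return the pre-selected content the generated text must convey."""
        return [self.source[s:e] for s, e in self.highlights]


doc = CTRInstance(
    source=("The storm hit on Monday. Schools closed citywide. "
            "Power was restored by Friday."),
    highlights=[(0, 24), (50, 79)],
)
print(doc.highlighted_spans())
# → ['The storm hit on Monday.', 'Power was restored by Friday.']
```

A model for the task would take `doc.source` with the highlight markup and produce a reduced text that includes the two highlighted sentences' content while omitting the unhighlighted middle sentence.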

Original language: English
Pages: 5699-5715
Number of pages: 17
State: Published - 2022
Event: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration: 7 Dec 2022 - 11 Dec 2022

Conference

Conference: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 7/12/22 - 11/12/22

Bibliographical note

Publisher Copyright:
© 2022 Association for Computational Linguistics.

Funding

This work was supported by Intel Labs, the Israel Science Foundation (grant no. 2827/21), and a grant from the Israel Ministry of Science and Technology.

Funders (funder number):
Intel Labs
Israel Science Foundation (2827/21)
Ministry of Science and Technology, Israel
