Proposition-Level Clustering for Multi-Document Summarization

Ori Ernst, Avi Caciularu, Ori Shapira, Ramakanth Pasunuru, Mohit Bansal, Jacob Goldberger, Ido Dagan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

26 Scopus citations

Abstract

Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means of coping with considerable information repetition. In particular, clusters were leveraged to indicate information saliency as well as to avoid redundancy. Such prior methods focused on clustering sentences, even though closely related sentences usually also contain non-aligned parts. In this work, we revisit the clustering approach, grouping together sub-sentential propositions, aiming at more precise information alignment. Specifically, our method detects salient propositions, clusters them into paraphrastic clusters, and generates a representative sentence for each cluster via text fusion. Our summarization method improves over the previous state-of-the-art MDS method on the DUC 2004 and TAC 2011 datasets, both in automatic ROUGE scores and in human preference.
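
The cluster-then-represent idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes propositions have already been extracted, substitutes token-overlap (Jaccard) similarity for a learned paraphrase model, uses cluster size as a stand-in for saliency, and picks the shortest cluster member instead of performing text fusion. All function names, the threshold, and the example propositions are hypothetical.

```python
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for a learned similarity model."""
    return {w.strip(".,;:!?\"'").lower() for w in text.split()} - {""}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-overlap similarity in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_propositions(props: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy single-pass clustering: join the first cluster whose seed
    proposition is similar enough, otherwise start a new cluster."""
    clusters: List[List[str]] = []
    for prop in props:
        for cluster in clusters:
            if jaccard(tokens(prop), tokens(cluster[0])) >= threshold:
                cluster.append(prop)
                break
        else:
            clusters.append([prop])
    return clusters


def summarize(props: List[str], max_sentences: int = 2, threshold: float = 0.5) -> List[str]:
    """Rank clusters by size (repetition as a saliency proxy) and emit one
    representative per cluster, so each piece of information appears once."""
    clusters = cluster_propositions(props, threshold)
    ranked = sorted(clusters, key=len, reverse=True)  # stable: ties keep input order
    # Assumption: shortest member stands in for the fused representative sentence.
    return [min(c, key=len) for c in ranked[:max_sentences]]


# Hypothetical propositions, as if extracted from several documents.
propositions = [
    "the storm hit the coast on Monday",
    "the storm hit the coast Monday morning",
    "officials ordered an evacuation",
    "a storm hit the coast on Monday",
    "officials ordered an evacuation of the town",
]
summary = summarize(propositions)
print(summary)
```

Because clustering operates on short propositions rather than whole sentences, each cluster in this sketch groups statements of a single fact, which is the alignment benefit the abstract argues for; a real system would replace every component here with a trained model.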

Original language: English
Title of host publication: NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 1765-1779
Number of pages: 15
ISBN (Electronic): 9781955917711
DOIs
State: Published - 2022
Event: 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022 - Seattle, United States
Duration: 10 Jul 2022 to 15 Jul 2022

Publication series

Name: NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference

Conference

Conference: 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022
Country/Territory: United States
City: Seattle
Period: 10/07/22 to 15/07/22

Bibliographical note

Publisher Copyright:
© 2022 Association for Computational Linguistics.

Funding

The work described herein was supported in part by the PBC fellowship for outstanding PhD candidates in data science, Intel Labs, the Israel Science Foundation grant 2827/21, and by a grant from the Israel Ministry of Science and Technology.

Funders (funder number, where given):
Intel Labs
Israel Science Foundation (2827/21)
Ministry of Science and Technology, Israel
Planning and Budgeting Committee of the Council for Higher Education of Israel
