How “Multi” is Multi-Document Summarization?

Ruben Wolhandler, Arie Cattan, Ori Ernst, Ido Dagan

Research output: Contribution to conference › Paper › peer-review

12 Scopus citations

Abstract

Multi-document summarization (MDS) aims at models that, given multiple documents as input, generate a summary combining dispersed information originally spread across these documents. Accordingly, both the reference summaries in MDS datasets and system summaries are expected to be based on such dispersed information. In this paper, we argue for quantifying and assessing this expectation. To that end, we propose an automated measure for evaluating the degree to which a summary is “disperse”, in the sense of the number of source documents needed to cover its content. We apply our measure to empirically analyze several popular MDS datasets, with respect to both their reference summaries and the output of state-of-the-art systems. Our results show that certain MDS datasets barely require combining information from multiple documents, since a single document often covers the full summary content. Overall, we advocate using our metric to assess and improve the degree to which summarization datasets require combining multi-document information, and likewise the degree to which summarization models actually meet this challenge.
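
To make the notion of "the number of source documents needed to cover its content" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the summary has already been split into content units and that some external alignment step has decided which units each document covers. The function name `greedy_document_cover`, the unit and document identifiers, and the greedy set-cover strategy are all illustrative assumptions; the abstract does not specify how coverage is computed.

```python
from typing import Dict, List, Set


def greedy_document_cover(
    units: Set[str], doc_coverage: Dict[str, Set[str]]
) -> List[str]:
    """Greedily pick documents until no further summary unit can be covered.

    `units` holds hypothetical summary content-unit ids; `doc_coverage` maps a
    hypothetical document id to the unit ids that document is assumed to cover.
    """
    remaining = set(units)
    chosen: List[str] = []
    if not doc_coverage:
        return chosen
    while remaining:
        # Pick the document covering the most still-uncovered units;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(doc_coverage), key=lambda d: len(doc_coverage[d] & remaining))
        gained = doc_coverage[best] & remaining
        if not gained:
            break  # the leftover units are not covered by any document
        chosen.append(best)
        remaining -= gained
    return chosen


# Toy example: one document alone covers every unit of the summary.
summary_units = {"u1", "u2", "u3"}
coverage = {
    "d1": {"u1", "u2"},
    "d2": {"u2", "u3"},
    "d3": {"u1", "u2", "u3"},
}
print(len(greedy_document_cover(summary_units, coverage)))  # 1
```

Under these assumptions, a result of 1 corresponds to the situation the abstract describes, where a single document covers the full summary content, while larger values indicate a summary that draws on several documents. Greedy selection only approximates the minimum cover; it is used here for brevity.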

Original language: English
Pages: 5761-5769
Number of pages: 9
State: Published - 2022
Event: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration: 7 Dec 2022 to 11 Dec 2022

Conference

Conference: 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 7/12/22 to 11/12/22

Bibliographical note

Publisher Copyright:
© 2022 Association for Computational Linguistics.

Funding

We thank the anonymous reviewers for their insightful comments. This work was supported in part by Intel Labs, by Israel Science Foundation grant 2827/21, and by a grant from the Israel Ministry of Science and Technology. Arie Cattan is partially supported by a fellowship for excellence in data science from the Bar-Ilan Data Science Institute (funded by the Israeli PBC).

Funders (funder number, where given)
Bar-Ilan Data Science Institute
Intel Labs
Israel Science Foundation (2827/21)
Ministry of Science and Technology, Israel
