Abstract
Aligning sentences in a reference summary with their counterparts in source documents has been shown to be a useful auxiliary summarization task, notably for generating training data for salience detection. Despite its demonstrated utility, the alignment step has mostly been approached with heuristic unsupervised methods, typically ROUGE-based, and has never been independently optimized or evaluated. In this paper, we propose establishing summary-source alignment as an explicit task, while introducing two major novelties: (1) applying it at the more accurate proposition span level, and (2) approaching it as a supervised classification task. To that end, we created a novel training dataset for proposition-level alignment, derived automatically from available summarization evaluation data. In addition, we crowdsourced dev and test datasets, enabling model development and proper evaluation. Using these data, we present a supervised proposition alignment baseline model, showing improved alignment quality over the unsupervised approach.
Original language | English |
---|---|
Title of host publication | CoNLL 2021 - 25th Conference on Computational Natural Language Learning, Proceedings |
Editors | Arianna Bisazza, Omri Abend |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 310-322 |
Number of pages | 13 |
ISBN (Electronic) | 9781955917056 |
State | Published - 2021 |
Event | 25th Conference on Computational Natural Language Learning, CoNLL 2021 - Virtual, Online; 10 Nov 2021 → 11 Nov 2021 |
Publication series
Name | CoNLL 2021 - 25th Conference on Computational Natural Language Learning, Proceedings |
---|---|
Conference
Conference | 25th Conference on Computational Natural Language Learning, CoNLL 2021 |
---|---|
City | Virtual, Online |
Period | 10/11/21 → 11/11/21 |
Bibliographical note
Publisher Copyright: © 2021 Association for Computational Linguistics.
Funding
We thank the anonymous reviewers for their constructive comments. This work was supported in part by the German Research Foundation through the German-Israeli Project Cooperation (DIP, grant DA 1600/1-1); by the Israel Science Foundation (grant 1951/17); by a grant from the Israel Ministry of Science and Technology; and by grants from Intel Labs. MB and RP were supported by NSF-CAREER Award 1846185 and a Microsoft PhD Fellowship.
Funders | Funder number |
---|---|
DIP | DA 1600/1-1 |
German-Israeli Project Cooperation | |
Intel Labs | |
NSF-CAREER | 1846185 |
Microsoft | |
Deutsche Forschungsgemeinschaft | |
Israel Science Foundation | 1951/17 |
Ministry of Science and Technology, Israel | |