Abstract
Conducting a manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to references, has been perceived as very reliable, providing objective scores. Yet, due to the high cost of the Pyramid method and the required expertise, researchers resorted to cheaper and less thorough manual evaluation methods, such as Responsiveness and pairwise comparison, attainable via crowdsourcing. We revisit the Pyramid approach, proposing a lightweight sampling-based version that is crowdsourcable. We analyze the performance of our method in comparison to original expert-based Pyramid evaluations, showing higher correlation relative to the common Responsiveness method. We release our crowdsourced Summary-Content-Units, along with all crowdsourcing scripts, for future evaluations.
Original language | English |
---|---|
Title of host publication | Long and Short Papers |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 682-687 |
Number of pages | 6 |
ISBN (Electronic) | 9781950737130 |
State | Published - 2019 |
Event | 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019 - Minneapolis, United States; Duration: 2 Jun 2019 → 7 Jun 2019 |
Publication series
Name | NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference |
---|---|
Volume | 1 |
Conference
Conference | 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019 |
---|---|
Country/Territory | United States |
City | Minneapolis |
Period | 2/06/19 → 7/06/19 |
Bibliographical note
Publisher Copyright: © 2019 Association for Computational Linguistics
Funding
We would like to thank the anonymous reviewers for their constructive comments, as well as Ani Nenkova for her helpful remarks. This work was supported in part by the Bloomberg Data Science Research Grant Program; by the German Research Foundation through the German-Israeli Project Cooperation (DIP, grants DA 1600/1-1 and GU 798/17-1); by the BIU Center for Research in Applied Cryptography and Cyber Security in conjunction with the Israel National Cyber Bureau in the Prime Minister’s Office; by the Israel Science Foundation (grants 1157/16 and 1951/17); by DARPA Young Faculty Award YFA17-D17AP00022; and by the ArguAna Project GU 798/20-1 (DFG).
Funders | Funder number |
---|---|
DIP | GU 798/17-1, DA 1600/1-1 |
German-Israeli Project Cooperation | |
Defense Advanced Research Projects Agency | YFA17-D17AP00022 |
Deutsche Forschungsgemeinschaft | GU 798/20-1 |
Israel Science Foundation | 1157/16, 1951/17 |