Crowdsourcing lightweight pyramids for manual summary evaluation

Ori Shapira, David Gabay, Yang Gao, Hadar Ronen, Ramakanth Pasunuru, Mohit Bansal, Yael Amsterdamer, Ido Dagan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

41 Scopus citations

Abstract

Conducting a manual evaluation is considered an essential part of summary evaluation methodology. Traditionally, the Pyramid protocol, which exhaustively compares system summaries to references, has been perceived as very reliable, providing objective scores. Yet, due to the high cost of the Pyramid method and the required expertise, researchers resorted to cheaper and less thorough manual evaluation methods, such as Responsiveness and pairwise comparison, attainable via crowdsourcing. We revisit the Pyramid approach, proposing a lightweight sampling-based version that is crowdsourcable. We analyze the performance of our method in comparison to original expert-based Pyramid evaluations, showing higher correlation relative to the common Responsiveness method. We release our crowdsourced Summary-Content-Units, along with all crowdsourcing scripts, for future evaluations.

Original language: English
Title of host publication: Long and Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 682-687
Number of pages: 6
ISBN (Electronic): 9781950737130
State: Published - 2019
Event: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019 - Minneapolis, United States
Duration: 2 Jun 2019 to 7 Jun 2019

Publication series

Name: NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
Volume: 1

Conference

Conference: 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2019
Country/Territory: United States
City: Minneapolis
Period: 2/06/19 to 7/06/19

Bibliographical note

Publisher Copyright:
© 2019 Association for Computational Linguistics

Funding

We would like to thank the anonymous reviewers for their constructive comments, as well as Ani Nenkova for her helpful remarks. This work was supported in part by the Bloomberg Data Science Research Grant Program; by the German Research Foundation through the German-Israeli Project Cooperation (DIP, grants DA 1600/1-1 and GU 798/17-1); by the BIU Center for Research in Applied Cryptography and Cyber Security in conjunction with the Israel National Cyber Bureau in the Prime Minister’s Office; by the Israel Science Foundation (grants 1157/16 and 1951/17); by DARPA Young Faculty Award YFA17-D17AP00022; and by the ArguAna Project GU 798/20-1 (DFG).

Funder: Funder number
DIP: GU 798/17-1, DA 1600/1-1
German-Israeli Project Cooperation
Defense Advanced Research Projects Agency: YFA17-D17AP00022
Deutsche Forschungsgemeinschaft: GU 798/20-1
Israel Science Foundation: 1157/16, 1951/17
