TY - UNPB
T1 - Multi-Document Keyphrase Extraction: Dataset, Baselines and Review
AU - Shapira, Ori
AU - Pasunuru, Ramakanth
AU - Dagan, Ido
AU - Amsterdamer, Yael
PY - 2021/10/3
Y1 - 2021/10/3
N2 - Keyphrase extraction has been extensively researched within the single-document setting, with an abundance of methods, datasets and applications. In contrast, multi-document keyphrase extraction has been infrequently studied, despite its utility for describing sets of documents, and its use in summarization. Moreover, no prior dataset exists for multi-document keyphrase extraction, hindering the progress of the task. Recent advances in multi-text processing make the task an even more appealing challenge to pursue. To stimulate this pursuit, we present here the first dataset for the task, MK-DUC-01, which can serve as a new benchmark, and test multiple keyphrase extraction baselines on our data. In addition, we provide a brief, yet comprehensive, literature review of the task.
KW - Computation and Language (cs.CL)
KW - FOS: Computer and information sciences
U2 - 10.48550/ARXIV.2110.01073
DO - 10.48550/ARXIV.2110.01073
M3 - Preprint
BT - Multi-Document Keyphrase Extraction: Dataset, Baselines and Review
PB - arXiv preprint arXiv:2110.01073
ER -