Distant Supervision for Keyphrase Extraction using Search Queries

Oren Sar Shalom, Hezi Resheff, Alex Zhicharevich, Rami Cohen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citations

Abstract

Keyphrase extraction aims to automatically select a small set of phrases in a document that best describe its main ideas. There is a great need for better methods of keyphrase extraction in the absence of labeled data, as current unsupervised algorithms fail to achieve adequate performance compared to their supervised counterparts. In this paper we suggest a widely applicable distant supervision framework based on auxiliary data from query logs. By propagating information from queries and subsequent consumption of content, weak labels are produced, transforming the problem into the easier supervised task. Evaluation on a large dataset shows the superiority of this approach over unsupervised alternatives.

Original language: English
Title of host publication: Proceedings - 2020 IEEE 6th International Conference on Big Data Computing Service and Applications, BigDataService 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 70-77
Number of pages: 8
ISBN (Electronic): 9781728170220
State: Published - Aug 2020
Externally published: Yes
Event: 6th IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2020 - Oxford, United Kingdom
Duration: 3 Aug 2020 to 6 Aug 2020

Publication series

Name: Proceedings - 2020 IEEE 6th International Conference on Big Data Computing Service and Applications, BigDataService 2020

Conference

Conference: 6th IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2020
Country/Territory: United Kingdom
City: Oxford
Period: 3/08/20 to 6/08/20

Bibliographical note

Publisher Copyright:
© 2020 IEEE.

Keywords

  • Document Analysis
  • Keyphrase Extraction
  • Knowledge Extraction
