Learning question similarity in CQA from references and query-logs

Alex Zhicharevich, Moni Shahar, Oren Sar Shalom

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Community question answering (CQA) sites are quickly becoming an invaluable source of information in many domains. Because CQA forums are built from the contributions of many authors, finding similar or even duplicate questions is an essential problem. In the absence of supervised data for this problem, we propose a novel approach that generates weak labels from easily obtainable data that exist in most CQAs, e.g., query logs and references in the answers. These labels enable the training of auxiliary supervised text classification models. The internal states of these models serve as meaningful question representations and are used to measure semantic similarity. We demonstrate that these methods outperform state-of-the-art text embedding methods on the question similarity task.
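To make the weak-labeling idea concrete, here is a minimal sketch, not the authors' implementation. It assumes two hypothetical input shapes: answers stored as a dict from question id to a list of answer texts, and a query log given as (query, clicked question id) pairs. Questions whose answers cite the same URL, or that were clicked for the same normalized query, end up in the same weak-label group; those groups are what an auxiliary classifier could be trained on. The classifier itself is out of scope here, so the vectors passed to `most_similar` are hypothetical stand-ins for its internal states, compared with cosine similarity. All function names are illustrative.

```python
import math
import re
from collections import defaultdict

# Hypothetical, deliberately simple URL extractor for answer text.
URL_PATTERN = re.compile(r"https?://\S+")


def weak_labels_from_references(answers_by_question):
    """Map each question id to the set of URLs cited in its answers.

    Illustrative labeling rule only: a shared URL is treated as weak
    evidence that two questions are about the same thing.
    """
    labels = {}
    for qid, answers in answers_by_question.items():
        urls = set()
        for text in answers:
            # Strip trailing punctuation that often follows a URL in prose.
            urls.update(u.rstrip(".,)") for u in URL_PATTERN.findall(text))
        labels[qid] = urls
    return labels


def group_by_label(labels):
    """Invert question -> labels into label -> questions (candidate classes)."""
    groups = defaultdict(set)
    for qid, urls in labels.items():
        for url in urls:
            groups[url].add(qid)
    return dict(groups)


def weak_labels_from_query_log(clicks):
    """Group question ids clicked for the same normalized search query."""
    groups = defaultdict(set)
    for query, qid in clicks:
        groups[query.strip().lower()].add(qid)
    return dict(groups)


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(qid, embeddings, k=3):
    """Rank other questions by cosine similarity to `qid`.

    `embeddings` is a hypothetical dict of question id -> vector, standing in
    for internal states taken from a trained auxiliary classifier.
    """
    target = embeddings[qid]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != qid
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The two label sources are kept as separate functions because they would typically feed separate auxiliary classifiers; how the resulting representations are combined is a design choice this sketch leaves open.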

Original language: English
Title of host publication: ICPRAM 2020 - Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods
Editors: Maria De Marsico, Gabriella Sanniti di Baja, Ana Fred
Publisher: SciTePress
Pages: 342-352
Number of pages: 11
ISBN (Electronic): 9789897583971
State: Published - 2020
Externally published: Yes
Event: 9th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2020 - Valletta, Malta
Duration: 22 Feb 2020 to 24 Feb 2020

Publication series

Name: ICPRAM 2020 - Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods

Conference

Conference: 9th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2020
Country/Territory: Malta
City: Valletta
Period: 22/02/20 to 24/02/20

Bibliographical note

Publisher Copyright:
Copyright © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.

Keywords

  • Community Question Answering
  • Deep Learning
  • Text Representation
  • Text Similarity
  • Weak Supervision
