Abstract
Community question answering (CQA) sites are quickly becoming an invaluable source of information in many domains. Since CQA forums are based on the contributions of many authors, the problem of finding similar or even duplicate questions is essential. In the absence of supervised data for this problem, we propose a novel approach to generate weak labels based on easily obtainable data that exist in most CQA sites, e.g., query logs and references in the answers. These labels enable the training of auxiliary supervised text classification models. The internal states of these models serve as meaningful question representations and are used to measure semantic similarity. We demonstrate that these methods are superior to state-of-the-art text embedding methods for the question similarity task.
Original language | English |
---|---|
Title of host publication | ICPRAM 2020 - Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods |
Editors | Maria De Marsico, Gabriella Sanniti di Baja, Ana Fred |
Publisher | SciTePress |
Pages | 342-352 |
Number of pages | 11 |
ISBN (Electronic) | 9789897583971 |
State | Published - 2020 |
Externally published | Yes |
Event | 9th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2020 - Valletta, Malta; Duration: 22 Feb 2020 → 24 Feb 2020 |
Publication series
Name | ICPRAM 2020 - Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods |
---|---|
Conference
Conference | 9th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2020 |
---|---|
Country/Territory | Malta |
City | Valletta |
Period | 22/02/20 → 24/02/20 |
Bibliographical note
Publisher Copyright: Copyright © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
Keywords
- Community Question Answering
- Deep Learning
- Text Representation
- Text Similarity
- Weak Supervision