Deception Detection Within and Across Domains: Identifying and Understanding the Performance Gap

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

NLP approaches to automatic deception detection have gained popularity over the past few years, especially with the proliferation of fake reviews and fake news online. However, most previous studies of deception detection have focused on single domains. We currently lack information about whether these single-domain models of deception generalize to new domains. In this work, we conduct empirical studies of cross-domain deception detection in five domains to understand how current models perform when evaluated on new deception domains. Our experimental results reveal a large gap between within-domain and cross-domain classification performance. Motivated by these findings, we propose methods to understand the differences in performance across domains. We formulate five distance metrics that quantify the distance between pairs of deception domains. We experimentally demonstrate that the distance between a pair of domains is negatively correlated with the cross-domain accuracy for that pair. We thoroughly analyze the differences between the domains and the impact of fine-tuning BERT-based models by visualizing the sentence embeddings. Finally, we use the distance metrics to recommend the optimal source domain for any given target domain. This work highlights the need to develop robust learning algorithms for cross-domain deception detection that generalize and adapt to new domains, and it contributes toward that goal.
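
The idea of ranking candidate source domains by their distance to a target domain can be illustrated with a minimal sketch. The abstract does not name the five distance metrics, so the Jensen-Shannon divergence over unigram distributions used here is an assumption chosen only for illustration, not necessarily one of the paper's metrics. The function names and the toy domain names are likewise hypothetical.

```python
import math
from collections import Counter


def unigram_distribution(texts):
    """Relative frequency of each lowercase whitespace token in a list of texts."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two distributions.

    Assumed stand-in for a domain distance; the value lies in [0, 1].
    """
    vocab = set(p) | set(q)
    # Mixture distribution of p and q over the joint vocabulary.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_mixture(a):
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def recommend_source(target_texts, candidate_domains):
    """Return the candidate domain closest to the target, plus all distances."""
    target = unigram_distribution(target_texts)
    distances = {
        name: js_divergence(target, unigram_distribution(texts))
        for name, texts in candidate_domains.items()
    }
    return min(distances, key=distances.get), distances


if __name__ == "__main__":
    # Hypothetical toy corpora; real domains would hold many documents each.
    target = ["the room was clean"]
    candidates = {
        "hotel_reviews": ["the room was clean and quiet"],
        "finance_news": ["stocks fell sharply today"],
    }
    best, distances = recommend_source(target, candidates)
    print(best, distances)
```

Under this assumption, the candidate with the smallest divergence is the recommended source domain, mirroring the reported finding that larger distances go with lower cross-domain accuracy. Any other distance between domains could be substituted for `js_divergence` without changing the selection step.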

Original language: English
Article number: 7
Journal: Journal of Data and Information Quality
Volume: 15
Issue number: 1
State: Published - 28 Dec 2022
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2022 Association for Computing Machinery.

Keywords

  • Deception detection
  • cross-domain classification
