Multimodal deception detection using automatically extracted acoustic, visual, and lexical features

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

23 Scopus citations

Abstract

Deception detection in conversational dialogue has attracted much attention in recent years. Yet existing methods for this rely heavily on human-labeled annotations that are costly and potentially inaccurate. In this work, we present an automated system that utilizes multimodal features for conversational deception detection, without the use of human annotations. We study the predictive power of different modalities and combine them for better performance. We use openSMILE to extract acoustic features after applying noise reduction techniques to the original audio. Facial landmark features are extracted from the visual modality. We experiment with training facial expression detectors and applying Fisher Vectors to encode sequences of facial landmarks with varying length. Linguistic features are extracted from automatic transcriptions of the data. We examine the performance of these methods on the Box of Lies dataset of deception game videos, achieving 73% accuracy using features from all modalities. This result is significantly better than previous results on this corpus which relied on manual annotations, and also better than human performance.

Original languageEnglish
Title of host publicationInterspeech 2020
PublisherInternational Speech Communication Association
Pages359-363
Number of pages5
ISBN (Print)9781713820697
DOIs
StatePublished - 2020
Externally publishedYes
Event21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 202029 Oct 2020

Publication series

NameProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume2020-October
ISSN (Print)2308-457X
ISSN (Electronic)1990-9772

Conference

Conference21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/TerritoryChina
CityShanghai
Period25/10/2029/10/20

Bibliographical note

Publisher Copyright:
© 2020 ISCA

Keywords

  • Deception
  • Facial landmarks
  • Multimodal data
  • Prosody

Fingerprint

Dive into the research topics of 'Multimodal deception detection using automatically extracted acoustic, visual, and lexical features'. Together they form a unique fingerprint.

Cite this