Robust discriminative keyword spotting for emotionally colored spontaneous speech using bidirectional LSTM networks

Martin Wöllmer, Florian Eyben, Joseph Keshet, Alex Graves, Björn Schuller, Gerhard Rigoll

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

52 Scopus citations

Abstract

In this paper, we propose a new technique for robust keyword spotting that uses bidirectional Long Short-Term Memory (BLSTM) recurrent neural networks to incorporate contextual information in speech decoding. Our approach overcomes the drawbacks of generative HMM modeling by applying a discriminative learning procedure that non-linearly maps speech features into an abstract vector space. Because the outputs of a BLSTM network are incorporated into the speech features, the keyword spotter can make use of both past and future context for phoneme predictions. The robustness of the approach is evaluated on a keyword spotting task using the HUMAINE Sensitive Artificial Listener (SAL) database, which contains accented, spontaneous, and emotionally colored speech. The test is particularly stringent because the system is not trained on the SAL database, but only on the TIMIT corpus of read speech. We show that our method outperforms a discriminative keyword spotter without BLSTM-enhanced feature functions, which in turn has been shown to outperform HMM-based techniques.
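The idea of using past and future context at every frame can be illustrated with a minimal sketch. It is not the authors' implementation: a plain tanh recurrence stands in for the LSTM memory cells, the weights are random and untrained, and every name in it (`make_weights`, `run_direction`, `bidirectional_states`, the frame values) is hypothetical. It only shows that the backward half of a bidirectional output at an early frame reacts to a change in a later frame, while the forward half does not.

```python
import math
import random


def make_weights(n_in, n_hidden, rng):
    """Small random weights for one direction (hypothetical initialisation)."""
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_hidden)]
    return w_in, w_rec


def run_direction(frames, weights, reverse=False):
    """Run a plain tanh recurrence over the frames, optionally right to left.

    Returns one hidden-state vector per frame, in the original frame order.
    """
    w_in, w_rec = weights
    n_hidden = len(w_in)
    state = [0.0] * n_hidden
    order = range(len(frames) - 1, -1, -1) if reverse else range(len(frames))
    states = [None] * len(frames)
    for t in order:
        x = frames[t]
        # New state depends on the current frame and the previous state
        # in the chosen direction.
        state = [
            math.tanh(
                sum(w * v for w, v in zip(w_in[j], x))
                + sum(w * s for w, s in zip(w_rec[j], state))
            )
            for j in range(n_hidden)
        ]
        states[t] = state
    return states


def bidirectional_states(frames, fwd_weights, bwd_weights):
    """Concatenate forward (past context) and backward (future context) states."""
    fwd = run_direction(frames, fwd_weights)
    bwd = run_direction(frames, bwd_weights, reverse=True)
    return [f + b for f, b in zip(fwd, bwd)]


rng = random.Random(0)
N_IN, N_HIDDEN = 3, 2
fwd_w = make_weights(N_IN, N_HIDDEN, rng)
bwd_w = make_weights(N_IN, N_HIDDEN, rng)

# Three toy feature frames; `altered` differs from `frames` only in the last one.
frames = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4], [0.5, 0.5, -0.2]]
altered = [frames[0], frames[1], [1.0, -1.0, 1.0]]

out_a = bidirectional_states(frames, fwd_w, bwd_w)
out_b = bidirectional_states(altered, fwd_w, bwd_w)
```

Comparing `out_a[0]` with `out_b[0]` shows the effect: the first `N_HIDDEN` values (forward direction) are identical, while the remaining values (backward direction) differ because they carry information from the altered final frame.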

Original language: English
Title of host publication: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009
Pages: 3949-3952
Number of pages: 4
State: Published - 2009
Externally published: Yes
Event: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009 - Taipei, Taiwan, Province of China
Duration: 19 Apr 2009 - 24 Apr 2009

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 19/04/09 - 24/04/09

Keywords

  • Recurrent neural networks
  • Robustness
  • Speech recognition
