Attention-based neural network for joint diarization and speaker extraction

Shlomo E. Chazan, Sharon Gannot, Jacob Goldberger

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

7 Scopus citations

Abstract

Multi-microphone, DNN-based speech enhancement and speaker separation/extraction algorithms have recently gained increasing popularity. The enhancement capabilities of a spatial processor can be very high, provided that all its building blocks are accurately estimated. Data-driven estimation approaches can be very attractive since they do not rely on accurate statistical models, which are usually unavailable. However, training a DNN with multi-microphone data is a challenging task, due to inevitable differences between the training and test phases. In this work, we present an estimation procedure for controlling a linearly-constrained minimum variance (LCMV) beamformer for speaker extraction and noise reduction. We propose an attention-based DNN for speaker diarization that is applicable to the task at hand. In the proposed scheme, each microphone signal propagates through a dedicated DNN, and an attention mechanism selects the most informative microphone. This approach has the potential to mitigate the mismatch between the training and test phases and can therefore lead to improved speaker extraction performance.
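The microphone-selection idea in the abstract can be illustrated with a minimal sketch. It assumes that each per-microphone DNN emits a vector of speaker-activity posteriors together with a scalar relevance score, and that a softmax over those scores weights the per-microphone outputs. The function names, the scoring scheme, and all numbers below are hypothetical and are not taken from the paper, which may structure its attention layer differently.

```python
import math


def softmax(scores):
    """Turn arbitrary real-valued scores into weights that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_microphones(mic_outputs, mic_scores):
    """Combine per-microphone outputs using attention weights.

    mic_outputs: one list of class posteriors per microphone (assumed format).
    mic_scores:  one scalar relevance score per microphone (assumed format).
    Returns the attention weights and the weighted combination of outputs.
    """
    weights = softmax(mic_scores)
    n_classes = len(mic_outputs[0])
    combined = [
        sum(w * out[k] for w, out in zip(weights, mic_outputs))
        for k in range(n_classes)
    ]
    return weights, combined


# Illustrative values only: three microphones, three diarization classes.
mic_outputs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.4, 0.2],
    [0.1, 0.8, 0.1],
]
mic_scores = [2.0, 0.5, -1.0]

weights, combined = attend_over_microphones(mic_outputs, mic_scores)
# Index of the microphone that receives the largest attention weight.
best_mic = max(range(len(weights)), key=weights.__getitem__)
```

Because the weights form a convex combination, a microphone with a dominant score effectively gets selected, while the combined output remains a valid probability vector.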

Original language: English
Title of host publication: 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 301-305
Number of pages: 5
ISBN (Electronic): 9781538681510
State: Published - 2 Nov 2018
Event: 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Tokyo, Japan
Duration: 17 Sep 2018 - 20 Sep 2018

Publication series

Name: 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings

Conference

Conference: 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018
Country/Territory: Japan
City: Tokyo
Period: 17/09/18 - 20/09/18

Bibliographical note

Publisher Copyright:
© 2018 IEEE.

Keywords

  • Attention-based deep-learning
  • Linearly-constrained minimum variance (LCMV) beamformer
  • Speaker extraction
