Abstract
Multi-microphone, DNN-based speech enhancement and speaker separation/extraction algorithms have recently gained popularity. The enhancement capabilities of a spatial processor can be very high, provided that all its building blocks are accurately estimated. Data-driven estimation approaches are attractive because they do not rely on accurate statistical models, which are usually unavailable. However, training a DNN with multi-microphone data is challenging, due to inevitable differences between the training and test phases. In this work, we present an estimation procedure for controlling a linearly-constrained minimum variance (LCMV) beamformer for speaker extraction and noise reduction. We propose an attention-based DNN for speaker diarization that is applicable to the task at hand. In the proposed scheme, each microphone signal propagates through a dedicated DNN and an attention mechanism selects the most informative microphone. This approach has the potential to mitigate the mismatch between the training and test phases and can therefore lead to improved speaker extraction performance.
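
The two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: `lcmv_weights` uses the textbook closed-form LCMV solution for a single frequency bin, and `attention_pool` is a generic softmax attention over per-microphone scores. The function names, array shapes, random test data, and the choice of response vector `g` are all assumptions made for illustration.

```python
import numpy as np


def lcmv_weights(R, C, g):
    """Textbook LCMV weights w = R^{-1} C (C^H R^{-1} C)^{-1} g for one bin.

    R: (M, M) Hermitian positive-definite spatial covariance matrix.
    C: (M, K) constraint matrix, e.g. one steering vector per speaker.
    g: (K,) desired responses, e.g. 1 for the target, 0 for an interferer.
    """
    Rinv_C = np.linalg.solve(R, C)            # R^{-1} C, shape (M, K)
    gram = C.conj().T @ Rinv_C                # C^H R^{-1} C, shape (K, K)
    return Rinv_C @ np.linalg.solve(gram, g)  # shape (M,)


def attention_pool(features, scores):
    """Generic softmax attention over microphones (illustrative only).

    features: (M, D) one embedding per microphone.
    scores:   (M,) unnormalized relevance of each microphone.
    Returns the attention weights (M,) and the pooled embedding (D,).
    """
    a = np.exp(scores - scores.max())  # subtract the max for stability
    a = a / a.sum()
    return a, a @ features


# Hypothetical data: M = 4 microphones, K = 2 speakers.
rng = np.random.default_rng(0)
M, K = 4, 2
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + M * np.eye(M)  # Hermitian positive definite
C = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
g = np.array([1.0, 0.0])            # keep speaker 0, null speaker 1

w = lcmv_weights(R, C, g)

scores = np.array([0.1, -1.0, 2.5, 0.3])
features = rng.standard_normal((M, 3))
a, pooled = attention_pool(features, scores)
```

By construction the weights satisfy the linear constraints `C.conj().T @ w == g` up to numerical precision, and the attention weights sum to one with most of the mass on the highest-scoring microphone. In a scheme like the one described, the quantities feeding `R`, `C`, and `scores` would be estimated from data rather than drawn at random.
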
Original language | English |
---|---|
Title of host publication | 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 301-305 |
Number of pages | 5 |
ISBN (Electronic) | 9781538681510 |
State | Published - 2 Nov 2018 |
Event | 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 - Tokyo, Japan. Duration: 17 Sep 2018 → 20 Sep 2018 |
Conference
Conference | 16th International Workshop on Acoustic Signal Enhancement, IWAENC 2018 |
---|---|
Country/Territory | Japan |
City | Tokyo |
Period | 17/09/18 → 20/09/18 |
Bibliographical note
Publisher Copyright: © 2018 IEEE.
Keywords
- Attention-based deep-learning
- Linearly-constrained minimum variance (LCMV) beamformer
- Speaker extraction