Abstract
Applying the linearly constrained minimum variance (LCMV) beamformer (BF) to speaker extraction in real-life scenarios requires a sophisticated control mechanism to facilitate estimation of the noise spatial cross-power spectral density (cPSD) matrix and of the relative transfer functions (RTFs) of all sources of interest. We propose a deep neural network (DNN)-based multichannel concurrent speakers detector (MCCSD) that uses all available microphone signals to detect the activity patterns of all speakers. Time frames classified as containing no active speaker are used to estimate the noise cPSD, while time frames with a single detected speaker are used to estimate that speaker's RTF. No estimation takes place during concurrent speaker activity. Experimental results show that the multichannel approach significantly outperforms its single-channel counterpart.
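The frame-gating rule in the abstract (noise-only frames update the noise cPSD, single-speaker frames update that speaker's RTF, concurrent frames are skipped) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the DNN-based MCCSD is not modeled, and ground-truth per-frame labels stand in for its output using a hypothetical convention (0 = no speaker, k ≥ 1 = speaker k alone, -1 = concurrent speakers). The RTF estimator (covariance subtraction followed by the principal eigenvector, normalized to a reference microphone) is one common choice that the abstract does not specify. The LCMV weights use the standard closed form w = R_n^{-1} C (C^H R_n^{-1} C)^{-1} g. All function names (`frame_cpsd`, `estimate_rtf`, `lcmv_weights`) are hypothetical.

```python
import numpy as np

# Hypothetical label convention for the per-frame detector output (assumed,
# not taken from the paper): 0 = no active speaker, k >= 1 = only speaker k
# is active, -1 = two or more speakers active concurrently.
NOISE_ONLY, CONCURRENT = 0, -1


def frame_cpsd(X, mask):
    """Average spatial cPSD over the selected frames.

    X: STFT tensor of shape (F, T, M) (frequency bin, time frame, microphone).
    mask: boolean array of shape (T,) selecting the frames to average.
    Returns an (F, M, M) stack of Hermitian matrices.
    """
    Xs = X[:, mask, :]
    if Xs.shape[1] == 0:
        raise ValueError("no frames selected for this estimate")
    # (1/T_sel) * sum_t x_t x_t^H for every frequency bin
    return np.einsum("ftm,ftn->fmn", Xs, Xs.conj()) / Xs.shape[1]


def estimate_rtf(X, labels, k, Rn, ref=0):
    """RTF of speaker k from frames where the detector reports speaker k alone.

    Assumed estimator: subtract the noise cPSD, take the principal
    eigenvector, and normalize so that the reference microphone entry is 1.
    """
    Rs = frame_cpsd(X, labels == k) - Rn       # approximate speech-only cPSD
    _, V = np.linalg.eigh(Rs)                  # eigenvalues in ascending order
    h = V[..., -1]                             # principal eigenvector, (F, M)
    return h / h[:, ref:ref + 1]


def lcmv_weights(Rn, C, g):
    """Closed-form LCMV weights per frequency bin.

    Rn: (F, M, M) noise cPSD, C: (F, M, K) stacked RTFs, g: (K,) responses.
    Returns w of shape (F, M) satisfying C^H w = g in every bin.
    """
    RinvC = np.linalg.solve(Rn, C)                            # Rn^{-1} C
    A = C.conj().transpose(0, 2, 1) @ RinvC                   # C^H Rn^{-1} C
    G = np.broadcast_to(g, (C.shape[0], C.shape[2]))[..., None]
    return (RinvC @ np.linalg.solve(A, G))[..., 0]


# Synthetic demo: two speakers, three microphones, white sensor noise.
rng = np.random.default_rng(0)
F, T, M = 4, 600, 3
H = rng.standard_normal((2, F, M)) + 1j * rng.standard_normal((2, F, M))
H[:, :, 0] = 1.0                               # true RTFs, reference mic 0
labels = np.repeat([0, 1, 2, -1], T // 4)      # stand-in for detector output
S = rng.standard_normal((2, F, T)) + 1j * rng.standard_normal((2, F, T))
active = np.stack([(labels == 1) | (labels == CONCURRENT),
                   (labels == 2) | (labels == CONCURRENT)])
noise = 0.1 * (rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M)))
X = noise + np.einsum("sft,sfm->ftm", S * active[:, None, :], H)

# Gating: noise cPSD from noise-only frames, RTFs from single-speaker frames;
# frames labeled CONCURRENT are never used for estimation.
Rn = frame_cpsd(X, labels == NOISE_ONLY)
C = np.stack([estimate_rtf(X, labels, k, Rn) for k in (1, 2)], axis=-1)
w = lcmv_weights(Rn, C, np.array([1.0, 0.0]))  # keep speaker 1, null speaker 2
y = np.einsum("fm,ftm->ft", w.conj(), X)       # beamformer output w^H x
```

Skipping the concurrent frames matters because averaging x x^H over frames where both speakers are active yields a rank-2 speech cPSD, whose principal eigenvector would no longer correspond to either speaker's RTF.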
Original language | English |
---|---|
Title of host publication | 2018 26th European Signal Processing Conference, EUSIPCO 2018 |
Publisher | European Signal Processing Conference, EUSIPCO |
Pages | 1562-1566 |
Number of pages | 5 |
ISBN (Electronic) | 9789082797015 |
DOIs | |
State | Published - 29 Nov 2018 |
Event | 26th European Signal Processing Conference, EUSIPCO 2018, Rome, Italy; Duration: 3 Sep 2018 → 7 Sep 2018 |
Publication series
Name | European Signal Processing Conference |
---|---|
Volume | 2018-September |
ISSN (Print) | 2219-5491 |
Conference
Conference | 26th European Signal Processing Conference, EUSIPCO 2018 |
---|---|
Country/Territory | Italy |
City | Rome |
Period | 3/09/18 → 7/09/18 |
Bibliographical note
Publisher Copyright: © EURASIP 2018.