Abstract
In this paper, we present a multi-microphone speech separation algorithm based on masking inferred from the speakers' directions of arrival (DOAs). According to the W-disjoint orthogonality property of speech signals, each time-frequency (TF) bin is dominated by a single speaker and can therefore be associated with a single DOA. In our procedure, we apply a deep neural network (DNN) with a U-net architecture to infer the DOA of each TF bin from the concatenated spectra of the microphone signals. Separation is obtained by multiplying the reference microphone signal by the masks associated with the different DOAs. Our proposed deep direction estimation for speech separation (DDESS) method is inspired by recent advances in deep clustering. Unlike established methods that perform the clustering in a latent embedding space, in our approach the embedding is closely tied to the spatial information, as manifested by the speakers' directions of arrival.
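The masking step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the per-bin DOA estimates (which the paper obtains from a U-net) are already given, and assigns each TF bin to the nearest candidate speaker direction before masking the reference microphone's STFT. The function name and nearest-DOA assignment rule are illustrative choices.

```python
import numpy as np

def doa_mask_separation(ref_stft, doa_per_bin, speaker_doas):
    """Separate speakers by binary masking of the reference-mic STFT.

    ref_stft     : complex STFT of the reference microphone, shape (F, T)
    doa_per_bin  : estimated DOA in degrees for each TF bin, shape (F, T)
    speaker_doas : candidate speaker directions in degrees, length S
    Returns a list of S masked STFTs, one per speaker.
    """
    speaker_doas = np.asarray(speaker_doas, dtype=float)
    # Distance of every TF bin's DOA estimate to each candidate direction.
    dists = np.abs(doa_per_bin[None, :, :] - speaker_doas[:, None, None])  # (S, F, T)
    # W-disjoint orthogonality: each TF bin is dominated by one speaker,
    # so assign it exclusively to the nearest DOA.
    assignment = np.argmin(dists, axis=0)  # (F, T)
    # Binary mask per speaker; the masks partition the TF plane.
    return [(assignment == s).astype(float) * ref_stft
            for s in range(len(speaker_doas))]
```

Because the binary masks partition the TF plane, the separated STFTs sum back to the reference signal; time-domain estimates would follow by inverse STFT of each masked spectrogram.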
Original language | English |
---|---|
Title of host publication | EUSIPCO 2019 - 27th European Signal Processing Conference |
Publisher | European Signal Processing Conference, EUSIPCO |
ISBN (Electronic) | 9789082797039 |
DOIs | |
State | Published - Sep 2019 |
Event | 27th European Signal Processing Conference, EUSIPCO 2019 - A Coruna, Spain |
Duration | 2 Sep 2019 → 6 Sep 2019 |
Publication series
Name | European Signal Processing Conference |
---|---|
Volume | 2019-September |
ISSN (Print) | 2219-5491 |
Conference
Conference | 27th European Signal Processing Conference, EUSIPCO 2019 |
---|---|
Country/Territory | Spain |
City | A Coruna |
Period | 2/09/19 → 6/09/19 |
Bibliographical note
Publisher Copyright: © 2019, IEEE