Multi-microphone speaker separation based on deep DOA estimation

Shlomo E. Chazan, Hodaya Hammer, Gershon Hazan, Jacob Goldberger, Sharon Gannot

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

40 Scopus citations

Abstract

In this paper, we present a multi-microphone speech separation algorithm based on masking inferred from the speakers' directions of arrival (DOAs). According to the W-disjoint orthogonality property of speech signals, each time-frequency (TF) bin is dominated by a single speaker. Each TF bin can therefore be associated with a single DOA. In our procedure, we apply a deep neural network (DNN) with a U-net architecture to infer the DOA of each TF bin from a concatenated set of the spectra of the microphone signals. Separation is obtained by multiplying the reference microphone signal by the masks associated with the different DOAs. Our proposed deep direction estimation for speech separation (DDESS) method is inspired by recent advances in deep clustering methods. Unlike established methods that apply the clustering in a latent embedded space, in our approach the embedding is closely associated with the spatial information, as manifested by the different speakers' directions of arrival.
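The masking step described in the abstract can be illustrated with a minimal sketch. It assumes the per-bin DOA decision is already available as an integer class index for every TF bin; in the paper that decision comes from the U-net, which is not reproduced here. The function name `separate_by_doa`, the array shapes, and the random toy inputs are illustrative assumptions, not part of the published method.

```python
import numpy as np

def separate_by_doa(ref_stft, doa_index, num_doas):
    """Apply one binary mask per candidate DOA to the reference-microphone STFT.

    ref_stft  : complex array, shape (F, T), STFT of the reference microphone.
    doa_index : integer array, shape (F, T), assumed DOA class of each TF bin
                (a stand-in for the network output; hypothetical input here).
    num_doas  : number of candidate DOA classes.
    Returns a complex array of shape (num_doas, F, T).
    """
    # A bin belongs to exactly one DOA class, so the masks are disjoint.
    masks = np.stack([doa_index == d for d in range(num_doas)])
    # Broadcast the (F, T) reference spectrum against the (num_doas, F, T) masks.
    return masks.astype(ref_stft.dtype) * ref_stft[None, :, :]

# Toy data standing in for a real STFT and real DOA estimates.
rng = np.random.default_rng(0)
ref_stft = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
doa_index = rng.integers(0, 3, size=(4, 5))
separated = separate_by_doa(ref_stft, doa_index, num_doas=3)
```

Because every TF bin is assigned to a single DOA, the masked outputs sum back to the reference spectrum. Each speaker's time-domain signal would then be obtained by an inverse STFT of the output for that speaker's DOA.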

Original language: English
Title of host publication: EUSIPCO 2019 - 27th European Signal Processing Conference
Publisher: European Signal Processing Conference, EUSIPCO
ISBN (Electronic): 9789082797039
State: Published - Sep 2019
Event: 27th European Signal Processing Conference, EUSIPCO 2019 - A Coruna, Spain
Duration: 2 Sep 2019 → 6 Sep 2019

Publication series

Name: European Signal Processing Conference
Volume: 2019-September
ISSN (Print): 2219-5491

Conference

Conference: 27th European Signal Processing Conference, EUSIPCO 2019
Country/Territory: Spain
City: A Coruna
Period: 2/09/19 → 6/09/19

Bibliographical note

Publisher Copyright:
© 2019, IEEE
