Abstract
Deep direction-of-arrival (DOA) models commonly require a perfect match between the array configurations in the training and test stages and consequently cannot be applied to unfamiliar microphone array constellations. In this paper, we present a deep DOA estimation method that circumvents this requirement. In our approach, we first cast the DOA estimation as a classification problem in each time-frequency (TF) bin, thus facilitating the localization of multiple concurrent speakers. We utilize a high-resolution spatial image, based on a narrow-band variant of the steered response power phase transform (SRP-PHAT) processor, as an input feature. The model is trained with simulated data using a single microphone array configuration in various acoustic conditions. In the test stage, the algorithm is applied with unfamiliar microphone array constellations, namely with a different number of microphones and different inter-microphone distances. An extensive experimental study with real-life room impulse response (RIR) recordings demonstrates the effectiveness of the proposed input feature and the training scheme. Our approach achieves comparable results in familiar microphone array constellations and, more importantly, can accurately estimate the DOA of multiple concurrent speakers even with unfamiliar microphone arrays.
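The idea of a narrow-band SRP-PHAT spatial image can be illustrated with a minimal NumPy sketch. It is not the authors' formulation: it assumes a far-field source, a 2-D microphone geometry, a speed of sound of 343 m/s, and a plain sum over microphone pairs, and the names `narrowband_srp_phat`, `simulate_plane_wave`, and `SPEED_OF_SOUND` are hypothetical. The point it shows is that the feature has one value per TF bin and candidate angle, so its shape does not depend on the number of microphones.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed value)


def narrowband_srp_phat(stft, mic_xy, freqs_hz, angles_rad):
    """Per-bin SRP-PHAT map (illustrative sketch, not the paper's exact feature).

    stft:       complex array, shape (n_mics, n_freqs, n_frames)
    mic_xy:     microphone positions in metres, shape (n_mics, 2)
    freqs_hz:   centre frequency of each bin, shape (n_freqs,)
    angles_rad: candidate far-field directions, shape (n_angles,)
    Returns a real array of shape (n_freqs, n_frames, n_angles).
    """
    n_mics, n_freqs, n_frames = stft.shape
    # Unit vectors pointing from the array toward each candidate direction.
    unit = np.stack([np.cos(angles_rad), np.sin(angles_rad)], axis=1)
    # PHAT weighting: discard magnitudes and keep only the phase.
    phase = stft / np.maximum(np.abs(stft), 1e-12)
    image = np.zeros((n_freqs, n_frames, len(angles_rad)))
    for m in range(n_mics):
        for n in range(m + 1, n_mics):
            # Expected time difference of the pair for every candidate angle.
            delay = (mic_xy[m] - mic_xy[n]) @ unit.T / SPEED_OF_SOUND
            # Expected inter-microphone phase per (frequency, angle).
            steer = np.exp(2j * np.pi * freqs_hz[:, None] * delay[None, :])
            # Observed inter-microphone phase per (frequency, frame).
            cross = phase[m] * np.conj(phase[n])
            # Agreement between observed and expected phase, summed over pairs.
            image += np.real(cross[:, :, None] * np.conj(steer)[:, None, :])
    return image


def simulate_plane_wave(mic_xy, freqs_hz, angle_rad, n_frames=4, seed=0):
    """Toy anechoic STFT of a single far-field source (for demonstration only)."""
    rng = np.random.default_rng(seed)
    shape = (len(freqs_hz), n_frames)
    source = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    direction = np.array([np.cos(angle_rad), np.sin(angle_rad)])
    # Microphones closer to the source receive the wavefront earlier.
    advance = mic_xy @ direction / SPEED_OF_SOUND
    phase_shift = np.exp(2j * np.pi * freqs_hz[None, :, None] * advance[:, None, None])
    return source[None, :, :] * phase_shift
```

Calling `narrowband_srp_phat` on signals from a four-microphone array and from a three-microphone array with a different spacing yields maps of identical shape, which is what would let a per-bin classifier trained on one array be evaluated on another. The angle grid and the classifier itself are left out because the abstract does not specify them.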
Original language | English |
---|---|
Title of host publication | Proceedings of the 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2023 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9798350323726 |
State | Published - 2023 |
Event | 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2023 - New Paltz, United States; Duration: 22 Oct 2023 → 25 Oct 2023 |
Publication series
Name | IEEE Workshop on Applications of Signal Processing to Audio and Acoustics |
---|---|
Volume | 2023-October |
ISSN (Print) | 1931-1168 |
ISSN (Electronic) | 1947-1629 |
Conference
Conference | 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2023 |
---|---|
Country/Territory | United States |
City | New Paltz |
Period | 22/10/23 → 25/10/23 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Funding
The project has received funding from the European Union's Horizon 2020 Research and Innovation Programme, Grant Agreement No. 871245; and was also supported by the Israeli Ministry of Science & Technology.
Funders | Funder number |
---|---|
Horizon 2020 Framework Programme | |
Ministry of Science and Technology, Israel | |
Horizon 2020 | 871245 |
Keywords
- Deep DOA
- SRP-PHAT
Fingerprint
Dive into the research topics of 'Array Configuration Mismatch in Deep DOA Estimation: Towards Robust Training'. Together they form a unique fingerprint.
Equipment
- Speech & Signal Processing Lab
  Gannot, S. (Manager)
  Bar-Ilan University - The Alexander Kofkin Faculty of Engineering
  Equipment/facility: Facility