Abstract
Localization in reverberant environments remains an open challenge. Recently, supervised learning approaches have demonstrated very promising results in addressing reverberation. However, even with large data volumes, the number of labels available for supervised learning in such environments is usually small. We propose to address this issue with a semi-supervised learning (SSL) approach based on deep generative modeling. Our chosen deep generative model, the variational autoencoder (VAE), is trained to generate the phase of relative transfer functions (RTFs) between microphones. In parallel, a direction of arrival (DOA) classifier network based on RTF-phase is also trained. The joint generative and discriminative model, termed VAE-SSL, is trained using labeled and unlabeled RTF-phase sequences. In learning to generate and classify the sequences, the VAE-SSL extracts the physical causes of the RTF-phase (i.e., source location) from distracting signal characteristics such as noise and speech activity. This facilitates effective end-to-end operation of the VAE-SSL, which requires minimal preprocessing of RTF-phase. VAE-SSL is compared with two signal processing-based approaches, steered response power with phase transform (SRP-PHAT) and MUltiple SIgnal Classification (MUSIC), as well as with fully supervised CNNs. The approaches are compared using data from two real acoustic environments, one of which was recently obtained at the Technical University of Denmark specifically for our study. We find that VAE-SSL can outperform the conventional approaches and the CNN in label-limited scenarios. Further, the trained VAE-SSL system can generate new RTF-phase samples that capture the physics of the acoustic environment. Thus, the generative modeling in VAE-SSL provides a means of interpreting the learned representations. To the best of our knowledge, this paper presents the first approach to modeling the physics of acoustic propagation using deep generative modeling.
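The RTF-phase input described above can be illustrated with a short sketch. It assumes the RTF between a reference microphone and a second microphone is estimated per frame as the ratio of their short-time Fourier transform (STFT) coefficients, so its phase equals the angle of the cross-spectrum `X_other * conj(X_ref)`. The function name `rtf_phase` and the parameters `n_fft` and `hop` are hypothetical, and this is not presented as the authors' exact preprocessing.

```python
import numpy as np


def rtf_phase(x_ref, x_other, n_fft=256, hop=128):
    """Hypothetical sketch: per-frame RTF phase between two microphones.

    Assumes the RTF in each time-frequency bin is X_other / X_ref
    (an illustrative choice, not necessarily the paper's estimator).
    Returns an array of shape (n_frames, n_fft // 2 + 1) in [-pi, pi].
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x_ref) - n_fft) // hop
    phases = np.empty((n_frames, n_fft // 2 + 1))
    for t in range(n_frames):
        seg = slice(t * hop, t * hop + n_fft)
        X_ref = np.fft.rfft(window * x_ref[seg])
        X_oth = np.fft.rfft(window * x_other[seg])
        # angle(X_oth / X_ref) == angle(X_oth * conj(X_ref)); the conjugate
        # product avoids dividing by near-zero reference bins.
        phases[t] = np.angle(X_oth * np.conj(X_ref))
    return phases


# Usage: a tone delayed by 3 samples at the second microphone.
n = np.arange(4096)
x = np.cos(2 * np.pi * 32 * n / 256)
y = np.cos(2 * np.pi * 32 * (n - 3) / 256)
p = rtf_phase(x, y)
# For the tone's bin (k = 32) the phase is about -2*pi*32*3/256 = -0.75*pi.
print(p.shape, p[0, 32])
```

A pure delay of d samples appears as a linear phase -2πkd/N across bins k, which is the structure a DOA classifier can exploit; in a reverberant room, reflections perturb this linear pattern, which motivates learning the mapping from data rather than relying on a free-field model.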
Original language | English |
---|---|
Article number | 9449880 |
Pages (from-to) | 84956-84970 |
Number of pages | 15 |
Journal | IEEE Access |
Volume | 9 |
DOIs | |
State | Published - 2021 |
Bibliographical note
Publisher Copyright: © 2013 IEEE.
Funding
This work was supported in part by the Office of Naval Research under Grant N00014-11-1-0439, and in part by the European Union’s Horizon 2020 Research and Innovation Program under Agreement 871245.
Funders | Funder number |
---|---|
Office of Naval Research | N00014-11-1-0439 |
Horizon 2020 Framework Programme | |
Horizon 2020 | 871245 |
Keywords
- Source localization
- deep learning
- generative modeling
- semi-supervised learning