Semi-Supervised Source Localization in Reverberant Environments with Deep Generative Modeling

Michael J. Bianco, Sharon Gannot, Efren Fernandez-Grande, Peter Gerstoft

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

Source localization in reverberant environments remains an open challenge. Supervised learning approaches have recently shown very promising results in handling reverberation. However, even with large data volumes, the number of labels available for supervised learning in such environments is usually small. We propose to address this issue with a semi-supervised learning (SSL) approach based on deep generative modeling. Our chosen deep generative model, the variational autoencoder (VAE), is trained to generate the phase of relative transfer functions (RTFs) between microphones. In parallel, a direction-of-arrival (DOA) classifier network that takes the RTF-phase as input is trained. The joint generative and discriminative model, termed VAE-SSL, is trained using labeled and unlabeled RTF-phase sequences. By learning to generate and classify the sequences, VAE-SSL separates the physical cause of the RTF-phase (i.e., the source location) from distracting signal characteristics such as noise and speech activity. This enables effective end-to-end operation of VAE-SSL with minimal preprocessing of the RTF-phase. VAE-SSL is compared with two signal-processing-based approaches, steered response power with phase transform (SRP-PHAT) and MUltiple SIgnal Classification (MUSIC), as well as with fully supervised CNNs. The approaches are compared using data from two real acoustic environments, one of which was recently collected at the Technical University of Denmark specifically for this study. We find that VAE-SSL can outperform the conventional approaches and the CNN in label-limited scenarios. Further, the trained VAE-SSL system can generate new RTF-phase samples that capture the physics of the acoustic environment. Thus, the generative modeling in VAE-SSL provides a means of interpreting the learned representations. To the best of our knowledge, this paper presents the first approach to modeling the physics of acoustic propagation using deep generative modeling.
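The abstract uses the RTF-phase between microphone pairs as the input feature for both the VAE and the DOA classifier. As an illustrative sketch only (the abstract does not state which estimator the authors use), the snippet below estimates RTF-phase with a common cross-spectral approach: the cross-power spectrum between a second microphone and a reference microphone, divided by the reference auto-power spectrum, averaged over blocks of STFT frames. The function names `stft` and `rtf_phase`, the frame length, hop size and number of averaged frames are hypothetical choices, not the authors' settings.

```python
import numpy as np


def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def rtf_phase(x_ref, x_other, n_fft=512, hop=256, avg_frames=8):
    """Hypothetical RTF-phase estimator (assumed cross-spectral method).

    The RTF of x_other relative to x_ref is estimated per frequency bin as
    cross-PSD / reference auto-PSD, averaged over blocks of `avg_frames`
    STFT frames. Returns the RTF phase, shape (n_blocks, n_fft // 2 + 1).
    """
    X_ref = stft(x_ref, n_fft, hop)
    X_oth = stft(x_other, n_fft, hop)
    n_blocks = X_ref.shape[0] // avg_frames
    phases = []
    for b in range(n_blocks):
        sl = slice(b * avg_frames, (b + 1) * avg_frames)
        cross = np.mean(X_oth[sl] * np.conj(X_ref[sl]), axis=0)  # cross-PSD
        auto = np.mean(np.abs(X_ref[sl]) ** 2, axis=0)           # reference PSD
        rtf = cross / (auto + 1e-12)  # small constant avoids division by zero
        phases.append(np.angle(rtf))
    return np.array(phases)


# Usage: a white-noise source reaching the second microphone 2 samples later.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
phase = rtf_phase(x, np.roll(x, 2))
print(phase.shape)
```

For a pure inter-microphone delay of d samples, the estimated phase follows the linear trend -2πkd/N across frequency bins k (N is the FFT length). In a reverberant room the measured RTF-phase departs from this straight line, which is why the abstract emphasizes learning the mapping from data rather than assuming free-field propagation.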

Original language: English
Article number: 9449880
Pages (from-to): 84956-84970
Number of pages: 15
Journal: IEEE Access
Volume: 9
State: Published - 2021

Bibliographical note

Publisher Copyright:
© 2013 IEEE.

Funding

This work was supported in part by the Office of Naval Research under Grant N00014-11-1-0439, and in part by the European Union's Horizon 2020 Research and Innovation Program under Agreement 871245.

Funders (funder number)
Office of Naval Research: N00014-11-1-0439
Horizon 2020 Framework Programme
Horizon 2020: 871245

    Keywords

    • Source localization
    • deep learning
    • generative modeling
    • semi-supervised learning
