Multiple speaker localization using mixture of Gaussian model with manifold-based centroids

Avital Bross, Bracha Laufer-Goldshtein, Sharon Gannot

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

2 Scopus citations

Abstract

A data-driven approach for multiple-speaker localization in reverberant enclosures is presented. The approach combines semi-supervised learning on multiple manifolds with unsupervised maximum likelihood estimation. The relative transfer functions (RTFs), which are known to be related to source positions, are used as feature vectors in both stages of the proposed algorithm. The microphone positions are not known. In the training stage, a nonlinear, manifold-based mapping between RTFs and source locations is inferred using single-speaker utterances. The inference procedure utilizes two RTF datasets: a small set of RTFs with their associated position labels and a large set of unlabelled RTFs. This mapping is used to generate a dense grid of localized sources that serve as the centroids of a Mixture of Gaussians (MoG) model, which is used in the test stage of the algorithm to cluster RTFs extracted from multi-speaker utterances. Clustering is performed with the expectation-maximization (EM) procedure, which relies on the sparsity and intermittency of the speech signals. A preliminary experimental study, with either two or three overlapping speakers at various reverberation levels, demonstrates that the proposed scheme achieves higher localization accuracy than a baseline method that uses a simpler propagation model.
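
The test-stage clustering idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes synthetic 2-D features standing in for RTFs, a hypothetical 5x5 grid of fixed centroids, a shared isotropic variance `sigma2`, and an EM loop that updates only the mixing weights. The function name `em_fixed_centroids` and all parameter values are illustrative assumptions. Centroids that end up with large weights are taken as the active source positions.

```python
import numpy as np

def em_fixed_centroids(features, centroids, sigma2=0.05, n_iter=50):
    """Estimate mixing weights of a MoG whose centroids stay fixed.

    features:  (N, D) array of observations (stand-ins for RTF features).
    centroids: (K, D) array of fixed candidate centroids.
    sigma2:    shared isotropic variance (an assumed, fixed value).
    Returns the (K,) vector of mixing weights.
    """
    n_centroids = centroids.shape[0]
    weights = np.full(n_centroids, 1.0 / n_centroids)
    # Squared distance between every observation and every centroid: (N, K).
    sq_dist = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Gaussian log-likelihood up to a constant shared by all components.
    log_lik = -0.5 * sq_dist / sigma2
    for _ in range(n_iter):
        # E-step: posterior probability of each centroid for each observation.
        log_post = np.log(weights + 1e-300) + log_lik
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: only the weights are updated; the centroids are untouched.
        weights = post.mean(axis=0)
    return weights

rng = np.random.default_rng(0)
# Hypothetical 5x5 grid of candidate centroids in a 2-D feature space.
axis = np.linspace(0.0, 4.0, 5)
centroids = np.array([[x, y] for x in axis for y in axis])
# Two synthetic "speakers" placed at grid indices 6 and 18.
active = [6, 18]
features = np.vstack([
    centroids[k] + 0.1 * rng.standard_normal((200, 2)) for k in active
])
weights = em_fixed_centroids(features, centroids)
# Pick the two centroids with the largest weights as the estimated sources.
estimated = sorted(np.argsort(weights)[-2:].tolist())
print(estimated)
```

Keeping the centroids fixed reduces the M-step to a weight update, so localization amounts to finding which grid points attract most of the observations. The number of speakers is assumed known here when selecting the top two weights.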

Original language: English
Title of host publication: 28th European Signal Processing Conference, EUSIPCO 2020 - Proceedings
Publisher: European Signal Processing Conference, EUSIPCO
Pages: 895-899
Number of pages: 5
ISBN (Electronic): 9789082797053
State: Published - 24 Jan 2021
Event: 28th European Signal Processing Conference, EUSIPCO 2020 - Amsterdam, Netherlands
Duration: 24 Aug 2020 to 28 Aug 2020

Publication series

Name: European Signal Processing Conference
Volume: 2021-January
ISSN (Print): 2219-5491

Conference

Conference: 28th European Signal Processing Conference, EUSIPCO 2020
Country/Territory: Netherlands
City: Amsterdam
Period: 24/08/20 to 28/08/20

Bibliographical note

Funding Information:
This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No. 871245, and from the Israeli Innovation Authority through KAMIN Project No. 61916. Avital Bross is also funded by a grant for the advancement of women in science and technology from the Israeli Ministry of Science and Technology.

Publisher Copyright:
© 2021 European Signal Processing Conference, EUSIPCO. All rights reserved.

Keywords

  • Manifold-learning
  • Mixture of Gaussians
  • Semi-supervised inference
