Training-Based Multiple Source Tracking Using Manifold-Learning and Recursive Expectation-Maximization

Avital Bross, Sharon Gannot

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

In this paper we propose a data-driven approach for multiple speaker tracking in reverberant enclosures. The speakers utter possibly overlapping speech signals while moving in the environment. The method comprises two stages. The first stage performs single-source localization using semi-supervised learning on multiple manifolds. The second stage, which is unsupervised, uses time-varying maximum likelihood estimation for tracking. The feature vectors used by both stages are the relative transfer functions (RTFs), which are known to be related to source positions. The number of sources is assumed to be known, while the microphone positions are unknown. In the training stage, a large database of RTFs is given. A small percentage of the data is annotated with exact positions (the labelled data), and the rest is assumed to be unlabelled, i.e., the respective positions are unknown. A nonlinear, manifold-based mapping function between the RTFs and the source positions is then inferred. Applying this mapping function to all unlabelled RTFs constructs a dense grid of localized sources. In the test phase, this RTF grid serves as the set of centroids for a Mixture of Gaussians (MoG) model. The MoG parameters are estimated by applying a recursive variant of the expectation-maximization (EM) procedure that relies on the sparsity and intermittency of the speech signals. We present a comprehensive simulation study at various reverberation levels, including static and dynamic scenarios, for both two and three (partially) overlapping speakers. For the dynamic case we provide simulations with several speaker trajectories, including intersecting sources. In terms of localization accuracy and tracking capabilities, the proposed scheme outperforms baseline methods that use a simpler propagation model.
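The test-phase idea of tracking with a MoG whose centroids are fixed grid points can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a generic recursive EM update of the mixture weights with a constant step size, a single source, a shared isotropic variance, and a toy 1-D feature in place of RTFs. The names `responsibilities`, `recursive_em_step`, `track`, `sigma`, and `step` are all hypothetical.

```python
import math


def responsibilities(x, centroids, weights, sigma):
    """E-step for one observation: posterior probability of each centroid."""
    # Log domain keeps the computation stable when weights become very small.
    logs = [
        math.log(w + 1e-300)
        - sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * sigma ** 2)
        for w, c in zip(weights, centroids)
    ]
    m = max(logs)
    exps = [math.exp(v - m) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def recursive_em_step(x, centroids, weights, sigma, step):
    """One recursive update: move each weight toward its responsibility.

    The centroids stay fixed (assumption: they come from a training stage);
    only the mixture weights are adapted over time.
    """
    r = responsibilities(x, centroids, weights, sigma)
    return [w + step * (rk - w) for w, rk in zip(weights, r)]


def track(observations, centroids, sigma=0.3, step=0.2):
    """Run the recursive update over a sequence; return the weight history."""
    weights = [1.0 / len(centroids)] * len(centroids)
    history = []
    for x in observations:
        weights = recursive_em_step(x, centroids, weights, sigma, step)
        history.append(weights)
    return history


if __name__ == "__main__":
    # Toy grid of five 1-D centroids; the source jumps from near 1 to near 3.
    grid = [(float(k),) for k in range(5)]
    obs = [(1.0 + 0.1 * math.sin(t),) for t in range(50)]
    obs += [(3.0 + 0.1 * math.sin(t),) for t in range(50)]
    final = track(obs, grid)[-1]
    print("most likely grid index:", max(range(len(grid)), key=lambda k: final[k]))
```

Because each update is a convex combination of the previous weights and the responsibilities, the weights stay normalized, and the constant `step` acts as a forgetting factor that lets the estimate follow a moving source.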

Original language: English
Pages (from-to): 1124-1140
Number of pages: 17
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 31
State: Published - 2023

Bibliographical note

Publisher Copyright:
© 2014 IEEE.

Funding

The work of Avital Bross was supported by the Advancement of Women in Science and Technology of the Israeli Ministry of Science and Technology. This work was supported by the European Union’s Horizon 2020 Research and Innovation Programme under Grant 871245.

Funders | Funder number
Horizon 2020 Framework Programme | 871245
Ministry of Science and Technology, Israel |

    Keywords

    • Manifold learning
    • multiple source tracking
    • recursive expectation-maximization
    • speech sparsity
