Abstract
In this paper we propose a data-driven approach for multiple speaker tracking in reverberant enclosures. The speakers utter possibly overlapping speech signals while moving in the environment. The method comprises two stages. The first stage performs single-source localization using semi-supervised learning on multiple manifolds. The second stage, which is unsupervised, uses time-varying maximum likelihood estimation for tracking. The feature vectors used by both stages are the relative transfer functions (RTFs), which are known to be related to source positions. The number of sources is assumed to be known, while the microphone positions are unknown. In the training stage, a large database of RTFs is given. A small percentage of the data is labelled with exact positions, and the rest is assumed to be unlabelled, i.e., the respective positions are unknown. Then, a nonlinear, manifold-based mapping function between the RTFs and the source positions is inferred. Applying this mapping function to all unlabelled RTFs constructs a dense grid of localized sources. In the test phase, this RTF grid serves as the centroids for a Mixture of Gaussians (MoG) model. The MoG parameters are estimated by applying a recursive variant of the expectation-maximization (EM) procedure that relies on the sparsity and intermittency of the speech signals. We present a comprehensive simulation study at various reverberation levels, including static and dynamic scenarios, for both two and three (partially) overlapping speakers. For the dynamic case, we provide simulations with several speaker trajectories, including intersecting sources. The proposed scheme outperforms baseline methods that use a simpler propagation model in terms of localization accuracy and tracking capabilities.
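The test-phase idea of tracking with a MoG whose centroids are fixed on a grid can be illustrated with a small sketch. This is not the paper's algorithm: it assumes real-valued toy features instead of complex RTFs, a shared isotropic variance `sigma2`, and a constant step size `step`, and the function name `recursive_em_weights` is hypothetical. Only the mixture weights are updated, one observation at a time, using a generic stochastic-approximation form of recursive EM.

```python
import numpy as np

def recursive_em_weights(features, centroids, sigma2=0.05, step=0.1):
    """Track mixture weights over fixed centroids, one observation at a time.

    Assumptions (illustrative only): isotropic Gaussians with shared
    variance sigma2, and a constant step size for the recursive update.
    """
    n_centroids = centroids.shape[0]
    weights = np.full(n_centroids, 1.0 / n_centroids)  # uniform start
    history = []
    for z in features:
        # Log-posterior of each centroid up to a constant shared by all
        sq_dist = np.sum((centroids - z) ** 2, axis=1)
        log_post = np.log(weights + 1e-12) - sq_dist / (2.0 * sigma2)
        log_post -= log_post.max()  # stabilise before exponentiating
        resp = np.exp(log_post)
        resp /= resp.sum()  # E-step: responsibilities for this observation
        # Recursive M-step: move the weights towards the new responsibilities
        weights = (1.0 - step) * weights + step * resp
        history.append(weights.copy())
    return np.array(history)

# Toy usage: a source that jumps from grid position 0 to grid position 3
rng = np.random.default_rng(0)
centroids = np.arange(4.0).reshape(-1, 1)
features = np.concatenate([rng.normal(0.0, 0.1, (30, 1)),
                           rng.normal(3.0, 0.1, (30, 1))])
weights_over_time = recursive_em_weights(features, centroids)
```

In this toy setting the dominant weight follows the source from one centroid to the other, and the step size controls how quickly old evidence is forgotten.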
| Original language | English |
|---|---|
| Pages (from-to) | 1124-1140 |
| Number of pages | 17 |
| Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
| Volume | 31 |
| DOIs | |
| State | Published - 2023 |
Bibliographical note
Publisher Copyright: © 2014 IEEE.
Funding
The work of Avital Bross was supported by the Advancement of Women in Science and Technology of the Israeli Ministry of Science and Technology. This work was supported by the European Union’s Horizon 2020 Research and Innovation Programme, under Grant 871245.
| Funders | Funder number |
|---|---|
| Horizon 2020 Framework Programme | 871245 |
| Ministry of Science and Technology, Israel | |
Keywords
- Manifold learning
- multiple source tracking
- recursive expectation-maximization
- speech sparsity