This article develops a semi-supervised algorithm to address the challenging multi-source localization problem in a noisy and reverberant environment, using a spherical harmonics domain source feature of the relative harmonic coefficients. We present a comprehensive research of this source feature, including (i) an illustration confirming its sole dependence on the source position, (ii) a feature estimator in the presence of noise, (iii) a feature selector exploiting its inherent directivity over space. Source features at varied spherical harmonic modes, representing unique characterization of the soundfield, are fused by the Multi-Mode Gaussian Process modeling. Based on the unifying model, we then formulate the mapping function revealing the underlying relationship between the source feature(s) and position(s) using a Bayesian inference approach. Another issue of the overlapped components is addressed by a pre-processing technique performing overlapped frame detection, which in turn reduces this challenging problem to a single source localization. It is highlighted that this data-driven method has a strong potential to be implemented in practice because only a limited number of labeled measurements is required. We evaluate this proposed algorithm using simulated recordings between multiple speakers in diverse environments, and extensive results confirm improved performance in comparison with the state-of-art methods. Additional assessments using real-life recordings further prove the effectiveness of the method, even at unfavorable circumstances with severe source overlapping.
|Number of pages||16|
|Journal||IEEE/ACM Transactions on Audio Speech and Language Processing|
|State||Published - 2020|
Bibliographical notePublisher Copyright:
© 2014 IEEE.
- Gaussian Process regression
- multi-mode Gaussian Process
- relative harmonic coefficients
- semi-supervised multiple source localization
- source feature estimator