Abstract
The problem of speaker tracking in noisy and reverberant enclosures is addressed in this paper. We present a hybrid algorithm, combining traditional tracking schemes with a new learning-based approach. A state-space representation, consisting of propagation and observation models, is learned from signals measured by several distributed microphone pairs. The proposed representation is based on two data modalities: high-dimensional acoustic features representing the full reverberant acoustic channels, and low-dimensional time difference of arrival (TDOA) estimates. The state-space representation is accompanied by a statistical model based on a Gaussian process, which relates the variations of the acoustic channels to the physical variations of the associated source positions, thereby forming a data-driven propagation model for the source movement. In the observation model, the source positions are nonlinearly mapped to the associated TDOA readings. The obtained propagation and observation models establish the basis for employing an extended Kalman filter. The simulation results demonstrate the robustness of the proposed method in noisy and reverberant conditions.
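To make the role of the nonlinear observation model concrete, the following is a minimal sketch of one extended Kalman filter step in which the state is the source position and the observations are TDOAs at several microphone pairs. It is not the paper's algorithm: the learned Gaussian-process propagation model is replaced here by a plain random-walk model, and the function names (`tdoa`, `tdoa_jacobian`, `ekf_step`), the speed of sound, and the covariances `Q` and `R` are all illustrative assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the paper


def tdoa(p, mics):
    """TDOA in seconds at each microphone pair for a source at position p.

    mics has shape (n_pairs, 2, dim): two microphone positions per pair.
    """
    d_first = np.linalg.norm(p - mics[:, 0], axis=1)
    d_second = np.linalg.norm(p - mics[:, 1], axis=1)
    return (d_first - d_second) / SPEED_OF_SOUND


def tdoa_jacobian(p, mics):
    """Jacobian of tdoa() with respect to p, shape (n_pairs, dim)."""
    diff_first = p - mics[:, 0]
    diff_second = p - mics[:, 1]
    unit_first = diff_first / np.linalg.norm(diff_first, axis=1, keepdims=True)
    unit_second = diff_second / np.linalg.norm(diff_second, axis=1, keepdims=True)
    return (unit_first - unit_second) / SPEED_OF_SOUND


def ekf_step(x, P, z, mics, Q, R):
    """One EKF predict/update cycle.

    Assumption: random-walk propagation (x_k = x_{k-1} + noise), standing in
    for the data-driven propagation model described in the abstract.
    """
    # Predict: the mean is unchanged under a random walk, uncertainty grows by Q.
    x_pred = x.copy()
    P_pred = P + Q

    # Update: linearize the nonlinear position-to-TDOA mapping around x_pred.
    H = tdoa_jacobian(x_pred, mics)
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T    # Kalman gain P_pred H^T S^{-1}
    x_new = x_pred + K @ (z - tdoa(x_pred, mics))
    P_new = (np.eye(x.size) - K @ H) @ P_pred
    return x_new, P_new
```

Calling `ekf_step` once per time frame, with `z` set to the TDOA estimates of that frame, yields a position track. In the paper's setting the prediction step would instead use the propagation model learned from the acoustic features.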
Original language | English |
---|---|
Article number | 8248766 |
Pages (from-to) | 725-735 |
Number of pages | 11 |
Journal | IEEE/ACM Transactions on Audio Speech and Language Processing |
Volume | 26 |
Issue number | 4 |
State | Published - Apr 2018 |
Bibliographical note
Publisher Copyright: © 2014 IEEE.
Funding
Manuscript received May 20, 2017; revised December 5, 2017 and December 25, 2017; accepted December 28, 2017. Date of publication January 8, 2018; date of current version February 1, 2018. This work was supported by a grant from a joint Lower Saxony-Israeli Project financially supported by the State of Lower Saxony. The work of B. Laufer-Goldshtein was supported by the Adams Foundation of the Israel Academy of Sciences and Humanities. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Augusto Sarti. (Corresponding author: Sharon Gannot.) B. Laufer-Goldshtein and S. Gannot are with the Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel (e-mail: Bracha.Laufer@biu.ac.il; [email protected]).
Funders | Funder number |
---|---|
Adams Foundation of the Israel Academy of Sciences and Humanities | |
State of Lower Saxony | |
Keywords
- Gaussian process
- Speaker tracking
- extended Kalman filter (EKF)
- relative transfer function (RTF)
- time difference of arrival (TDOA)