This paper addresses the problem of speaker tracking in noisy and reverberant enclosures. We present a hybrid algorithm that combines traditional tracking schemes with a new learning-based approach. A state-space representation, consisting of propagation and observation models, is learned from signals measured by several distributed microphone pairs. The proposed representation is based on two data modalities: high-dimensional acoustic features representing the full reverberant acoustic channels, and low-dimensional time difference of arrival (TDOA) estimates. The state-space representation is accompanied by a statistical model based on a Gaussian process, which relates the variations of the acoustic channels to the physical variations of the associated source positions, thereby forming a data-driven propagation model for the source movement. In the observation model, the source positions are nonlinearly mapped to the associated TDOA readings. Together, the propagation and observation models form the basis for employing an extended Kalman filter. Simulation results demonstrate the robustness of the proposed method in noisy and reverberant conditions.
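To make the observation side of the abstract concrete, the following is a minimal sketch of an extended Kalman filter whose measurements are TDOA readings from microphone pairs. It is not the paper's method: the Gaussian-process-based propagation model is replaced here by a plain random-walk model, the geometry is 2-D, and the function names (`tdoa`, `tdoa_jacobian`, `ekf_predict`, `ekf_update`), the speed of sound, the microphone layout, and the noise variances `q` and `r` are all illustrative assumptions. Measurements are processed one pair at a time, which assumes the TDOA noise is uncorrelated across pairs.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def tdoa(pos, mic_a, mic_b):
    """Observation model: TDOA in seconds for a source at pos and one microphone pair."""
    dist_a = math.hypot(pos[0] - mic_a[0], pos[1] - mic_a[1])
    dist_b = math.hypot(pos[0] - mic_b[0], pos[1] - mic_b[1])
    return (dist_a - dist_b) / SPEED_OF_SOUND


def tdoa_jacobian(pos, mic_a, mic_b):
    """Gradient of tdoa() with respect to the source position (x, y)."""
    dist_a = math.hypot(pos[0] - mic_a[0], pos[1] - mic_a[1])
    dist_b = math.hypot(pos[0] - mic_b[0], pos[1] - mic_b[1])
    return [((pos[i] - mic_a[i]) / dist_a - (pos[i] - mic_b[i]) / dist_b)
            / SPEED_OF_SOUND for i in range(2)]


def ekf_predict(x, P, q):
    """Random-walk propagation (an assumption): the mean is kept, the covariance grows by q."""
    return list(x), [[P[0][0] + q, P[0][1]], [P[1][0], P[1][1] + q]]


def ekf_update(x, P, z, pairs, r):
    """Sequential scalar EKF updates, one per microphone pair, each with noise variance r."""
    x = list(x)
    P = [row[:] for row in P]
    for z_k, (mic_a, mic_b) in zip(z, pairs):
        H = tdoa_jacobian(x, mic_a, mic_b)            # linearize at the current estimate
        PHt = [P[i][0] * H[0] + P[i][1] * H[1] for i in range(2)]
        S = H[0] * PHt[0] + H[1] * PHt[1] + r         # innovation variance (scalar)
        K = [PHt[0] / S, PHt[1] / S]                  # Kalman gain
        innovation = z_k - tdoa(x, mic_a, mic_b)
        x = [x[0] + K[0] * innovation, x[1] + K[1] * innovation]
        # P <- (I - K H) P; since P is symmetric, H P equals PHt transposed.
        P = [[P[i][j] - K[i] * PHt[j] for j in range(2)] for i in range(2)]
    return x, P


# Hypothetical layout: three microphone pairs on the walls of a 5 m x 4 m room.
pairs = [((0.0, 0.0), (0.2, 0.0)),
         ((5.0, 0.0), (5.0, 0.2)),
         ((0.0, 4.0), (0.2, 4.0))]
true_pos = [2.0, 3.0]
z = [tdoa(true_pos, a, b) for a, b in pairs]          # noiseless readings for the demo

x, P = [2.5, 2.5], [[1.0, 0.0], [0.0, 1.0]]
x, P = ekf_predict(x, P, q=0.01)
x, P = ekf_update(x, P, z, pairs, r=1e-10)
print("estimate:", x)
```

In a tracker, `ekf_predict` and `ekf_update` would alternate once per frame. A learned propagation model such as the one described in the abstract would take the place of `ekf_predict`.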
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Published: Apr 2018
Bibliographical note: Publisher Copyright: © 2014 IEEE.
- Gaussian process
- Speaker tracking
- extended Kalman filter (EKF)
- relative transfer function (RTF)
- time difference of arrival (TDOA)