Forward-backward recursive expectation-maximization for concurrent speaker tracking

Yuval Dorfan, Boaz Schwartz, Sharon Gannot

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

In this paper, a study addressing the task of tracking multiple concurrent speakers in reverberant conditions is presented. Since both past and future observations can contribute to the current location estimate, we propose a forward-backward approach, which improves tracking accuracy by introducing near-future data to the estimator, at the cost of a short additional latency. Unlike classical target tracking, we apply a non-Bayesian approach, which makes no assumptions about the target trajectories beyond a realistic rate of change in the parameters due to natural behaviour. The proposed method is based on the recursive expectation-maximization (REM) approach and is dubbed forward-backward recursive expectation-maximization (FB-REM). Its performance is demonstrated in an experimental study, where the tested scenarios involve both simulated and recorded signals, with typical reverberation levels and multiple moving sources. It is shown that the proposed algorithm outperforms the regular causal REM.

Original language: English
Article number: 2
Journal: Eurasip Journal on Audio, Speech, and Music Processing
Volume: 2021
Issue number: 1
State: Published - Dec 2021

Bibliographical note

Publisher Copyright:
© 2021, The Author(s).

Funding

This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No. 871245.

Funder: Horizon 2020 Framework Programme
Funder number: 871245

    Keywords

    • Forward-backward
    • Microphone arrays
    • Recursive expectation-maximization
    • Simultaneous speakers
    • Sound source tracking
    • W-disjoint orthogonality
