Multiple-Speaker Localization Based on Direct-Path Features and Likelihood Maximization with Spatial Sparsity Regularization

Xiaofei Li, Laurent Girin, Radu Horaud, Sharon Gannot

Research output: Contribution to journal › Article › peer-review

38 Scopus citations


This paper addresses the problem of multiple-speaker localization in noisy and reverberant environments, using binaural recordings of an acoustic scene. A complex-valued Gaussian mixture model (CGMM) is adopted, whose components correspond to all the possible candidate source locations defined on a grid. After optimizing the CGMM-based objective function, given an observed set of complex-valued binaural features, both the number of sources and their locations are estimated by selecting the CGMM components with the largest weights. An entropy-based penalty term is added to the likelihood to impose sparsity over the set of CGMM component weights. This favors a small number of detected speakers with respect to the large number of initial candidate source locations. In addition, the direct-path relative transfer function (DP-RTF) is used to build robust binaural features. The DP-RTF, recently proposed for single-source localization, encodes interchannel information corresponding to the direct path of sound propagation and is thus robust to reverberations. In this paper, we extend the DP-RTF estimation to the case of multiple sources. In the short-time Fourier transform domain, a consistency test is proposed to check whether a set of consecutive frames is associated with the same source or not. Reliable DP-RTF features are selected from the frames that pass the consistency test to be used for source localization. Experiments carried out using both simulation data and real data recorded with a robotic head confirm the efficiency of the proposed multisource localization method.
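As a rough illustration of the weight-selection idea described in the abstract, the following toy sketch fits the weights of a complex-valued Gaussian mixture whose component means are fixed to a grid of candidate features, with an entropy penalty that favors sparse weights. It is not the authors' algorithm: the scalar unit-circle features, the shared variance `sigma2`, the penalty strength `gamma`, the exponentiated-gradient optimizer with step `eta`, and the 0.1 detection threshold are all assumptions made for this example, and the function name `fit_sparse_weights` is hypothetical.

```python
import numpy as np


def fit_sparse_weights(x, candidates, sigma2=0.02, gamma=0.1, eta=0.1, n_iter=300):
    """Estimate mixture weights over fixed candidate features (toy sketch).

    Maximizes  mean_n log sum_d w_d N_c(x_n; c_d, sigma2) + gamma * sum_d w_d log w_d,
    i.e. the average log-likelihood minus gamma times the entropy of w,
    using exponentiated-gradient ascent so that w stays on the simplex.
    """
    n_cand = candidates.shape[0]
    # Log-density of a circular complex Gaussian for every (frame, candidate) pair.
    dist2 = np.abs(x[:, None] - candidates[None, :]) ** 2
    log_p = -dist2 / sigma2 - np.log(np.pi * sigma2)
    row_max = log_p.max(axis=1, keepdims=True)

    w = np.full(n_cand, 1.0 / n_cand)  # start from uniform weights
    for _ in range(n_iter):
        # Log of the mixture density per frame, computed stably.
        log_mix = row_max[:, 0] + np.log((w * np.exp(log_p - row_max)).sum(axis=1))
        # Gradient of the average log-likelihood with respect to each weight.
        grad_ll = np.exp(log_p - log_mix[:, None]).mean(axis=0)
        # Gradient of the penalty gamma * sum_d w_d log w_d (log is clipped near 0).
        grad_pen = gamma * (np.log(np.maximum(w, 1e-12)) + 1.0)
        # Multiplicative update followed by renormalization onto the simplex.
        w = w * np.exp(eta * (grad_ll + grad_pen))
        w /= w.sum()
    return w


# Synthetic data: 12 candidate "locations" encoded as points on the unit circle,
# with two active sources (indices 2 and 8) observed through additive noise.
rng = np.random.default_rng(0)
n_cand = 12
candidates = np.exp(2j * np.pi * np.arange(n_cand) / n_cand)
labels = np.repeat(np.array([2, 8]), 100)
noise = 0.1 * (rng.standard_normal(200) + 1j * rng.standard_normal(200))
x = candidates[labels] + noise

weights = fit_sparse_weights(x, candidates)
# Number of sources and their locations are read off the dominant weights.
detected = np.flatnonzero(weights > 0.1)
print(detected, np.round(weights[detected], 3))
```

In this sketch, the number of detected sources is simply the number of weights above the threshold, so source counting and localization come out of the same step. Real binaural features would be vectors across frequency rather than scalars, and the penalized objective is not concave in general, so the result can depend on initialization and on `gamma`.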

Original language: English
Pages (from-to): 1997-2012
Number of pages: 16
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Issue number: 10
State: Published - Oct 2017

Bibliographical note

Publisher Copyright:
© 2014 IEEE.


Keywords

  • Candidate-based GMM
  • direct-path RTF
  • entropy penalty
  • multiple-speaker localization

