Multichannel speech separation and enhancement using the convolutive transfer function

Xiaofei Li, Laurent Girin, Sharon Gannot, Radu Horaud

Research output: Contribution to journal / Article / peer-review

16 Scopus citations

Abstract

This paper addresses the problem of speech separation and enhancement from multichannel convolutive and noisy mixtures, assuming known mixing filters. We propose to perform speech separation and enhancement in the short-time Fourier transform (STFT) domain using the convolutive transfer function (CTF) approximation. Compared to time-domain filters, the CTF has far fewer taps. Consequently, it has a lower computational cost and is sometimes more robust to filter perturbations. We propose three methods: 1) for the multisource case, the multichannel inverse filtering method, i.e., the multiple input/output inverse theorem (MINT), is exploited in the CTF domain; 2) a beamforming-like multichannel inverse filtering method that applies the single-source MINT together with power minimization, which is suitable when not all source CTFs are known; and 3) a basis pursuit method, where the sources are recovered by minimizing their ℓ1-norm to impose spectral sparsity, while the ℓ2-norm fitting cost between the microphone signals and the mixing model is constrained to be lower than a tolerance. The noise can be reduced by setting this tolerance at the noise power level. Experiments under various acoustic conditions are carried out to evaluate and compare the three proposed methods. Comparison with four baseline methods (beamforming-based, two time-domain inverse filters, and time-domain Lasso) shows the applicability of the proposed methods.
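A minimal sketch of the CTF mixing model, assuming NumPy and a noise-free mixture: in each frequency bin k, each microphone STFT coefficient is modeled as a convolution along the frame index p of the source STFT coefficients with a short CTF, x_j(p,k) = Σ_i Σ_q c_ji(q,k) s_i(p−q,k). The array layout (sources as I×K×P, CTFs as J×I×K×Q) and the helper names `ctf_mix` and `ctf_taps` are hypothetical. The tap-count helper uses a rough rule of thumb (one CTF tap per STFT frame hop spanned by the room impulse response); this is an illustrative assumption, not the paper's exact CTF length formula.

```python
import numpy as np

def ctf_mix(sources, ctfs):
    """Apply a CTF mixing model in the STFT domain (noise-free).

    sources: complex array (I, K, P): I sources, K frequency bins, P frames
    ctfs:    complex array (J, I, K, Q): CTF from source i to mic j, Q taps per bin
    returns: complex array (J, K, P): microphone STFT coefficients
    """
    num_src, num_bins, num_frames = sources.shape
    num_mics = ctfs.shape[0]
    mix = np.zeros((num_mics, num_bins, num_frames), dtype=complex)
    for j in range(num_mics):
        for i in range(num_src):
            for k in range(num_bins):
                # Causal convolution along the frame axis, truncated to P frames
                mix[j, k] += np.convolve(ctfs[j, i, k], sources[i, k])[:num_frames]
    return mix

def ctf_taps(rir_len, hop):
    # Rule of thumb (assumption): about one CTF tap per frame hop covered by the RIR
    return int(np.ceil(rir_len / hop))
```

For example, with a 4000-sample impulse response (0.25 s at 16 kHz) and a 256-sample hop, `ctf_taps(4000, 256)` gives 16 taps per frequency bin instead of 4000 time-domain taps, which is the kind of tap reduction the abstract refers to.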
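The multisource MINT idea in method 1) can be sketched per frequency bin as a least-squares problem, assuming NumPy and known CTFs: stack the convolution matrices of all microphone-to-source CTFs and solve for inverse filters g_j such that Σ_j c_ji ∗ g_j is a (possibly delayed) unit impulse for the target source and zero for every other source. This is plain MINT applied to one bin's CTFs; the paper's CTF-domain formulation may use a different target response or regularization, and `conv_matrix` and `mint_inverse` are hypothetical names. An exact solution generically exists when there are more microphones than sources (J > I) and the inverse-filter length satisfies J·L_g ≥ I·(Q + L_g − 1).

```python
import numpy as np

def conv_matrix(h, num_cols):
    """Convolution matrix T such that T @ g == np.convolve(h, g) when len(g) == num_cols."""
    num_rows = len(h) + num_cols - 1
    T = np.zeros((num_rows, num_cols), dtype=complex)
    for c in range(num_cols):
        T[c:c + len(h), c] = h
    return T

def mint_inverse(ctfs_k, target, inv_len, delay=0):
    """Least-squares multisource MINT in one frequency bin.

    ctfs_k: complex array (J, I, Q) of CTFs for this bin
    target: index of the source to extract
    returns: complex array (J, inv_len) of inverse filters g_j
    """
    num_mics, num_src, q = ctfs_k.shape
    out_len = q + inv_len - 1
    blocks, rhs = [], []
    for i in range(num_src):
        # Row block for source i computes sum_j conv(c_ji, g_j)
        blocks.append(np.hstack([conv_matrix(ctfs_k[j, i], inv_len) for j in range(num_mics)]))
        d = np.zeros(out_len, dtype=complex)
        if i == target:
            d[delay] = 1.0  # desired overall response: delayed unit impulse
        rhs.append(d)  # interfering sources: desired response is all zeros
    A = np.vstack(blocks)
    b = np.concatenate(rhs)
    # Minimum-norm least-squares solution (exact when A has full row rank)
    g, *_ = np.linalg.lstsq(A, b, rcond=None)
    return g.reshape(num_mics, inv_len)
```

Once `g` is computed for bin k, the target estimate in that bin is obtained by filtering each microphone's frame sequence with `g[j]` and summing over microphones.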
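Method 3) recovers sparse source spectra from the CTF mixing model. The sketch below, assuming NumPy, builds the per-bin matrix A such that x = A s (sources and microphones stacked frame by frame) and then solves the penalized (Lasso) form min_s ½‖x − A s‖₂² + λ‖s‖₁ with ISTA and complex soft-thresholding. This is a swapped-in solver, not the authors' method: the paper minimizes the ℓ1-norm subject to an ℓ2 fitting constraint whose tolerance is set at the noise power level, and the two forms coincide only for a matching choice of λ. All function names are hypothetical.

```python
import numpy as np

def ctf_matrix(ctfs_k, num_frames):
    """Matrix A with x = A @ s for one frequency bin.

    ctfs_k: complex (J, I, Q) CTFs; s stacks I source sequences (I*P,),
    x stacks J microphone sequences (J*P,), with P = num_frames.
    """
    num_mics, num_src, q = ctfs_k.shape
    A = np.zeros((num_mics * num_frames, num_src * num_frames), dtype=complex)
    for j in range(num_mics):
        for i in range(num_src):
            for lag in range(min(q, num_frames)):
                # Tap `lag` sits on the lag-th subdiagonal of block (j, i):
                # causal convolution truncated to P frames
                rows = np.arange(lag, num_frames)
                A[j * num_frames + rows, i * num_frames + rows - lag] = ctfs_k[j, i, lag]
    return A

def complex_soft_threshold(z, tau):
    # Shrink magnitudes by tau and keep the phase; entries with |z| <= tau become 0
    mag = np.abs(z)
    scale = np.maximum(1.0 - tau / np.maximum(mag, 1e-12), 0.0)
    return scale * z

def lasso_ista(A, x, lam, num_iter=500):
    """ISTA for min_s 0.5*||x - A s||_2^2 + lam*||s||_1 with complex s (Lasso form)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    s = np.zeros(A.shape[1], dtype=complex)
    for _ in range(num_iter):
        grad = A.conj().T @ (A @ s - x)
        s = complex_soft_threshold(s - step * grad, step * lam)
    return s

def objective(A, x, s, lam):
    return 0.5 * np.linalg.norm(x - A @ s) ** 2 + lam * np.sum(np.abs(s))
```

Setting λ ≥ max|Aᴴx| drives every coefficient to zero, so λ is usually chosen as a small fraction of that value; in the paper's constrained form, the corresponding knob is the tolerance on the fitting error.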

Original language: English
Article number: 8610134
Pages (from-to): 645-659
Number of pages: 15
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 27
Issue number: 3
State: Published, Mar 2019

Bibliographical note

Publisher Copyright:
© 2014 IEEE.

Keywords

  • Audio source separation
  • Lasso optimization
  • MINT
  • convolutive transfer function
  • short-time Fourier transform
  • speech enhancement
