Multichannel speech separation and enhancement using the convolutive transfer function

Xiaofei Li, Laurent Girin, Sharon Gannot, Radu Horaud

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

This paper addresses the problem of speech separation and enhancement from multichannel, convolutive, and noisy mixtures, assuming the mixing filters are known. We propose to perform speech separation and enhancement in the short-time Fourier transform domain using the convolutive transfer function (CTF) approximation. Compared with time-domain filters, the CTF has far fewer taps; it therefore incurs a lower computational cost and is sometimes more robust to filter perturbations. We propose three methods: 1) for the multisource case, the multichannel inverse filtering method, i.e., the multiple-input/output inverse theorem (MINT), is exploited in the CTF domain; 2) a beamforming-like multichannel inverse filtering method that applies the single-source MINT together with power minimization, which is suitable whenever the source CTFs are not all known; and 3) a basis pursuit method, in which the sources are recovered by minimizing their ℓ1-norm to impose spectral sparsity, while the ℓ2-norm fitting cost between the microphone signals and the mixing model is constrained to be below a tolerance. Noise can be reduced by setting this tolerance at the noise power level. Experiments under various acoustic conditions are carried out to evaluate and compare the three proposed methods. A comparison with four baseline methods - a beamforming-based method, two time-domain inverse filtering methods, and time-domain Lasso - demonstrates the applicability of the proposed methods.
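
As a concrete illustration of the third method's formulation, the sketch below solves the same type of constrained program: an ℓ1 objective on the source coefficients with an ℓ2 fitting constraint whose tolerance is set from the noise level. It is a simplified, real-valued, single-frame stand-in (the problem sizes, the random mixing matrix A, and the use of the cvxpy solver are illustrative assumptions), not the paper's CTF-domain implementation.

# Minimal sketch of a basis-pursuit-denoising program in the spirit of
# method 3 of the abstract: min ||s||_1  s.t.  ||A s - x||_2 <= eps.
# Simplified and real-valued; the paper operates on complex STFT
# coefficients with CTF convolution. Requires numpy and cvxpy.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)

n_coef, n_obs = 64, 256                      # hypothetical problem sizes
A = rng.standard_normal((n_obs, n_coef))     # stand-in for the CTF mixing operator
s_true = np.zeros(n_coef)                    # sparse "source spectrum"
s_true[rng.choice(n_coef, size=8, replace=False)] = rng.standard_normal(8)

noise_std = 0.05
x = A @ s_true + noise_std * rng.standard_normal(n_obs)  # noisy observations

# Tolerance set at the (assumed known) noise power level, as in the abstract.
eps = noise_std * np.sqrt(n_obs)

s = cp.Variable(n_coef)
problem = cp.Problem(cp.Minimize(cp.norm1(s)),
                     [cp.norm2(A @ s - x) <= eps])
problem.solve()

print("relative recovery error:",
      np.linalg.norm(s.value - s_true) / np.linalg.norm(s_true))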

Original language: English
Article number: 8610134
Pages (from-to): 645-659
Number of pages: 15
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 27
Issue number: 3
DOIs
State: Published - Mar 2019

Bibliographical note

Publisher Copyright:
© 2014 IEEE.

Funding

Manuscript received February 27, 2018; revised August 10, 2018, October 23, 2018, and December 11, 2018; accepted December 27, 2018. Date of publication January 11, 2019; date of current version January 25, 2019. This work was supported by the European Research Council Advanced Grant 340113 (project: Vision and Hearing in Action). The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Simon Doclo. (Corresponding author: Xiaofei Li.) X. Li and R. Horaud are with the INRIA Grenoble Rhône-Alpes, 38330 Montbonnot-Saint-Martin, France (e-mail: [email protected]; [email protected]).

Funders                        Funder number
Seventh Framework Programme    340113
European Commission

Keywords

• Audio source separation
• Lasso optimization
• MINT
• convolutive transfer function
• short-time Fourier transform
• speech enhancement
