Abstract
This paper addresses the problems of blind multichannel identification and equalization for joint speech dereverberation and noise reduction. The time-domain cross-relation method is hardly applicable to blind room impulse response identification because of the near-common zeros of the long impulse responses. We extend the cross-relation method to the short-time Fourier transform (STFT) domain, in which the time-domain impulse response is approximately represented by the convolutive transfer function (CTF) with far fewer coefficients. For the oversampled STFT, CTFs suffer from common zeros caused by the nonflat frequency response of the STFT window. To overcome this, we propose to identify CTFs using an STFT framework with oversampled signals and critically sampled CTFs, which is a good tradeoff between the frequency aliasing of the signals and the common-zero problem of the CTFs. The identified complex-valued CTFs are not accurate enough for multichannel equalization because of the frequency aliasing of the CTFs. Hence, we use only the CTF magnitudes, which leads to a nonnegative multichannel equalization method based on a nonnegative convolution model between the STFT magnitude of the source signal and the CTF magnitude. Compared with the complex-valued convolution model, this nonnegative convolution model is shown to be more robust against CTF perturbations. To recover the STFT magnitude of the source signal and to reduce the additive noise, the ℓ2-norm fitting error between the STFT magnitude of the microphone signals and the nonnegative convolution is constrained to be less than a noise-power-related tolerance. Meanwhile, the ℓ1-norm of the STFT magnitude of the source signal is minimized to impose sparsity.
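As background for the cross-relation idea mentioned in the abstract, the following is a minimal time-domain sketch, not the STFT-domain method proposed in the paper. It assumes a noise-free two-channel setup with short random filters whose length is known; the function names (`convolution_matrix`, `cross_relation_identify`) and all signal parameters are illustrative assumptions. It relies only on the identity x1 * h2 = x2 * h1, which holds when both microphone signals are the same source convolved with h1 and h2.

```python
import numpy as np

def convolution_matrix(x, filter_len):
    """Return X such that X @ h == np.convolve(x, h) when len(h) == filter_len."""
    n_out = len(x) + filter_len - 1
    X = np.zeros((n_out, filter_len))
    for k in range(filter_len):
        X[k:k + len(x), k] = x
    return X

def cross_relation_identify(x1, x2, filter_len):
    """Estimate both filters, up to one common scale factor, from noise-free data."""
    X1 = convolution_matrix(x1, filter_len)
    X2 = convolution_matrix(x2, filter_len)
    # Cross-relation: x2 * h1 - x1 * h2 = 0, i.e. [X2, -X1] @ [h1; h2] = 0.
    A = np.hstack([X2, -X1])
    # The right singular vector of the smallest singular value spans the null space.
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    h = vt[-1]
    return h[:filter_len], h[filter_len:]

# Synthetic example (all values are hypothetical, chosen only for illustration).
rng = np.random.default_rng(0)
filter_len = 8
s = rng.standard_normal(400)             # unknown source signal
h1_true = rng.standard_normal(filter_len)
h2_true = rng.standard_normal(filter_len)
x1 = np.convolve(s, h1_true)             # microphone 1
x2 = np.convolve(s, h2_true)             # microphone 2

h1_est, h2_est = cross_relation_identify(x1, x2, filter_len)

# Resolve the scale ambiguity by least squares before comparing with the truth.
true = np.concatenate([h1_true, h2_true])
est = np.concatenate([h1_est, h2_est])
scale = (est @ true) / (est @ est)
rel_err = np.linalg.norm(scale * est - true) / np.linalg.norm(true)
print(f"relative identification error: {rel_err:.2e}")
```

The sketch works because the two short random filters share no zeros, so the null space is one-dimensional. With long room impulse responses, near-common zeros make this null space ill-defined, which is the difficulty the abstract cites as motivation for moving to the CTF representation.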
Original language | English |
---|---|
Pages (from-to) | 1755-1768 |
Number of pages | 14 |
Journal | IEEE/ACM Transactions on Audio Speech and Language Processing |
Volume | 26 |
Issue number | 10 |
DOIs | |
State | Published - Oct 2018 |
Bibliographical note
Publisher Copyright: © 2014 IEEE.
Funding
Manuscript received December 22, 2017; revised April 6, 2018; accepted May 11, 2018. Date of publication May 21, 2018; date of current version June 21, 2018. This work was supported by the ERC Advanced Grant VHIA #340113. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Simon Doclo. (Corresponding author: Xiaofei Li.) X. Li and R. Horaud are with the INRIA Grenoble Rhône-Alpes, Montbonnot-Saint-Martin 38334, France (e-mail: [email protected]; [email protected]).
Funders | Funder number |
---|---|
Seventh Framework Programme | 340113 |
European Commission |
Keywords
- Multichannel identification
- convolutive transfer function
- dereverberation
- multichannel equalization