A Bayesian Hierarchical Model for Speech Enhancement with Time-Varying Audio Channel

Yaron Laufer, Sharon Gannot

Research output: Contribution to journal › Article › peer-review

17 Scopus citations

Abstract

We present a fully Bayesian hierarchical approach for multichannel speech enhancement with a time-varying audio channel. Our probabilistic approach relies on a Gaussian prior for the speech signal and a Gamma hyperprior for the speech precision, combined with a multichannel linear-Gaussian state-space model for the acoustic channel. Furthermore, we assume a Wishart prior for the noise precision matrix. We derive a variational expectation-maximization (VEM) algorithm that uses a variant of a multichannel Wiener filter (MCWF) to infer the sound source and a Kalman smoother to infer the acoustic channel. It is further shown that the VEM speech estimator can be recast as a multichannel minimum variance distortionless response (MVDR) beamformer followed by a single-channel variational postfilter. The proposed algorithm is evaluated in both simulated and real room environments with several noise types and reverberation levels. Both static and dynamic scenarios are considered. In terms of speech quality, it is shown that the proposed method significantly improves on the noisy signal and outperforms a baseline algorithm. In terms of channel alignment and tracking ability, a superior channel estimate is demonstrated.
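The factorization mentioned in the abstract has a well-known classical counterpart: for a single source with a rank-one model, the MCWF equals an MVDR beamformer followed by a single-channel Wiener postfilter. The NumPy sketch below checks that classical identity numerically. It is an illustration only, not the paper's VEM estimator or its variational postfilter. The function names, the symbols `a` (channel vector), `phi_s` (speech variance), and `Phi_v` (noise covariance), and the random test data are all assumptions made for this example.

```python
import numpy as np

# Assumed toy model for one time-frequency bin: x = a * s + v,
# with speech variance phi_s and noise covariance Phi_v.
# All filters are applied as s_hat = w^H x.

def mcwf_weights(a, phi_s, Phi_v):
    """Multichannel Wiener filter: w = Phi_x^{-1} E[x s*]."""
    Phi_x = phi_s * np.outer(a, a.conj()) + Phi_v  # covariance of x
    return np.linalg.solve(Phi_x, phi_s * a)

def mvdr_weights(a, Phi_v):
    """MVDR beamformer: w = Phi_v^{-1} a / (a^H Phi_v^{-1} a)."""
    num = np.linalg.solve(Phi_v, a)
    return num / (a.conj() @ num)

def wiener_postfilter_gain(a, phi_s, Phi_v):
    """Single-channel Wiener gain applied to the MVDR output."""
    gamma = np.real(a.conj() @ np.linalg.solve(Phi_v, a))
    phi_r = 1.0 / gamma  # residual noise power after MVDR
    return phi_s / (phi_s + phi_r)

# Hypothetical test data: 4 microphones, random channel and noise covariance.
rng = np.random.default_rng(0)
M = 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi_v = B @ B.conj().T + 0.1 * np.eye(M)  # Hermitian positive definite
phi_s = 2.0

w_mcwf = mcwf_weights(a, phi_s, Phi_v)
w_mvdr = mvdr_weights(a, Phi_v)
g = wiener_postfilter_gain(a, phi_s, Phi_v)
w_factored = g * w_mvdr  # MVDR followed by the scalar postfilter

print(np.allclose(w_mcwf, w_factored))
```

In this sketch the MVDR stage satisfies the distortionless constraint (`w_mvdr` has unit response toward `a`), and the scalar gain `g` then trades residual noise against speech distortion. According to the abstract, the paper's estimator has the same two-stage structure but with a variational postfilter in place of the classical Wiener gain.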

Original language: English
Article number: 8492427
Pages (from-to): 225-239
Number of pages: 15
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 27
Issue number: 1
State: Published - Jan 2019

Bibliographical note

Publisher Copyright:
© 2014 IEEE.

Keywords

  • Adaptive beamforming
  • Kalman smoother
  • Variational EM
