TY - JOUR
T1 - A Bayesian Hierarchical Model for Speech Enhancement with Time-Varying Audio Channel
AU - Laufer, Yaron
AU - Gannot, Sharon
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2019/1
Y1 - 2019/1
N2 - We present a fully Bayesian hierarchical approach for multichannel speech enhancement with time-varying audio channel. Our probabilistic approach relies on a Gaussian prior for the speech signal and a Gamma hyperprior for the speech precision, combined with a multichannel linear-Gaussian state-space model for the acoustic channel. Furthermore, we assume a Wishart prior for the noise precision matrix. We derive a variational expectation-maximization (VEM) algorithm that uses a variant of a multichannel Wiener filter (MCWF) to infer the sound source and a Kalman smoother to infer the acoustic channel. It is further shown that the VEM speech estimator can be recast as a multichannel minimum variance distortionless response (MVDR) beamformer followed by a single-channel variational postfilter. The proposed algorithm was evaluated using both simulated and real room environments with several noise types and reverberation levels. Both static and dynamic scenarios are considered. In terms of speech quality, it is shown that a significant improvement is obtained with respect to the noisy signal, and that the proposed method outperforms a baseline algorithm. In terms of channel alignment and tracking ability, a superior channel estimate is demonstrated.
AB - We present a fully Bayesian hierarchical approach for multichannel speech enhancement with time-varying audio channel. Our probabilistic approach relies on a Gaussian prior for the speech signal and a Gamma hyperprior for the speech precision, combined with a multichannel linear-Gaussian state-space model for the acoustic channel. Furthermore, we assume a Wishart prior for the noise precision matrix. We derive a variational expectation-maximization (VEM) algorithm that uses a variant of a multichannel Wiener filter (MCWF) to infer the sound source and a Kalman smoother to infer the acoustic channel. It is further shown that the VEM speech estimator can be recast as a multichannel minimum variance distortionless response (MVDR) beamformer followed by a single-channel variational postfilter. The proposed algorithm was evaluated using both simulated and real room environments with several noise types and reverberation levels. Both static and dynamic scenarios are considered. In terms of speech quality, it is shown that a significant improvement is obtained with respect to the noisy signal, and that the proposed method outperforms a baseline algorithm. In terms of channel alignment and tracking ability, a superior channel estimate is demonstrated.
KW - Adaptive beamforming
KW - Kalman smoother
KW - Variational EM
UR - http://www.scopus.com/inward/record.url?scp=85055048659&partnerID=8YFLogxK
U2 - 10.1109/taslp.2018.2876177
DO - 10.1109/taslp.2018.2876177
M3 - Article
AN - SCOPUS:85055048659
SN - 2329-9290
VL - 27
SP - 225
EP - 239
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
IS - 1
M1 - 8492427
ER -