A two-stage speaker extraction algorithm under adverse acoustic conditions using a single-microphone

Aviad Eisenberg, Sharon Gannot, Shlomo E. Chazan

Research output: Conference contribution in conference proceedings (peer-reviewed)

Abstract

In this work, we present a two-stage method for speaker extraction under reverberant and noisy conditions. Given a reference signal of the desired speaker, the clean but still reverberant desired speaker is first extracted from the noisy mixed signal. In the second stage, the extracted signal is further enhanced by joint dereverberation and reduction of residual noise and interference. The proposed architecture comprises two sub-networks, one for the extraction task and one for the dereverberation task. We present a training strategy for this architecture and show that the performance of the proposed method is on par with other state-of-the-art (SOTA) methods when applied to the WHAMR! dataset. Furthermore, we present a new dataset with more realistic adverse acoustic conditions and show that our method also outperforms the competing methods when applied to this dataset.
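The two-stage pipeline described in the abstract can be illustrated with a minimal structural sketch. The sub-network interfaces below are hypothetical placeholders (simple element-wise operations standing in for the learned extraction and dereverberation networks); only the stage ordering, extraction conditioned on a reference embedding followed by joint dereverberation and residual noise reduction, reflects the paper.

```python
import numpy as np


def extraction_stage(mixture: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Hypothetical stage 1: extract the desired speaker from the mixture,
    conditioned on a reference signal of that speaker.

    Placeholder: a sigmoid gate driven by sample-wise agreement with the
    reference stands in for the learned extraction sub-network. The output
    is the clean but still reverberant target estimate.
    """
    gate = 1.0 / (1.0 + np.exp(-(mixture * reference)))
    return gate * mixture


def dereverberation_stage(reverberant_estimate: np.ndarray) -> np.ndarray:
    """Hypothetical stage 2: joint dereverberation and residual noise /
    interference reduction.

    Placeholder: simple attenuation stands in for the second sub-network.
    """
    return 0.8 * reverberant_estimate


def two_stage_extract(mixture: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Run both stages in sequence, as in the proposed architecture."""
    still_reverberant = extraction_stage(mixture, reference)
    return dereverberation_stage(still_reverberant)
```

The key design point carried over from the abstract is the decomposition: stage 1 only separates the target speaker (leaving reverberation intact), so stage 2 can specialize in dereverberation and residual enhancement.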

Original language: English
Title of host publication: 31st European Signal Processing Conference, EUSIPCO 2023 - Proceedings
Publisher: European Signal Processing Conference, EUSIPCO
Pages: 266-270
Number of pages: 5
ISBN (Electronic): 9789464593600
DOIs
State: Published - 2023
Event: 31st European Signal Processing Conference, EUSIPCO 2023 - Helsinki, Finland
Duration: 4 Sep 2023 – 8 Sep 2023

Publication series

Name: European Signal Processing Conference
ISSN (Print): 2219-5491

Conference

Conference: 31st European Signal Processing Conference, EUSIPCO 2023
Country/Territory: Finland
City: Helsinki
Period: 4/09/23 – 8/09/23

Bibliographical note

Publisher Copyright:
© 2023 European Signal Processing Conference, EUSIPCO. All rights reserved.

Keywords

  • Dereverberation
  • Speaker extraction
