Single microphone speaker extraction using unified time-frequency Siamese-Unet

Aviad Eisenberg, Sharon Gannot, Shlomo E. Chazan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

In this paper, we present a unified time-frequency method for speaker extraction in clean and noisy conditions. Given a mixed signal and a reference signal, common approaches for extracting the desired speaker operate in either the time domain or the frequency domain. We propose a Siamese-Unet architecture that uses both representations. The Siamese encoders are applied in the frequency domain to infer the embeddings of the noisy and reference spectra, respectively. The concatenated representations are then fed into the decoder to estimate the real and imaginary components of the desired speaker, which are then inverse-transformed to the time domain. The model is trained with the Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) loss to exploit the time-domain information. The time-domain loss is also regularized with a frequency-domain loss to preserve the speech patterns. Experimental results demonstrate that the unified approach is not only very easy to train, but also provides superior results compared with Blind Source Separation (BSS) methods, as well as a commonly used speaker extraction approach.
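The training objective described in the abstract can be illustrated with a minimal NumPy sketch. The SI-SDR function follows the standard definition; everything else is an assumption made for illustration, since the abstract does not specify the form of the frequency-domain regularizer. In particular, the L1 distance between magnitude spectra, the frame and hop sizes, the `weight` value, and the function names `spectral_l1` and `unified_loss` are hypothetical, and an actual model would compute this in a differentiable framework rather than NumPy.

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB for two 1-D signals of equal length."""
    estimate = np.asarray(estimate, dtype=float)
    target = np.asarray(target, dtype=float)
    # Remove the mean so the measure ignores DC offsets.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target; the residual is the distortion.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = alpha * target
    distortion = estimate - projection
    ratio = (np.sum(projection ** 2) + eps) / (np.sum(distortion ** 2) + eps)
    return 10.0 * np.log10(ratio)

def spectral_l1(estimate, target, frame=512, hop=256):
    """Assumed regularizer: mean L1 distance between framed magnitude spectra."""
    window = np.hanning(frame)

    def magnitudes(x):
        x = np.asarray(x, dtype=float)
        n_frames = 1 + (len(x) - frame) // hop
        frames = np.stack(
            [x[i * hop : i * hop + frame] * window for i in range(n_frames)]
        )
        return np.abs(np.fft.rfft(frames, axis=1))

    return float(np.mean(np.abs(magnitudes(estimate) - magnitudes(target))))

def unified_loss(estimate, target, weight=0.1):
    """Negative SI-SDR plus a weighted spectral term (weight is hypothetical)."""
    return -si_sdr(estimate, target) + weight * spectral_l1(estimate, target)

# Usage: a lightly corrupted estimate should score a lower loss than a noisy one.
rng = np.random.default_rng(0)
clean = rng.standard_normal(4000)
noise = rng.standard_normal(4000)
loss_good = unified_loss(clean + 0.01 * noise, clean)
loss_bad = unified_loss(clean + noise, clean)
```

Because SI-SDR projects the estimate onto the target before measuring the residual, rescaling the estimate leaves the score essentially unchanged; the spectral term in this sketch is not scale-invariant, which is one reason the two terms can complement each other.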

Original language: English
Title of host publication: 30th European Signal Processing Conference, EUSIPCO 2022 - Proceedings
Publisher: European Signal Processing Conference, EUSIPCO
Pages: 762-766
Number of pages: 5
ISBN (Electronic): 9789082797091
State: Published - 2022
Event: 30th European Signal Processing Conference, EUSIPCO 2022 - Belgrade, Serbia
Duration: 29 Aug 2022 to 2 Sep 2022

Publication series

Name: European Signal Processing Conference
Volume: 2022-August
ISSN (Print): 2219-5491

Conference

Conference: 30th European Signal Processing Conference, EUSIPCO 2022
Country/Territory: Serbia
City: Belgrade
Period: 29/08/22 to 2/09/22

Bibliographical note

Publisher Copyright:
© 2022 European Signal Processing Conference, EUSIPCO. All rights reserved.

Funding

This project has received funding from the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 871245.

Funders (funder number)
Horizon 2020 Framework Programme
Horizon 2020 (871245)
