Low-Latency Single-Microphone Speaker Separation with Temporal Convolutional Networks Using Speaker Representations

Boris Rubenchik, Elior Hadad, Eli Tzirkel, Ethan Fetaya, Sharon Gannot

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This article introduces a novel method for single-channel speaker separation that focuses on causal, low-latency inference. Although speaker separation has advanced significantly in recent years, especially in non-causal scenarios, a notable gap remains in low-latency speaker separation, a critical requirement for real-time conversational applications such as phone calls, video calls, and human-machine interaction. We propose a two-stage solution that combines an initial separation with a refinement driven by speaker representations. To assess the effectiveness of our method, we conducted extensive experiments and examined its performance in two configurations: a model with a very low algorithmic latency of 16 milliseconds and a fully causal model. Additionally, we analyze in detail the composition of datasets derived from WSJ0 and its impact on model evaluation. Our two-stage solution demonstrates significant performance improvements over the baseline model and offers an easily deployable solution suitable for edge devices.

Original language: English
Title of host publication: 2024 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 155-159
Number of pages: 5
ISBN (Electronic): 9798350361858
State: Published - 2024
Event: 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024 - Aalborg, Denmark
Duration: 9 Sep 2024 to 12 Sep 2024

Publication series

Name: 2024 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024 - Proceedings

Conference

Conference: 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024
Country/Territory: Denmark
City: Aalborg
Period: 9/09/24 to 12/09/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.
