Abstract
This article introduces a novel method for single-channel speaker separation, focusing on causal and low-latency inference. Although speaker separation has advanced significantly in recent years, especially in non-causal scenarios, low-latency speaker separation remains a notable gap, even though it is a critical requirement for real-time conversational applications such as phone calls, video calls, and human-machine interaction. We propose a two-stage solution that combines an initial separation stage with speaker-representation-driven refinement. To assess the effectiveness of our method, we conducted extensive experiments and evaluated it in two settings: a model with a very low algorithmic latency of 16 milliseconds and a fully causal model. Additionally, we analyze in detail the composition of datasets derived from WSJ0, shedding light on their impact on model evaluation. Our two-stage solution demonstrates significant performance improvements over the baseline model and is easily deployable on edge devices.
Original language | English |
---|---|
Title of host publication | 2024 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 155-159 |
Number of pages | 5 |
ISBN (Electronic) | 9798350361858 |
State | Published - 2024 |
Event | 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024 - Aalborg, Denmark; Duration: 9 Sep 2024 → 12 Sep 2024 |
Publication series
Name | 2024 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024 - Proceedings |
---|---|
Conference
Conference | 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024 |
---|---|
Country/Territory | Denmark |
City | Aalborg |
Period | 9/09/24 → 12/09/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.