Abstract
This article introduces a novel method for single-channel speaker separation, focusing on causal and low-latency inference. While significant advances have recently been made in speaker separation, especially in non-causal scenarios, there is a notable gap concerning low-latency speaker separation, a critical requirement for real-time conversational applications such as phone calls, video calls, and human-machine interaction. We propose a two-stage solution that combines an initial separation with speaker-representation-driven refinement. To assess the effectiveness of our method, we conducted extensive experiments and examined its performance in two settings: a very low algorithmic latency of 16 milliseconds and a fully causal model. Additionally, we analyze in detail the composition of datasets derived from WSJ0, shedding light on their impact on model evaluation. Our two-stage solution demonstrates significant performance improvements over the baseline model and is easily deployable on edge devices.
| Original language | English |
|---|---|
| Title of host publication | 2024 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024 - Proceedings |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Pages | 155-159 |
| Number of pages | 5 |
| ISBN (Electronic) | 9798350361858 |
| State | Published - 2024 |
| Event | 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024 - Aalborg, Denmark. Duration: 9 Sep 2024 → 12 Sep 2024 |
Publication series
| Name | 2024 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024 - Proceedings |
|---|---|
Conference
| Conference | 18th International Workshop on Acoustic Signal Enhancement, IWAENC 2024 |
|---|---|
| Country/Territory | Denmark |
| City | Aalborg |
| Period | 9/09/24 → 12/09/24 |
Bibliographical note
Publisher Copyright: © 2024 IEEE.