Abstract
Speaker Change Detection (SCD) is the task of segmenting an input audio-recording according to speaker interchanges. This task is essential for many applications, such as automatic voice transcription or Speaker Diarization (SD). This paper focuses on the essential task of audio segmentation and suggests a word-embedding-based solution for the SCD problem. Moreover, we show how to use our approach in order to outperform voice-based solutions for the SD problem. We empirically show that our method can accurately identify the speaker-turns in an audio-recording with 82.12% and 89.02% success in the Recall and F1-score measures.
Original language | English |
---|---|
Title of host publication | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 |
Publisher | International Speech Communication Association |
Pages | 2473-2477 |
Number of pages | 5 |
ISBN (Electronic) | 9781713836902 |
DOIs | |
State | Published - 2021 |
Externally published | Yes |
Event | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic Duration: 30 Aug 2021 → 3 Sep 2021 |
Publication series
Name | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
---|---|
Volume | 4 |
ISSN (Print) | 2308-457X |
ISSN (Electronic) | 1990-9772 |
Conference
Conference | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 |
---|---|
Country/Territory | Czech Republic |
City | Brno |
Period | 30/08/21 → 3/09/21 |
Bibliographical note
Publisher Copyright:Copyright © 2021 ISCA.
Funding
The authors wish to thank IFAT Group that provided the dataset for this research. In addition, this work was supported by the Ariel Cyber Innovation Center in conjunction with the Israel National Cyber directorate in the Prime Minister’s Office.
Keywords
- Clustering
- Speaker change detection
- Speaker diarization
- Speech recognition
- Word embedding