Speech and multilingual natural language framework for speaker change detection and diarization

Or Haim Anidjar, Yannick Estève, Chen Hajaj, Amit Dvir, Itshak Lapidot

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Speaker Change Detection (SCD) is the problem of splitting an audio recording by its speaker turns. Many real-world tasks, such as Speaker Diarization (SD) and automatic speech transcription, are influenced by the quality of speaker-turn estimation. Previous works have shown that auxiliary textual information (for mono-lingual systems) can be of great use for detecting speaker turns and for improving diarization systems’ performance. In this paper, we suggest a framework for speaker-turn estimation, as well as for providing clustered speaker identities to the SD system, and examine our approach over a multi-lingual dataset that consists of three mono-lingual datasets: English, French, and Hebrew. As such, we propose a generic and language-independent framework for the SCD problem that is learned through textual information using state-of-the-art transformer-based techniques and speech-embedding modules. Comprehensive experimental evaluation shows that (i) our multi-lingual SCD framework is competitive with a framework built over mono-lingual datasets, and that (ii) textual information improves the solution's quality compared to the speech signal-based approach. In addition, we show that our multi-lingual SCD approach does not harm the performance of SD systems.

Original language: English
Article number: 119238
Journal: Expert Systems with Applications
Volume: 213
State: Published - 1 Mar 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2022 Elsevier Ltd

Funding

The authors wish to thank IFAT Group, which provided the Hebrew dataset. In addition, this work was supported by the Ariel Cyber Innovation Center in conjunction with the Israel National Cyber Directorate in the Prime Minister’s Office, and the Ariel Data Science and Artificial Intelligence Research Center.

Funders | Funder number
Data Science and Artificial Intelligence Research Centre, Nanyang Technological University

    Keywords

    • Speaker change detection
    • Speaker diarization
    • Speaker embedding
    • Speech recognition
    • Transformers
