Concurrent Speaker Detection: A Multi-microphone Transformer-Based Approach

Amit Eliav, Sharon Gannot

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We present a deep-learning approach to Concurrent Speaker Detection (CSD) based on a modified transformer model. The model is designed to handle multi-microphone data but also works in the single-microphone case. It classifies audio segments into one of three classes: 1) no speech activity (noise only), 2) a single active speaker, and 3) more than one active speaker. We incorporate a Cost-Sensitive (CS) loss and confidence calibration into the training procedure. The approach is evaluated on three real-world databases: AMI, AliMeeting, and CHiME-5, demonstrating an improvement over existing approaches.
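The abstract mentions a Cost-Sensitive (CS) loss over the three CSD classes. The paper's exact cost matrix and loss form are not given here, so the following is only an illustrative sketch: a cross-entropy loss in which each sample is re-weighted by the expected misclassification cost under a hypothetical cost matrix (e.g. penalizing confusion between single-speaker and overlapped-speech segments more heavily).

```python
import numpy as np

# Hypothetical cost matrix for the three CSD classes:
#   0 = noise only, 1 = single speaker, 2 = multiple speakers.
# COST[i, j] is the penalty for predicting class j when the true class is i.
# These values are illustrative only; the paper's actual weights are not shown.
COST = np.array([
    [0.0, 1.0, 1.0],   # true: noise only
    [1.0, 0.0, 2.0],   # true: single speaker (confusing with overlap costs more)
    [1.0, 2.0, 0.0],   # true: multiple speakers
])

def cost_sensitive_ce(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-entropy, with each sample weighted by its expected
    misclassification cost under the predicted class distribution."""
    # numerically stable softmax over the class axis
    z = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = len(labels)
    ce = -np.log(probs[np.arange(n), labels] + 1e-12)
    # expected cost per sample; +1 keeps correct, confident predictions at weight ~1
    weights = (probs * COST[labels]).sum(axis=1) + 1.0
    return float((weights * ce).mean())
```

Under this sketch, a confidently wrong prediction on a costly confusion (e.g. single speaker mistaken for overlap) is penalized more than an equally wrong but cheaper error, which is the general idea behind cost-sensitive training.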

Original language: English
Title of host publication: 32nd European Signal Processing Conference, EUSIPCO 2024 - Proceedings
Publisher: European Signal Processing Conference, EUSIPCO
Pages: 897-901
Number of pages: 5
ISBN (Electronic): 9789464593617
DOIs
State: Published - 2024
Event: 32nd European Signal Processing Conference, EUSIPCO 2024 - Lyon, France
Duration: 26 Aug 2024 - 30 Aug 2024

Publication series

Name: European Signal Processing Conference
ISSN (Print): 2219-5491

Conference

Conference: 32nd European Signal Processing Conference, EUSIPCO 2024
Country/Territory: France
City: Lyon
Period: 26/08/24 - 30/08/24

Bibliographical note

Publisher Copyright:
© 2024 European Signal Processing Conference, EUSIPCO. All rights reserved.
