NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

Research output: Contribution to journalConference articlepeer-review

1 Scopus citations

Abstract

We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings (NOTSOFAR) Challenge1, datasets, and baseline system2. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in meeting scenarios, with single-channel and known-geometry multichannel tracks, using a single device. We launch two new datasets: First, a benchmark dataset of 280 English meetings, averaging 6 minutes each, capturing a broad spectrum of acoustic and conversational patterns across 30 rooms with 4-8 attendees. Second, a 1000-hour simulated training dataset, synthesized for real-world generalization, incorporating 15,000 real acoustic transfer functions. The NOTSOFAR-1 Challenge aims to advance research in the field of DASR, providing key resources to unlock the potential of data-driven methods, which we believe are currently constrained by the absence of comprehensive high-quality training and benchmark datasets.

Original languageEnglish
Pages (from-to)5003-5007
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOIs
StatePublished - 2024
Externally publishedYes
Event25th Interspeech Conferece 2024 - Kos Island, Greece
Duration: 1 Sep 20245 Sep 2024

Bibliographical note

Publisher Copyright:
© 2024 International Speech Communication Association. All rights reserved.

Keywords

  • multi-channel speech processing
  • speaker diarization
  • speech recognition
  • speech separation

Fingerprint

Dive into the research topics of 'NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription'. Together they form a unique fingerprint.

Cite this