TY - JOUR
T1 - NOTSOFAR-1 Challenge
T2 - 25th Interspeech Conference 2024
AU - Vinnikov, Alon
AU - Ivry, Amir
AU - Hurvitz, Aviv
AU - Abramovski, Igor
AU - Koubi, Sharon
AU - Gurvich, Ilya
AU - Pe'er, Shai
AU - Xiao, Xiong
AU - Elizalde, Benjamin Martinez
AU - Kanda, Naoyuki
AU - Wang, Xiaofei
AU - Shaer, Shalev
AU - Yagev, Stav
AU - Asher, Yossi
AU - Sivasankaran, Sunit
AU - Gong, Yifan
AU - Tang, Min
AU - Wang, Huaming
AU - Krupka, Eyal
N1 - Publisher Copyright: © 2024 International Speech Communication Association. All rights reserved.
PY - 2024
Y1 - 2024
N2 - We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings (NOTSOFAR) Challenge, datasets, and baseline system. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in meeting scenarios, with single-channel and known-geometry multichannel tracks, using a single device. We launch two new datasets: First, a benchmark dataset of 280 English meetings, averaging 6 minutes each, capturing a broad spectrum of acoustic and conversational patterns across 30 rooms with 4-8 attendees. Second, a 1000-hour simulated training dataset, synthesized for real-world generalization, incorporating 15,000 real acoustic transfer functions. The NOTSOFAR-1 Challenge aims to advance research in the field of DASR, providing key resources to unlock the potential of data-driven methods, which we believe are currently constrained by the absence of comprehensive high-quality training and benchmark datasets.
AB - We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings (NOTSOFAR) Challenge, datasets, and baseline system. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in meeting scenarios, with single-channel and known-geometry multichannel tracks, using a single device. We launch two new datasets: First, a benchmark dataset of 280 English meetings, averaging 6 minutes each, capturing a broad spectrum of acoustic and conversational patterns across 30 rooms with 4-8 attendees. Second, a 1000-hour simulated training dataset, synthesized for real-world generalization, incorporating 15,000 real acoustic transfer functions. The NOTSOFAR-1 Challenge aims to advance research in the field of DASR, providing key resources to unlock the potential of data-driven methods, which we believe are currently constrained by the absence of comprehensive high-quality training and benchmark datasets.
KW - multi-channel speech processing
KW - speaker diarization
KW - speech recognition
KW - speech separation
UR - http://www.scopus.com/inward/record.url?scp=85201026528&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2024-1788
DO - 10.21437/Interspeech.2024-1788
M3 - Conference article
AN - SCOPUS:85201026528
SN - 2308-457X
SP - 5003
EP - 5007
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2 - 1 September 2024 through 5 September 2024
ER -