Automatic recognition of second language speech-in-noise

Seung Eun Kim, Bronya R. Chernyak, Olga Seleznova, Joseph Keshet, Matthew Goldrick, Ann R. Bradlow

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Measuring how well human listeners recognize speech under varying environmental conditions (speech intelligibility) is a challenge for theoretical, technological, and clinical approaches to speech communication. The current gold standard—human transcription—is time- and resource-intensive. Recent advances in automatic speech recognition (ASR) systems raise the possibility of automating intelligibility measurement. This study tested four state-of-the-art ASR systems with second language speech-in-noise and found that one, Whisper, performed at or above human listener accuracy. However, the content of Whisper's responses diverged substantially from human responses, especially at lower signal-to-noise ratios, suggesting both opportunities and limitations for ASR-based speech intelligibility modeling.
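The comparison the abstract describes—scoring ASR output against reference transcripts—typically uses word error rate (WER). As a minimal illustration (not the authors' code; the example sentences are hypothetical), WER can be computed with a word-level edit distance:

```python
# Minimal sketch: word error rate (WER), the standard metric for comparing
# an ASR hypothesis to a reference transcript. Not the authors' code; the
# example strings below are invented for illustration.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # Dynamic-programming edit distance over words (Levenshtein).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Hypothetical trial: two word errors against a five-word reference.
print(wer("the dog chased the cat", "the dog chase a cat"))  # 0.4
```

In an evaluation like the one described, the same scoring would be applied both to ASR transcripts and to human listeners' typed responses, allowing accuracy at each signal-to-noise ratio to be compared directly.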

Original language: English
Article number: 025204
Journal: JASA Express Letters
Volume: 4
Issue number: 2
DOIs
State: Published - 1 Feb 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2024 Author(s).

Funding

This work was supported by NSF DRL Grant No. 2219843 and BSF Grant No. 2022618. Thanks to Chun Chan for assistance with human data collection.

| Funders | Funder number |
| --- | --- |
| NSF DRL | 2219843 |
| Bloom's Syndrome Foundation | 2022618 |
