Abstract
We propose a novel voice activity detection (VAD) model for low-resource environments. Our key idea is to model VAD as a denoising task and to construct a network designed to identify nuisance features for a speech classification task. We train the model to identify irrelevant features and predict the type of speech event simultaneously. Our model contains only 7.8K parameters, outperforms previously proposed methods on the AVA-Speech evaluation set, and provides comparable results on the HAVIC dataset. We present its architecture, experimental results, and an ablation study of the model's components.
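The abstract does not describe the network itself, so the following is only a minimal sketch of the general idea of jointly learning which input features to suppress while classifying a frame. It is not the paper's architecture. The class name `GatedFrameClassifier`, the sigmoid feature gate, the sparsity penalty, and the sizes (40 features, 4 classes) are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class GatedFrameClassifier:
    """Hypothetical toy model: a learnable per-feature gate followed by a
    linear classifier. An illustration only, not the published model."""

    def __init__(self, n_features, n_classes, seed=0):
        rng = random.Random(seed)
        # One gate logit per input feature; sigmoid(0) = 0.5 at initialization.
        self.gate_logits = [0.0] * n_features
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(n_features)]
            for _ in range(n_classes)
        ]
        self.biases = [0.0] * n_classes

    def gate(self):
        # Values near 0 suppress a feature (treated as nuisance);
        # values near 1 pass it through to the classifier.
        return [sigmoid(g) for g in self.gate_logits]

    def forward(self, frame):
        gated = [g * x for g, x in zip(self.gate(), frame)]
        scores = [
            sum(w * x for w, x in zip(row, gated)) + b
            for row, b in zip(self.weights, self.biases)
        ]
        return softmax(scores)

    def n_parameters(self):
        return (
            len(self.gate_logits)
            + sum(len(row) for row in self.weights)
            + len(self.biases)
        )


def joint_loss(model, frame, label, sparsity_weight=0.01):
    # Classification term: cross-entropy on the predicted event type.
    probs = model.forward(frame)
    cross_entropy = -math.log(probs[label])
    # Feature-selection term (an assumption of this sketch): penalize open
    # gates so that uninformative features are pushed toward zero.
    sparsity = sum(model.gate())
    return cross_entropy + sparsity_weight * sparsity


# Illustrative sizes only; the paper's input features and classes may differ.
model = GatedFrameClassifier(n_features=40, n_classes=4)
frame = [0.1 * i for i in range(40)]
loss = joint_loss(model, frame, label=1)
```

Minimizing a single loss that combines both terms is one simple way to train the gate and the classifier together. With these illustrative sizes the toy model has 40 + 160 + 4 = 204 parameters, which shows how a parameter budget in the low thousands constrains layer widths.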
Original language | English |
---|---|
Title of host publication | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781728163277 |
DOIs | |
State | Published - 2023 |
Event | 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece. Duration: 4 Jun 2023 → 10 Jun 2023 |
Publication series
Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
---|---|
Volume | 2023-June |
ISSN (Print) | 1520-6149 |
Conference
Conference | 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 |
---|---|
Country/Territory | Greece |
City | Rhodes Island |
Period | 4/06/23 → 10/06/23 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Keywords
- Feature Selection
- Speech Recognition
- Voice Activity Detection