Abstract
We propose a novel voice activity detection (VAD) model in a low-resource environment. Our key idea is to model VAD as a denoising task and construct a network that is designed to identify nuisance features for a speech classification task. We train the model to simultaneously identify irrelevant features while predicting the type of speech event. Our model contains only 7.8K parameters, outperforms the previously proposed methods on the AVA-Speech evaluation set, and provides comparative results on the HAVIC dataset. We present its architecture, experimental results, and ablation study on the model's components.
Original language | English |
---|---|
Journal | Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing |
DOIs | |
State | Published - 2023 |
Event | 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece Duration: 4 Jun 2023 → 10 Jun 2023 |
Bibliographical note
Publisher Copyright:© 2023 IEEE.
Keywords
- Feature Selection
- Speech Recognition
- Voice Activity Detection