SG-VAD: Stochastic Gates Based Speech Activity Detection

Jonathan Svirsky, Ofir Lindenbaum

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

We propose a novel voice activity detection (VAD) model for low-resource environments. Our key idea is to model VAD as a denoising task and to construct a network designed to identify nuisance features for a speech classification task. We train the model to identify irrelevant features while simultaneously predicting the type of speech event. Our model contains only 7.8K parameters, outperforms previously proposed methods on the AVA-Speech evaluation set, and provides comparable results on the HAVIC dataset. We present its architecture, experimental results, and an ablation study of the model's components.
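
The abstract does not spell out the gating mechanism. As a rough illustration of how stochastic gates can mark nuisance features, the sketch below uses a Gaussian-relaxed gate of the kind found in the stochastic gates feature-selection literature. The function names, the 0.5 shift, sigma = 0.5, the one-gate-per-feature layout, and the sample values are all assumptions made for illustration, not details taken from this paper.

```python
import math
import random


def sample_gates(mu, sigma=0.5, rng=None):
    """Return one gate value in [0, 1] per feature.

    With rng set, Gaussian noise is injected (training-style sampling);
    with rng=None the gates are deterministic (inference-style).
    """
    gates = []
    for m in mu:
        eps = rng.gauss(0.0, 1.0) if rng is not None else 0.0
        # Clip the shifted, noisy parameter to [0, 1] (a relaxed Bernoulli gate).
        gates.append(min(1.0, max(0.0, m + 0.5 + sigma * eps)))
    return gates


def open_gate_penalty(mu, sigma=0.5):
    """Expected number of open gates: sum over d of P(mu_d + 0.5 + sigma * eps > 0)."""
    return sum(
        0.5 * (1.0 + math.erf((m + 0.5) / (sigma * math.sqrt(2.0)))) for m in mu
    )


def apply_gates(features, gates):
    """Multiply each feature by its gate; a gate of 0 removes the feature."""
    return [x * g for x, g in zip(features, gates)]


mu = [-1.0, 0.0, 1.0]        # hypothetical learned gate parameters
features = [0.8, -0.3, 1.2]  # hypothetical input features for one frame
gated = apply_gates(features, sample_gates(mu))                   # deterministic gates
noisy = apply_gates(features, sample_gates(mu, rng=random.Random(0)))  # sampled gates
```

In a setup like this, the penalty would be added to the classification loss so that gates on uninformative features are pushed toward zero while the classifier keeps the features it needs.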

Original language: English
Title of host publication: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728163277
State: Published - 2023
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
Duration: 4 Jun 2023 → 10 Jun 2023

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2023-June
ISSN (Print): 1520-6149

Conference

Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 → 10/06/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

Keywords

  • Feature Selection
  • Speech Recognition
  • Voice Activity Detection
