A phoneme-based pre-training approach for deep neural network with application to speech enhancement

Shlomo E. Chazan, Sharon Gannot, Jacob Goldberger

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

20 Scopus citations

Abstract

In this study, we present a new phoneme-based deep neural network (DNN) framework for single-microphone speech enhancement. While most speech enhancement algorithms overlook the phoneme structure of the speech signal, our proposed framework comprises a set of phoneme-specific DNNs (pDNNs), one for each phoneme, together with an additional phoneme-classification DNN (cDNN). The cDNN is responsible for determining the posterior probability that a specific phoneme was uttered. Concurrently, each of the pDNNs estimates a phoneme-specific speech presence probability (pSPP). The speech presence probability (SPP) is then calculated as a weighted average of the pSPPs, with the weights given by the posterior phoneme probabilities. A soft spectral attenuation, based on the SPP, is then applied to enhance the noisy speech signal. We further propose a compound training procedure, in which each pDNN is first pre-trained using the phoneme labeling and the cDNN is trained to classify phonemes. Since these labels are unavailable in the test phase, the entire network is then trained using the noisy utterance, with the cDNN providing the phoneme classification. A series of experiments with different noise types verifies the applicability of the new algorithm to the task of speech enhancement. Moreover, the proposed scheme outperforms other schemes that either do not consider the phoneme structure or use a simpler training methodology.
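The SPP combination and attenuation steps described in the abstract can be sketched for a single time frame as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the plain-list representation of the network outputs, and especially the gain rule G = SPP + (1 − SPP)·g_min with a hypothetical spectral floor `g_min` are assumptions. The abstract only states that the SPP is a posterior-weighted average of the pSPPs and that a soft attenuation based on the SPP is applied.

```python
import math
from typing import List, Sequence


def combine_pspp(phoneme_posteriors: Sequence[float],
                 pspp: Sequence[Sequence[float]]) -> List[float]:
    """Posterior-weighted average of phoneme-specific SPPs for one frame.

    phoneme_posteriors[i]: p(phoneme i | frame), the role played by the cDNN.
    pspp[i][k]: pSPP of phoneme i at frequency bin k, the role of pDNN i.
    Returns SPP[k] = sum_i p(phoneme i | frame) * pSPP_i[k].
    """
    if not math.isclose(sum(phoneme_posteriors), 1.0, abs_tol=1e-6):
        raise ValueError("phoneme posteriors must sum to 1")
    n_bins = len(pspp[0])
    spp = [0.0] * n_bins
    for weight, row in zip(phoneme_posteriors, pspp):
        for k in range(n_bins):
            spp[k] += weight * row[k]
    return spp


def soft_attenuation(noisy_mag: Sequence[float], spp: Sequence[float],
                     g_min: float = 0.1) -> List[float]:
    """Apply an ASSUMED soft gain G = SPP + (1 - SPP) * g_min to |Y|.

    g_min is a hypothetical attenuation floor for bins judged to be noise;
    the paper's exact attenuation rule is not given in the abstract.
    """
    return [y * (p + (1.0 - p) * g_min) for y, p in zip(noisy_mag, spp)]


# Usage: two phonemes, three frequency bins.
spp = combine_pspp([0.75, 0.25], [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]])
enhanced = soft_attenuation([2.0, 2.0, 2.0], spp, g_min=0.1)
```

Here `spp` evaluates to `[0.75, 0.25, 0.5]`: the first bin is dominated by the more probable phoneme's pSPP, and bins where speech is unlikely are pulled toward the floor rather than zeroed, which is what makes the attenuation "soft".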

Original language: English
Title of host publication: 2016 International Workshop on Acoustic Signal Enhancement, IWAENC 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509020072
State: Published, 19 Oct 2016
Event: 15th International Workshop on Acoustic Signal Enhancement, IWAENC 2016, Xi'an, China
Duration: 13 Sep 2016 to 16 Sep 2016

Publication series

Name: 2016 International Workshop on Acoustic Signal Enhancement, IWAENC 2016

Conference

Conference: 15th International Workshop on Acoustic Signal Enhancement, IWAENC 2016
Country/Territory: China
City: Xi'an
Period: 13/09/16 to 16/09/16

Bibliographical note

Publisher Copyright:
© 2016 IEEE.

Keywords

  • Deep learning
  • Neural network
  • Phoneme
