Abstract
In this study, we present a new phoneme-based deep neural network (DNN) framework for single-microphone speech enhancement. While most speech enhancement algorithms overlook the phoneme structure of the speech signal, our proposed framework comprises a set of phoneme-specific DNNs (pDNNs), one for each phoneme, together with an additional phoneme-classification DNN (cDNN). The cDNN is responsible for determining the posterior probability that a specific phoneme was uttered. Concurrently, each of the pDNNs estimates a phoneme-specific speech presence probability (pSPP). The speech presence probability (SPP) is then calculated as a weighted average of the pSPPs, with the weights determined by the posterior phoneme probabilities. A soft spectral attenuation, based on the SPP, is then applied to enhance the noisy speech signal. We further propose a compound training procedure, in which each pDNN is first pre-trained using the phoneme labels and the cDNN is trained to classify phonemes. Since these labels are unavailable in the test phase, the entire network is then trained using the noisy utterance, with the cDNN providing the phoneme classification. A series of experiments with different noise types verifies the applicability of the new algorithm to the task of speech enhancement. Moreover, the proposed scheme outperforms other schemes that either do not consider the phoneme structure or use a simpler training methodology.
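The combination step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the array shapes, the function names `combine_spp` and `soft_attenuation`, the random stand-ins for the cDNN and pDNN outputs, and in particular the gain rule `min_gain ** (1 - spp)` are assumptions made for illustration only, since the abstract does not specify the attenuation function.

```python
import numpy as np


def combine_spp(phoneme_posteriors, pspp):
    """Weighted average of phoneme-specific SPPs.

    phoneme_posteriors: (T, P) array, each row sums to 1 (stand-in for cDNN output).
    pspp:               (T, P, F) array in [0, 1] (stand-in for pDNN outputs).
    Returns an SPP array of shape (T, F).
    """
    return np.einsum("tp,tpf->tf", phoneme_posteriors, pspp)


def soft_attenuation(noisy_mag, spp, min_gain=0.1):
    """Hypothetical soft gain: 1 where speech is certain, min_gain where it is absent."""
    gain = min_gain ** (1.0 - spp)
    return gain * noisy_mag


# Toy dimensions: T frames, P phonemes, F frequency bins (all assumed).
rng = np.random.default_rng(0)
T, P, F = 4, 3, 5

# Random placeholders for the network outputs and the noisy magnitude spectrum.
logits = rng.normal(size=(T, P))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
pspp = rng.uniform(size=(T, P, F))
noisy_mag = rng.uniform(0.5, 2.0, size=(T, F))

spp = combine_spp(posteriors, pspp)
enhanced = soft_attenuation(noisy_mag, spp)
```

Because the posteriors in each frame sum to one, the combined SPP stays in [0, 1], and a one-hot posterior reduces it to the pSPP of the single selected phoneme.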
Original language | English |
---|---|
Title of host publication | 2016 International Workshop on Acoustic Signal Enhancement, IWAENC 2016 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781509020072 |
State | Published - 19 Oct 2016 |
Event | 15th International Workshop on Acoustic Signal Enhancement, IWAENC 2016 - Xi'an, China. Duration: 13 Sep 2016 → 16 Sep 2016 |
Publication series
Name | 2016 International Workshop on Acoustic Signal Enhancement, IWAENC 2016 |
---|---|
Conference
Conference | 15th International Workshop on Acoustic Signal Enhancement, IWAENC 2016 |
---|---|
Country/Territory | China |
City | Xi'an |
Period | 13/09/16 → 16/09/16 |
Bibliographical note
Publisher Copyright: © 2016 IEEE.
Keywords
- Deep learning
- Neural network
- Phoneme