Formant estimation and tracking: A deep learning approach

Yehoshua Dissen, Jacob Goldberger, Joseph Keshet

Research output: Contribution to journal › Article › peer-review

28 Scopus citations

Abstract

Formant frequency estimation and tracking are among the most fundamental problems in speech processing. In the estimation task, the input is a stationary speech segment, such as the middle part of a vowel, and the goal is to estimate the formant frequencies. In the tracking task, the input is a series of speech frames, and the goal is to track the trajectory of the formant frequencies throughout the signal. For both tasks, supervised machine learning techniques trained on an annotated corpus of read speech are proposed. Two deep network architectures were evaluated for estimation: feed-forward multilayer perceptrons and convolutional neural networks. Correspondingly, two architectures were evaluated for tracking: recurrent and convolutional recurrent networks. The inputs to the estimation networks are linear predictive coding (LPC)-based cepstral coefficients computed with a range of model orders, together with pitch-synchronous cepstral coefficients, whereas the inputs to the tracking networks are raw spectrograms. The performance of the methods compares favorably with alternative methods for formant estimation and tracking. A network architecture is further proposed that allows the model to be adapted to formant frequency ranges not seen at training time. The adapted networks were evaluated on three datasets, and adaptation further improved their performance.
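The estimation networks take LPC-based cepstral coefficients computed with a range of model orders. The pure-Python sketch below shows one way to build such a feature vector for a single frame. It is not the authors' code, and several details are assumptions chosen for illustration: the autocorrelation method with a Hamming window, Levinson-Durbin recursion, the standard LPC-to-cepstrum recursion, model orders 8 to 17, and 30 coefficients per order. The function names are hypothetical. The pitch-synchronous cepstral features mentioned in the abstract are not covered.

```python
import math


def hamming(n):
    # Standard Hamming window of length n (assumed windowing choice).
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x, max_lag):
    # Biased autocorrelation r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag.
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    # Predictor convention: x[n] ~ sum_{k=1}^{p} a_k * x[n - k].
    # Returns ([a_1, ..., a_p], final prediction-error energy).
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    # Cepstrum c_1..c_n_ceps of the all-pole model 1 / (1 - sum_k a_k z^-k);
    # the gain term c_0 is omitted. a[0] holds a_1.
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def multi_order_lpc_cepstra(frame, orders=range(8, 18), n_ceps=30):
    # Hypothetical feature layout: n_ceps cepstral coefficients from each
    # LPC model order in `orders`, concatenated into one vector.
    w = hamming(len(frame))
    x = [s * wi for s, wi in zip(frame, w)]
    r = autocorrelation(x, max(orders))
    r[0] += 1e-9  # tiny floor so an all-zero (silent) frame avoids 0 / 0
    features = []
    for p in orders:
        a, _ = levinson_durbin(r, p)
        features.extend(lpc_to_cepstrum(a, n_ceps))
    return features


if __name__ == "__main__":
    # Synthetic 400-sample frame (25 ms at an assumed 16 kHz sampling rate).
    fs = 16000
    frame = [math.sin(2 * math.pi * 500 * n / fs)
             + 0.5 * math.sin(2 * math.pi * 1500 * n / fs)
             + 0.05 * math.sin(0.37 * n * n)
             for n in range(400)]
    feats = multi_order_lpc_cepstra(frame)
    print(len(feats))  # 10 orders x 30 coefficients
```

Stacking several model orders is one plausible way to handle a known weakness of a single fixed order: depending on the speaker and vowel, that order may merge two nearby formants into one spectral peak or split one formant into two. Presenting all orders lets the network weigh them.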

Original language: English
Pages (from-to): 642-653
Number of pages: 12
Journal: Journal of the Acoustical Society of America
Volume: 145
Issue number: 2
State: Published - 1 Feb 2019

Bibliographical note

Publisher Copyright:
© 2019 Acoustical Society of America.

Funding

This research was supported by the MAGNET program of the Israeli Innovation Authority. We would like to thank Cynthia Clopper for allowing us to use their dataset.

Funders: Israeli Innovation Authority
