Formant estimation and tracking using deep learning

Yehoshua Dissen, Joseph Keshet

Research output: Contribution to journal › Conference article › peer-review

15 Scopus citations

Abstract

Formant frequency estimation and tracking are among the most fundamental problems in speech processing. In the former task, the input is a stationary speech segment, such as the middle part of a vowel, and the goal is to estimate the formant frequencies, whereas in the latter task the input is a series of speech frames and the goal is to track the trajectory of the formant frequencies throughout the signal. Traditionally, formant estimation and tracking are done using ad hoc signal processing methods. In this paper we propose using machine learning techniques trained on an annotated corpus of read speech for these tasks. Our feature set is composed of LPC-based cepstral coefficients with a range of model orders and pitch-synchronous cepstral coefficients. Two deep network architectures are used as learning algorithms: a deep feed-forward network for the estimation task and a recurrent neural network for the tracking task. The performance of our methods compares favorably with mainstream LPC-based implementations and state-of-the-art tracking algorithms.
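As a rough illustration of the "LPC-based cepstral coefficients with a range of model orders" mentioned in the abstract, the following minimal sketch computes LPC coefficients with the autocorrelation method (Levinson-Durbin) and converts them to cepstral coefficients with the standard recursion. It is not the authors' implementation: the function names, the Hamming window, the model orders (8, 10, 12) and the number of cepstral coefficients (16) are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def lpc_autocorr(x, order):
    """Return the LPC polynomial [1, a_1, ..., a_p] for A(z) = 1 + sum a_k z^-k,
    using the autocorrelation method and the Levinson-Durbin recursion."""
    n = len(x)
    # Autocorrelation at lags 0..order
    r = np.correlate(x, x, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n of the all-pole model 1/A(z) (gain term omitted)."""
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = 0.0
        for m in range(max(1, n - p), n):
            acc += m * c[m] * a[n - m]
        a_n = a[n] if n <= p else 0.0
        c[n] = -a_n - acc / n
    return c[1:]

def lpc_cepstral_features(frame, orders=(8, 10, 12), n_ceps=16):
    """Concatenate LPC cepstra computed at several model orders.
    The orders and n_ceps defaults are illustrative assumptions."""
    windowed = frame * np.hamming(len(frame))
    feats = [lpc_to_cepstrum(lpc_autocorr(windowed, p), n_ceps) for p in orders]
    return np.concatenate(feats)
```

Under these assumptions, a frame yields one vector of `len(orders) * n_ceps` values, which could serve as the input to a feed-forward or recurrent network of the kind the abstract describes. The pitch-synchronous cepstral coefficients are not covered by this sketch.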

Original language: English
Pages (from-to): 958-962
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 08-12-September-2016
State: Published - 2016
Event: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: 8 Sep 2016 - 16 Sep 2016

Bibliographical note

Publisher Copyright:
Copyright © 2016 ISCA.

Keywords

  • Deep neural networks
  • Formant estimation
  • Formant tracking
  • Recurrent neural networks
