Pitch Estimation by Multiple Octave Decoders

Yael Segal, May Arama-Chayoth, Joseph Keshet

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Pitch estimation is an essential task in audio processing because of its key role in many speech and music applications. Still, accurately predicting a continuous value over a wide range of pitch frequencies is challenging. Inspired by the success of signal processing filterbank methods, we propose a novel deep architecture for accurate pitch estimation. The proposed method is composed of an encoder and multiple decoders. The encoder is implemented by a convolutional neural network that provides a good representation of the raw audio signal, and its output is fed into a set of decoders. Each decoder is implemented by a fully-connected neural network and predicts the pitch value within a specific frequency band. This construction allows each decoder to specialize in a particular frequency regime, which yields more accurate estimation of pitch values for music and speech signals.
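To make the idea of one decoder per frequency band concrete, the following is a minimal sketch of how a pitch range could be split into octave-wide bands, with each pitch assigned to the band (and hence the decoder) that covers it. It is an illustration only, not the authors' implementation: the lowest frequency `F_MIN`, the number of bands `N_BANDS`, and the function names are all assumptions, and the abstract does not say how the decoders' outputs are combined.

```python
import math

F_MIN = 32.70  # assumed lowest pitch in Hz (roughly note C1); not from the paper
N_BANDS = 6    # assumed number of octave bands / decoders; not from the paper


def band_edges(f_min=F_MIN, n_bands=N_BANDS):
    """Return the n_bands + 1 edges of consecutive octave bands, in Hz."""
    return [f_min * 2.0 ** k for k in range(n_bands + 1)]


def encode_pitch(f_hz, f_min=F_MIN, n_bands=N_BANDS):
    """Map a pitch in Hz to (band index, position within the band in [0, 1))."""
    octaves = math.log2(f_hz / f_min)
    if not 0.0 <= octaves < n_bands:
        raise ValueError("pitch outside the covered range")
    band = int(octaves)          # index of the band (decoder) covering f_hz
    return band, octaves - band  # fractional octave above the band's lower edge


def decode_pitch(band, position, f_min=F_MIN):
    """Invert encode_pitch: recover the pitch in Hz."""
    return f_min * 2.0 ** (band + position)
```

With these assumed values, a 440 Hz tone falls in band 3, which spans 261.6 Hz to 523.2 Hz, and `decode_pitch(*encode_pitch(440.0))` recovers the original frequency up to floating-point error. Working in octaves rather than in Hz gives every band the same relative width, which is one plausible reason to let each decoder cover a single octave.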

Original language: English
Article number: 9501499
Pages (from-to): 1610-1614
Number of pages: 5
Journal: IEEE Signal Processing Letters
Volume: 28
DOI: 10.1109/LSP.2021.3100812
State: Published - 2021

Bibliographical note

Publisher Copyright:
© 1994-2012 IEEE.

Funding

Manuscript received June 8, 2021; accepted July 12, 2021. Date of publication July 29, 2021; date of current version August 18, 2021. The work of Yael Segal was supported by the Ministry of Science and Technology, Israel. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Daniele Giacobello. (Corresponding author: Yael Segal.) The authors are with the Department of Computer Science, Bar-Ilan University, Ramat-Gan 5290002, Israel. Digital Object Identifier 10.1109/LSP.2021.3100812

Funders: Ministry of Science and Technology, Israel

    Keywords

    • Convolutional neural networks
    • deep neural networks
    • fundamental frequency
    • pitch estimation
    • speech processing
