Abstract
Pitch estimation is an essential task in audio processing because of its key role in many speech and music applications. Still, accurately predicting a continuous value across a wide range of pitch frequencies is challenging. Inspired by the success of filterbank methods in signal processing, we propose a novel deep architecture for accurate pitch estimation. The proposed method consists of an encoder and multiple decoders. The encoder is a convolutional neural network that maps the raw audio signal to a learned representation, and its output is fed into a set of decoders. Each decoder is a fully-connected neural network that predicts the pitch value within a specific frequency band. This construction allows each decoder to specialize in a particular frequency regime, which results in more accurate pitch estimates for music and speech signals.
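The shared-encoder, per-band-decoder layout described above can be sketched in plain Python. This is a toy illustration of the data flow, not the authors' model: the single convolution layer with global average pooling, the octave-wide bands from 50 Hz to 800 Hz, the log-scale sigmoid mapping that keeps each decoder's output inside its band, and the rule of keeping the estimate from the decoder with the highest score are all assumptions made for illustration. The names `conv1d_relu`, `encode`, `BandDecoder`, and `estimate_pitch` are hypothetical. The weights are random and untrained, so the printed estimate carries no meaning.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def conv1d_relu(signal, kernels):
    """Valid 1-D convolution (cross-correlation) followed by ReLU; one channel per kernel."""
    channels = []
    for k in kernels:
        n = len(signal) - len(k) + 1
        channels.append([max(0.0, sum(signal[i + j] * k[j] for j in range(len(k))))
                         for i in range(n)])
    return channels

def encode(signal, kernels):
    """Toy shared encoder (assumption): one conv layer, then global average pooling."""
    return [sum(ch) / len(ch) for ch in conv1d_relu(signal, kernels)]

class BandDecoder:
    """Hypothetical single-layer decoder whose pitch output is confined to [f_lo, f_hi] Hz."""

    def __init__(self, f_lo, f_hi, dim, rng):
        self.f_lo, self.f_hi = f_lo, f_hi
        self.w_pitch = [rng.gauss(0.0, 0.1) for _ in range(dim)]
        self.w_conf = [rng.gauss(0.0, 0.1) for _ in range(dim)]

    def __call__(self, z):
        s = sigmoid(sum(w * x for w, x in zip(self.w_pitch, z)))
        # Interpolate on a log-frequency axis: s = 0 -> f_lo, s = 1 -> f_hi.
        pitch = self.f_lo * (self.f_hi / self.f_lo) ** s
        # Unnormalized score used (by assumption) to pick which band to trust.
        conf = sum(w * x for w, x in zip(self.w_conf, z))
        return pitch, conf

def estimate_pitch(signal, kernels, decoders):
    """Encode once, run every band decoder on the shared features, keep the top-scoring band."""
    z = encode(signal, kernels)
    outputs = [dec(z) for dec in decoders]
    best_pitch, _ = max(outputs, key=lambda pc: pc[1])
    return best_pitch, outputs

# Usage: a 220 Hz sine frame sampled at 16 kHz, with random (untrained) weights.
rng = random.Random(0)
sr = 16000
frame = [math.sin(2 * math.pi * 220.0 * t / sr) for t in range(1024)]
kernels = [[rng.gauss(0.0, 0.3) for _ in range(16)] for _ in range(8)]
edges = [50.0 * 2 ** k for k in range(5)]  # bands: 50-100, 100-200, 200-400, 400-800 Hz
decoders = [BandDecoder(lo, hi, len(kernels), rng) for lo, hi in zip(edges, edges[1:])]
pitch, outputs = estimate_pitch(frame, kernels, decoders)
print(f"estimate: {pitch:.1f} Hz")
```

Bounding each decoder's output to its own band is one simple way to enforce the per-band specialization the abstract describes, loosely analogous to each filter in a filterbank covering one frequency range; how the published method bounds and combines its decoder outputs is not stated in this abstract.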
Original language | English |
---|---|
Article number | 9501499 |
Pages (from-to) | 1610-1614 |
Number of pages | 5 |
Journal | IEEE Signal Processing Letters |
Volume | 28 |
DOI | 10.1109/LSP.2021.3100812 |
State | Published - 2021 |
Bibliographical note
Publisher Copyright: © 1994-2012 IEEE.
Funding
Manuscript received June 8, 2021; accepted July 12, 2021. Date of publication July 29, 2021; date of current version August 18, 2021. The work of Yael Segal was supported by the Ministry of Science and Technology, Israel. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Daniele Giacobello. (Corresponding author: Yael Segal.) The authors are with the Department of Computer Science, Bar-Ilan University, Ramat-Gan 5290002, Israel. Digital Object Identifier 10.1109/LSP.2021.3100812
Funders | Funder number |
---|---|
Ministry of Science and Technology, Israel | |
Keywords
- Convolutional neural networks
- deep neural networks
- fundamental frequency
- pitch estimation
- speech processing