Abstract
We present a study of a neural-network-based method for speech emotion recognition that uses audio-only features. In the studied scheme, acoustic features are extracted from the audio utterances and fed to a neural network consisting of convolutional neural network (CNN) layers, a bidirectional long short-term memory (BLSTM) network combined with an attention mechanism layer, and a fully-connected layer. To illustrate and analyze the classification capabilities of the network, we use the t-distributed stochastic neighbor embedding (t-SNE) method. We evaluate our model on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) and the Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets, achieving a weighted accuracy (WA) of 80% and 66%, respectively.
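A common way to realize an attention layer on top of a BLSTM is to pool the per-frame BLSTM outputs into a single utterance-level vector, which is then passed to the fully-connected classifier. The abstract does not specify the scoring function, so the pure-Python sketch below assumes a generic additive (tanh) attention and is not the authors' exact layer. The names `attention_pool`, `W` and `v` are hypothetical; in a trained model `W` and `v` would be learned parameters.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden: Sequence[Sequence[float]], w: Matrix, v: Sequence[float]
) -> Tuple[Vector, Vector]:
    """Collapse per-frame BLSTM outputs h_1..h_T into one utterance vector.

    Assumed additive scoring: e_t = v . tanh(W h_t),
    alpha = softmax(e), context = sum_t alpha_t * h_t.
    Returns (context vector, attention weights alpha).
    """
    scores = [
        sum(vi * math.tanh(ui) for vi, ui in zip(v, matvec(w, h))) for h in hidden
    ]
    alpha = softmax(scores)
    dim = len(hidden[0])
    context = [sum(a * h[d] for a, h in zip(alpha, hidden)) for d in range(dim)]
    return context, alpha


# Toy input: 3 frames of 4-dim "BLSTM" outputs (forward and backward halves
# concatenated). All values and weights here are made up for illustration.
frames = [[0.1, 0.2, 0.0, 0.5], [0.9, 0.1, 0.3, 0.2], [0.0, 0.0, 0.1, 0.1]]
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # hypothetical 2x4 projection
v = [1.0, 1.0]                                      # hypothetical scoring vector
context, alpha = attention_pool(frames, W, v)
```

The weights `alpha` sum to one, so the context vector is a convex combination of the frames; frames that score higher (here the second one) contribute more, which is what lets the classifier focus on emotionally salient parts of an utterance.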
Original language | English |
---|---|
Title of host publication | 31st European Signal Processing Conference, EUSIPCO 2023 - Proceedings |
Publisher | European Signal Processing Conference, EUSIPCO |
Pages | 416-420 |
Number of pages | 5 |
ISBN (Electronic) | 9789464593600 |
DOIs | |
State | Published - 2023 |
Event | 31st European Signal Processing Conference, EUSIPCO 2023 - Helsinki, Finland Duration: 4 Sep 2023 → 8 Sep 2023 |
Publication series
Name | European Signal Processing Conference |
---|---|
ISSN (Print) | 2219-5491 |
Conference
Conference | 31st European Signal Processing Conference, EUSIPCO 2023 |
---|---|
Country/Territory | Finland |
City | Helsinki |
Period | 4/09/23 → 8/09/23 |
Bibliographical note
Publisher Copyright: © 2023 European Signal Processing Conference, EUSIPCO. All rights reserved.
Funding
This project has received funding from the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 871245.
Funders | Funder number |
---|---|
Horizon 2020 Framework Programme | |
Horizon 2020 | 871245 |
Keywords
- Attention Mechanism
- Deep Neural Network
- Speech Emotion Recognition