Abstract
Human subjective evaluation is optimal to assess speech quality for human perception. The recently introduced deep noise suppression mean opinion score (DNSMOS) metric was shown to estimate human ratings with great accuracy. The signal-to-distortion ratio (SDR) metric is widely used to evaluate residual-echo suppression (RES) systems by estimating speech quality during double-talk. However, since the SDR is affected by both speech distortion and residual-echo presence, it does not correlate well with human ratings according to the DNSMOS. To address that, we introduce two objective metrics to separately quantify the desired-speech maintained level (DSML) and residual-echo suppression level (RESL) during double-talk. These metrics are evaluated using a deep learning-based RES-system with a tunable design parameter. Using 280 hours of real and simulated recordings, we show that the DSML and RESL correlate well with the DNSMOS with high generalization to various setups. Also, we empirically investigate the relation between tuning the RES-system design parameter and the DSML-RESL tradeoff it creates and offer a practical design scheme for dynamic system requirements.
Original language | English |
---|---|
Title of host publication | 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 101-105 |
Number of pages | 5 |
ISBN (Electronic) | 9781665448703 |
State | Published - 2021 |
Externally published | Yes |
Event | 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021 - New Paltz, United States; Duration: 17 Oct 2021 → 20 Oct 2021 |
Publication series
Name | IEEE Workshop on Applications of Signal Processing to Audio and Acoustics |
---|---|
Volume | 2021-October |
ISSN (Print) | 1931-1168 |
ISSN (Electronic) | 1947-1629 |
Conference
Conference | 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2021 |
---|---|
Country/Territory | United States |
City | New Paltz |
Period | 17/10/21 → 20/10/21 |
Bibliographical note
Publisher Copyright: © 2021 IEEE.
Funding
This research was supported by the Pazy Research Foundation and the ISF-NSFC joint research program (grant No. 2514/17). The authors thank Stem Audio for providing equipment and technical guidance.
Funders | Funder number |
---|---|
ISF-NSFC | 2514/17 |
Pazy Research Foundation | |
Keywords
- Residual-echo suppression
- deep learning
- echo cancellation
- objective metrics
- perceptual speech quality