TY - JOUR
T1 - Domain Adaptation Using Suitable Pseudo Labels for Speech Enhancement and Dereverberation
AU - Frenkel, Lior
AU - Chazan, Shlomo E.
AU - Goldberger, Jacob
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2024
Y1 - 2024
N2 - Speech enhancement and dereverberation approaches based on neural networks are designed to learn a transformation from noisy to clean speech using supervised learning. However, networks trained in this way may fail to effectively handle languages, types of noise, or acoustic environments that were not included in the training data. To tackle this issue, the present study centers on unsupervised domain adaptation, specifically addressing scenarios characterized by substantial domain gaps. In this scenario, we have noisy speech data from the new domain, but the corresponding clean speech data is unavailable. We propose an adaptation method based on domain-adversarial training followed by iterative self-training, where the estimated speech is used as pseudo labels, and the target samples are gradually introduced to the network based on their similarity to the source domain. The self-training also utilizes labeled samples from the source domain that are similar to the target domain. The experimental results show that our method effectively mitigates the domain mismatch between the training and test sets, thus outperforming the current baselines.
KW - Unsupervised domain adaptation
KW - dereverberation
KW - pseudo labels
KW - self-training
KW - speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85183952150&partnerID=8YFLogxK
U2 - 10.1109/taslp.2024.3358051
DO - 10.1109/taslp.2024.3358051
M3 - Article
AN - SCOPUS:85183952150
SN - 2329-9290
VL - 32
SP - 1226
EP - 1236
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
ER -