Several statistical methods have been proposed for estimating the relative transfer function (RTF) of a speaker from noisy multi-microphone recordings. The estimation accuracy varies across frequencies, depending mostly on the frequency-dependent signal-to-noise ratio (SNR). Provided that the low-SNR frequencies are identified, the corresponding values of the estimated RTF can be replaced by interpolation from the high-SNR frequencies. In this study, we explore interpolation techniques based on sparse reconstruction of the incomplete RTF that is obtained when the low-SNR values are discarded. In contrast to previous attempts, which exploit the approximate sparsity of the time-domain representation of the RTF (the relative impulse response), we use sparse dictionaries learned from dense measurements of RTFs within a confined area of the target speaker. These measurements are taken from the recently released MIRaGe database, which was acquired in a real room.
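The idea of filling in low-SNR frequency bins by sparse reconstruction can be illustrated with a minimal sketch. It is not the method of the paper. It assumes a random complex dictionary as a stand-in for one learned from measured RTFs, uses plain orthogonal matching pursuit as a generic sparse solver, and picks the "low-SNR" bins at random. All names (`omp_reconstruct`, `dictionary`, `mask`) are hypothetical.

```python
import numpy as np

def omp_reconstruct(dictionary, observed, mask, max_atoms, tol=1e-10):
    """Estimate a full RTF from its reliable bins with orthogonal matching pursuit.

    dictionary: (n_freq, n_atoms) complex array (assumed given, e.g. learned offline)
    observed:   (n_freq,) complex RTF estimate; entries where mask is False are ignored
    mask:       (n_freq,) boolean array, True at bins considered reliable (high SNR)
    """
    d_obs = dictionary[mask]              # dictionary rows at the reliable bins
    y = observed[mask]
    norms = np.linalg.norm(d_obs, axis=0)
    residual = y.copy()
    support = []
    x = np.zeros(dictionary.shape[1], dtype=complex)
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
            break
        # Pick the atom most correlated with the current residual.
        corr = np.abs(d_obs.conj().T @ residual) / norms
        if support:
            corr[support] = 0.0           # never select the same atom twice
        support.append(int(np.argmax(corr)))
        # Least-squares fit on the selected atoms, using reliable bins only.
        coeffs, *_ = np.linalg.lstsq(d_obs[:, support], y, rcond=None)
        residual = y - d_obs[:, support] @ coeffs
        x[:] = 0.0
        x[support] = coeffs
    # Synthesize all bins, including the ones that were masked out.
    return dictionary @ x

# Toy data: a synthetic "RTF" that is exactly 3-sparse in a random dictionary.
rng = np.random.default_rng(0)
n_freq, n_atoms = 128, 64
dictionary = (rng.standard_normal((n_freq, n_atoms))
              + 1j * rng.standard_normal((n_freq, n_atoms)))
x_true = np.zeros(n_atoms, dtype=complex)
x_true[[3, 17, 42]] = [1.0 + 0.5j, -0.8j, 0.6]
rtf_true = dictionary @ x_true

# Pretend 48 randomly chosen bins have low SNR and discard them.
mask = np.ones(n_freq, dtype=bool)
mask[rng.choice(n_freq, size=48, replace=False)] = False
rtf_incomplete = rtf_true.copy()
rtf_incomplete[~mask] = 0.0

rtf_hat = omp_reconstruct(dictionary, rtf_incomplete, mask, max_atoms=10)
print("max reconstruction error:", np.max(np.abs(rtf_hat - rtf_true)))
```

In this noiseless toy setting the discarded bins are recovered because the signal is exactly sparse in the dictionary. With measured RTFs the quality of the interpolation would depend on how well the dictionary represents the RTFs and on how reliably the low-SNR bins are identified.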
|Title of host publication||29th European Signal Processing Conference, EUSIPCO 2021 - Proceedings|
|Publisher||European Signal Processing Conference, EUSIPCO|
|Number of pages||5|
|State||Published - 2021|
|Event||29th European Signal Processing Conference, EUSIPCO 2021 - Dublin, Ireland (Duration: 23 Aug 2021 → 27 Aug 2021)|
|Name||European Signal Processing Conference|
|Conference||29th European Signal Processing Conference, EUSIPCO 2021|
|Period||23/08/21 → 27/08/21|
Bibliographical note
Funding Information:
This work was supported by The Czech Science Foundation through Project No. 20-17720S.
© 2021 European Signal Processing Conference. All rights reserved.
Keywords
- Dictionary learning
- Relative transfer function
- Room impulse responses
- Sparse dictionaries
- Sparse representations