TY - JOUR
T1 - Data-driven multi-microphone speaker localization on manifolds
AU - Laufer-Goldshtein, Bracha
AU - Talmon, Ronen
AU - Gannot, Sharon
N1 - Publisher Copyright: © 2020 Now Publishers Inc. All rights reserved.
PY - 2020/10/6
Y1 - 2020/10/6
N2 - Speech enhancement is a core problem in audio signal processing with commercial applications in devices as diverse as mobile phones, conference call systems, smart assistants, and hearing aids. An essential component in the design of speech enhancement algorithms is acoustic source localization. Speaker localization is also directly applicable to many other audio-related tasks, e.g., automated camera steering, teleconferencing systems, and robot audition. From a signal processing perspective, speaker localization is the task of mapping multichannel speech signals to 3-D source coordinates. To obtain viable solutions for this mapping, an accurate description of the source wave propagation captured by the respective acoustic channel is required. In fact, the acoustic channels can be considered as the spatial fingerprints characterizing the positions of each of the sources in a reverberant enclosure. These fingerprints represent complex reflection patterns stemming from the surfaces and objects characterizing the enclosure. Hence, they are usually modelled by a very large number of coefficients, resulting in an intricate high-dimensional representation. We claim that in static acoustic environments, despite the high-dimensional representation, the difference between acoustic channels can be attributed mainly to changes in the source position. Thus, the true intrinsic dimensionality of the variations of the acoustic channels is significantly smaller than the number of variables commonly used to represent them; that is, the acoustic channels pertain to a low-dimensional manifold that can be inferred from data using nonlinear dimensionality reduction techniques. A comprehensive experimental study carried out in a real-life acoustic environment demonstrates the validity of the proposed manifold-based paradigm.
Motivated by this result, several high-performance localization and tracking methods were developed by harnessing novel mathematical tools for learning over manifolds, including diffusion maps, semi-supervised learning, optimization in reproducing kernel Hilbert spaces, and Gaussian process inference. We present two localization algorithms that were designed for a single microphone array of two microphones. These algorithms were extended to several distributed arrays by merging the information of the different manifolds associated with each array. Tracking a moving source was also addressed by a data-driven propagation model relating movements on the abstract manifold to the actual source displacements. This data-driven propagation model was combined with a classical localization approach in a hybrid algorithm that ties together the two worlds of classical and data-driven localization, while gaining the benefits of both. We show that the proposed algorithms outperform state-of-the-art localization methods and obtain high accuracy in challenging noisy and reverberant environments.
UR - http://www.scopus.com/inward/record.url?scp=85092430040&partnerID=8YFLogxK
U2 - 10.1561/2000000098
DO - 10.1561/2000000098
M3 - Article
AN - SCOPUS:85092430040
SN - 1932-8346
VL - 14
SP - 1
EP - 165
JO - Foundations and Trends in Signal Processing
JF - Foundations and Trends in Signal Processing
IS - 1-2
ER -