Abstract
Determining the spatial position of a speaker is of growing interest in video conferencing scenarios, where automated camera steering and tracking are required. Speaker localization can be achieved with a two-stage approach. In the first stage, a microphone array is used to extract the time difference of arrival (TDOA) of the speech signal. These readings are then used by the second stage for the actual localization. In this work we present novel frequency-domain approaches for TDOA estimation in a reverberant and noisy environment. Our methods are based on the quasi-stationarity of speech, the stationarity of the noise, and the fact that the speech and the noise are uncorrelated. The mathematical derivations are followed by an extensive experimental study covering both static and tracking scenarios.
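To make the first stage concrete, the sketch below estimates a TDOA between two microphone signals in the frequency domain. It uses the standard GCC-PHAT cross-correlation as a stand-in baseline. It is not the method proposed in this work, which additionally exploits speech quasi-stationarity and noise stationarity. The function name `estimate_tdoa` and its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np

def estimate_tdoa(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1, in seconds, with GCC-PHAT.

    Baseline illustration only (assumption): not the paper's algorithm.
    A positive result means the signal reaches microphone 2 later.
    """
    # Zero-pad so the circular correlation does not wrap around.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-power spectrum; its phase carries the time delay.
    cross = X2 * np.conj(X1)
    # PHAT weighting: keep the phase, discard the magnitude.
    cross /= np.abs(cross) + 1e-12

    # Back to the lag domain: the peak sits at the delay in samples.
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if requested.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    lags = np.arange(-max_shift, max_shift + 1)

    # Negative lags index from the end of the circular correlation.
    best_lag = lags[np.argmax(cc[lags])]
    return best_lag / fs

# Usage: a synthetic signal delayed by 5 samples at 16 kHz.
rng = np.random.default_rng(0)
fs = 16000
x1 = rng.standard_normal(1024)
x2 = np.concatenate((np.zeros(5), x1[:-5]))
tau = estimate_tdoa(x1, x2, fs)
```

In a second stage, delays such as `tau`, measured across several microphone pairs, would be combined with the known array geometry to locate the speaker.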
Original language | English |
---|---|
Pages (from-to) | 177-204 |
Number of pages | 28 |
Journal | Signal Processing |
Volume | 85 |
Issue number | 1 |
State | Published - Jan 2005 |
Keywords
- Decorrelation
- Non-stationarity
- Source localization
- TDOA