Abstract
Data fusion is a natural and common approach to recovering the state of physical systems, but the dissimilar appearance of data from different sensors remains a fundamental obstacle. We propose a unified embedding scheme for multisensory data, based on the spectral diffusion framework, which addresses this issue. Our scheme is purely data-driven and assumes no a priori statistical or deterministic models of the data sources. To extract the underlying structure, we first embed each input channel separately; the resultant structures are then combined in diffusion coordinates. In particular, as different sensors sample similar phenomena with different sampling densities, we apply the density-invariant Laplace-Beltrami embedding. Differing sampling density is a fundamental issue in multisensor acquisition and processing that prior approaches overlooked. We extend previous work on group recognition and suggest a novel approach to the selection of diffusion coordinates. To verify our approach, we demonstrate performance improvements in audio/visual speech recognition.
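The per-channel embedding and fusion steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the median bandwidth heuristic, the synthetic `audio` and `video` channels, and the function names `diffusion_embedding` and `fuse_channels` are all assumptions made for illustration. Fusion is done here by simple concatenation of the per-channel coordinates, and the paper's coordinate-selection method is not reproduced.

```python
import numpy as np


def diffusion_embedding(X, n_coords=2, eps=None):
    """Density-invariant (alpha = 1) diffusion embedding of the rows of X."""
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if eps is None:
        eps = np.median(d2)  # assumed bandwidth heuristic, not from the paper
    K = np.exp(-d2 / eps)

    # alpha = 1 normalization: divide out the kernel density estimate so the
    # embedding approximates the Laplace-Beltrami operator regardless of
    # how densely the channel samples the underlying phenomenon.
    q = K.sum(axis=1)
    K1 = K / np.outer(q, q)

    # Symmetric conjugate of the row-stochastic Markov matrix.
    d = K1.sum(axis=1)
    S = K1 / np.sqrt(np.outer(d, d))

    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Convert to right eigenvectors of the Markov matrix and drop the
    # trivial constant eigenvector (eigenvalue 1).
    psi = vecs / np.sqrt(d)[:, None]
    return psi[:, 1:n_coords + 1] * vals[1:n_coords + 1]


def fuse_channels(channels, n_coords=2):
    """Embed each channel separately, then concatenate the coordinates.

    Concatenation is an assumed, simplest-possible way of combining
    the per-channel diffusion coordinates.
    """
    return np.hstack([diffusion_embedding(X, n_coords) for X in channels])


# Two hypothetical sensors observing the same latent parameter t
# through different nonlinear maps.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, 200)
audio = np.column_stack([np.cos(2 * t), np.sin(2 * t)])
video = np.column_stack([t, t ** 2, t ** 3])

fused = fuse_channels([audio, video], n_coords=2)
print(fused.shape)  # (200, 4)
```

Each channel contributes `n_coords` columns, so the fused representation has one row per sample and `n_coords` columns per sensor. Eigenvector signs are arbitrary, so individual coordinates may be flipped between runs on different platforms.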
Original language | English |
---|---|
Article number | 5210209 |
Pages (from-to) | 403-413 |
Number of pages | 11 |
Journal | IEEE Transactions on Signal Processing |
Volume | 58 |
Issue number | 1 |
DOIs | 10.1109/TSP.2009.2030861 |
State | Published - Jan 2010 |
Bibliographical note
Funding Information: Manuscript received September 17, 2008; accepted July 17, 2009. First published August 21, 2009; current version published December 16, 2009. The associate editor coordinating review of this manuscript and approving it for publication was Prof. P. K. Varshney. This work was supported by AFOSR, ARO, and NGA. Y. Keller is with the School of Engineering, Bar Ilan University, Israel. R. R. Coifman is with the Department of Mathematics, Yale University, New Haven, CT 06520 USA. S. Lafon is with Google Inc., Mountain View, CA 94043 USA. S. W. Zucker is with the Department of Computer Science, Yale University, New Haven, CT 06520 USA. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSP.2009.2030861
Funding
Funders | Funder number |
---|---|
Air Force Office of Scientific Research | |
Army Research Office | |
Keywords
- Dimensionality reduction
- Laplacian eigenmaps
- Multisensor
- Sensor fusion
- Speech recognition