Audio-visual group recognition using diffusion maps

Yosi Keller, Ronald R. Coifman, Stéphane Lafon, Steven W. Zucker

Research output: Contribution to journal › Article › peer-review

42 Scopus citations


Data fusion is a natural and common approach to recovering the state of physical systems, but the dissimilar appearance of data from different sensors remains a fundamental obstacle. We propose a unified embedding scheme for multisensory data, based on the spectral diffusion framework, which addresses this issue. Our scheme is purely data-driven and assumes no a priori statistical or deterministic models of the data sources. To extract the underlying structure, we first embed each input channel separately; the resulting structures are then combined in diffusion coordinates. In particular, because different sensors sample similar phenomena with different sampling densities, we apply the density-invariant Laplace-Beltrami embedding. Varying sampling density is a fundamental issue in multisensor acquisition and processing that prior approaches overlooked. We extend previous work on group recognition and suggest a novel approach to the selection of diffusion coordinates. To verify our approach, we demonstrate performance improvements in audio/visual speech recognition.
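The pipeline outlined in the abstract (embed each channel separately with a density-invariant diffusion map, then combine the diffusion coordinates) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `diffusion_map`, the Gaussian kernel width `eps`, the synthetic "audio" and "video" channels, and fusion by simple concatenation are all assumptions made for illustration, and the paper's coordinate-selection step is not reproduced here.

```python
import numpy as np

def diffusion_map(X, eps, n_coords=2, alpha=1.0):
    """Hypothetical helper: diffusion-map embedding of the rows of X.

    alpha=1.0 divides out the sampling density, which is the standard
    normalization that approximates the Laplace-Beltrami operator.
    """
    # Pairwise squared Euclidean distances and Gaussian kernel.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / eps)

    # Density normalization: K_alpha = K / (q_i * q_j)^alpha.
    q = K.sum(axis=1)
    K_alpha = K / np.outer(q, q) ** alpha

    # Symmetric conjugate S = D^{-1/2} K_alpha D^{-1/2} of the
    # row-stochastic Markov matrix P = D^{-1} K_alpha.
    d = K_alpha.sum(axis=1)
    S = K_alpha / np.sqrt(np.outer(d, d))

    # eigh returns ascending eigenvalues; sort them in descending order.
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of S.
    psi = vecs / np.sqrt(d)[:, None]

    # Drop the trivial constant eigenvector (eigenvalue 1) and scale
    # the remaining coordinates by their eigenvalues.
    return psi[:, 1:n_coords + 1] * vals[1:n_coords + 1]

# Synthetic stand-ins for two sensors observing one hidden parameter t
# (assumed data, not from the paper).
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, size=200)
audio = np.column_stack([np.cos(3 * t), np.sin(3 * t)])
audio += 0.01 * rng.standard_normal(audio.shape)
video = np.column_stack([t, t**2, t**3])
video += 0.01 * rng.standard_normal(video.shape)

# Embed each channel separately, then combine the diffusion coordinates.
emb_audio = diffusion_map(audio, eps=0.1)
emb_video = diffusion_map(video, eps=0.1)
fused = np.hstack([emb_audio, emb_video])
```

Concatenation is only one simple way to combine per-channel coordinates. The value `eps=0.1` is an arbitrary choice for this toy data and would need tuning on real recordings.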

Original language: English
Article number: 5210209
Pages (from-to): 403-413
Number of pages: 11
Journal: IEEE Transactions on Signal Processing
Issue number: 1
State: Published - Jan 2010

Bibliographical note

Funding Information:
Manuscript received September 17, 2008; accepted July 17, 2009. First published August 21, 2009; current version published December 16, 2009. The associate editor coordinating review of this manuscript and approving it for publication was Prof. P. K. Varshney. This work was supported by AFOSR, ARO, and NGA. Y. Keller is with the School of Engineering, Bar Ilan University, Israel. R. R. Coifman is with the Department of Mathematics, Yale University, New Haven, CT 06520 USA. S. Lafon is with Google Inc., Mountain View, CA 94043 USA. S. W. Zucker is with the Department of Computer Science, Yale University, New Haven, CT 06520 USA. Color versions of one or more of the figures in this paper are available online. Digital Object Identifier 10.1109/TSP.2009.2030861


  • Dimensionality reduction
  • Laplacian eigenmaps
  • Multisensor
  • Sensor fusion
  • Speech recognition

