Large-scale content-based audio retrieval from text queries

Gal Chechik, Eugene Ie, Martin Rehn, Samy Bengio, Dick Lyon

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

79 Scopus citations

Abstract

In content-based audio retrieval, the goal is to find sound recordings (audio documents) based on their acoustic features. This content-based approach differs from retrieval approaches that index media files using metadata such as file names and user tags. In this paper, we propose a machine learning approach for retrieving sounds that is novel in that it (1) uses free-form text queries rather than sound-based queries, (2) searches by audio content rather than via textual metadata, and (3) scales to a very large number of audio documents and a very rich query vocabulary. We handle generic sounds, including a wide variety of sound effects, animal vocalizations and natural scenes. We test a scalable approach based on a passive-aggressive model for image retrieval (PAMIR), and compare it to two state-of-the-art approaches: Gaussian mixture models (GMM) and support vector machines (SVM). We test our approach on two large real-world datasets: a collection of short sound effects, and a noisier and larger collection of user-contributed, user-labeled recordings (25K files, 2000-term vocabulary). We find that all three methods achieved very good retrieval performance. For instance, a positive document is retrieved in the first position of the ranking more than half the time, and on average there are more than 4 positive documents in the first 10 retrieved, for both datasets. PAMIR was one to three orders of magnitude faster than the competing approaches, and should therefore scale to much larger datasets in the future.
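As a rough illustration of the kind of passive-aggressive ranking step the abstract refers to, the sketch below assumes the bilinear scoring form in which PAMIR is usually presented: a text-query vector q and an audio feature vector a are scored as q^T W a, and W is adjusted only when a relevant document fails to outrank an irrelevant one by a margin of 1. The function names, the aggressiveness parameter `C`, and the toy vectors are illustrative assumptions, not the authors' code; `precision_at_k` mirrors the "positives in the first 10 retrieved" measure quoted above.

```python
# Minimal sketch of a PAMIR-style passive-aggressive ranking update.
# Assumptions (not from the source): bilinear score q^T W a, PA-I step size,
# hypothetical function names, dense Python lists as vectors.

def score(W, q, a):
    """Bilinear relevance score q^T W a for query q and audio features a."""
    return sum(
        q[i] * sum(W[i][j] * a[j] for j in range(len(a)))
        for i in range(len(q))
    )

def pa_rank_update(W, q, a_pos, a_neg, C=1.0):
    """One update on a (query, relevant, irrelevant) triplet.

    Mutates W in place and returns the hinge loss measured before the update.
    """
    loss = max(0.0, 1.0 - score(W, q, a_pos) + score(W, q, a_neg))
    if loss == 0.0:
        return loss  # passive: the margin is already satisfied
    diff = [p - n for p, n in zip(a_pos, a_neg)]
    # Squared Frobenius norm of the rank-one direction q * diff^T.
    norm_sq = sum(x * x for x in q) * sum(d * d for d in diff)
    if norm_sq == 0.0:
        return loss  # nothing to update on (zero query or identical documents)
    tau = min(C, loss / norm_sq)  # aggressive: step size capped at C
    for i, qi in enumerate(q):
        if qi != 0.0:  # sparse queries touch only a few rows of W
            for j, dj in enumerate(diff):
                W[i][j] += tau * qi * dj
    return loss

def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k ranked documents that are relevant (1) vs not (0)."""
    return sum(ranked_relevance[:k]) / k

if __name__ == "__main__":
    W = [[0.0, 0.0], [0.0, 0.0]]   # 2 query terms x 2 audio features
    q = [1.0, 0.0]                 # toy one-term query
    a_pos, a_neg = [1.0, 0.0], [0.0, 1.0]
    print(pa_rank_update(W, q, a_pos, a_neg))  # loss before the first update
    print(pa_rank_update(W, q, a_pos, a_neg))  # loss once the margin is met
    print(precision_at_k([1, 0, 1, 1, 0, 0, 1, 0, 0, 0], 10))
```

Because each step only touches the rows of W that correspond to non-zero query terms, the cost per update stays small for short text queries, which is consistent with the speed advantage the abstract reports, although the actual implementation details are not given here.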

Original language: English
Title of host publication: Proceedings of the 1st International ACM Conference on Multimedia Information Retrieval, MIR2008, Co-located with the 2008 ACM International Conference on Multimedia, MM'08
Pages: 105-112
Number of pages: 8
State: Published - 2008
Externally published: Yes
Event: 1st International ACM Conference on Multimedia Information Retrieval, MIR2008, Co-located with the 2008 ACM International Conference on Multimedia, MM'08 - Vancouver, BC, Canada
Duration: 30 Aug 2008 to 31 Aug 2008

Publication series

Name: Proceedings of the 1st International ACM Conference on Multimedia Information Retrieval, MIR2008, Co-located with the 2008 ACM International Conference on Multimedia, MM'08

Conference

Conference: 1st International ACM Conference on Multimedia Information Retrieval, MIR2008, Co-located with the 2008 ACM International Conference on Multimedia, MM'08
Country/Territory: Canada
City: Vancouver, BC
Period: 30/08/08 to 31/08/08

Keywords

  • Content-based audio retrieval
  • Discriminative learning
  • Large scale
  • Ranking
