Learning semantic representations of objects and their parts

Grégoire Mesnil, Antoine Bordes, Jason Weston, Gal Chechik, Yoshua Bengio

Research output: Contribution to journal › Article › peer-review


Abstract

Recently, large-scale image annotation datasets have been collected, with millions of images and thousands of possible annotations. Latent variable models, or embedding methods, that simultaneously learn semantic representations of object labels and image representations can provide tractable solutions on such tasks. In this work, we are interested in jointly learning representations both for the objects in an image and for the parts of those objects, because such deeper semantic representations could bring a leap forward in image retrieval or browsing. Despite the size of these datasets, annotated data for objects and parts can be costly to obtain and may not be available. In this paper, we propose to bypass this cost with a method able to learn to jointly label objects and parts without requiring exhaustively labeled data. We design a model architecture that can be trained under a proxy supervision obtained by combining standard image annotations (from ImageNet) with semantic part-based within-label relations (from WordNet). The model itself is designed to capture both object-image-to-object-label similarities and object-label-to-part-label similarities in a single joint system. Experiments conducted on our combined data and on a precisely annotated evaluation set demonstrate the usefulness of our approach.
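The joint system described above can be illustrated with a minimal sketch. The dimensions, parameter names, and the use of plain dot products here are illustrative assumptions, not the paper's actual architecture: images are mapped into a shared embedding space, every label (object or part) gets one embedding vector, and both image-to-label and label-to-part similarities are scored in that same space.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 50            # shared embedding dimension (illustrative choice)
n_labels = 1000   # object and part labels share one dictionary
img_dim = 4096    # raw image feature dimension (illustrative choice)

# Hypothetical parameters: a linear map for images and one embedding
# vector per label; both live in the same d-dimensional space, so the
# two similarity tasks shape a single joint representation.
W_img = rng.normal(scale=0.01, size=(d, img_dim))
E_lbl = rng.normal(scale=0.01, size=(n_labels, d))

def score_image_label(x, label_id):
    """Similarity between an image feature vector and an object label:
    dot product after projecting the image into the shared space."""
    return float(E_lbl[label_id] @ (W_img @ x))

def score_label_part(object_id, part_id):
    """Similarity between an object label and a part label, reusing
    the same label embeddings as the image-to-label task."""
    return float(E_lbl[object_id] @ E_lbl[part_id])

x = rng.normal(size=img_dim)       # stand-in image feature vector
s_obj = score_image_label(x, 3)    # how well label 3 fits the image
s_part = score_label_part(3, 7)    # how plausible part 7 is for object 3
```

In practice such a model would be trained with a ranking loss over the combined ImageNet and WordNet supervision, so that correct labels and parts score higher than incorrect ones; the two scoring functions are updated jointly because they share the label embeddings.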

Original language: English
Pages (from-to): 281-301
Number of pages: 21
Journal: Machine Learning
Volume: 94
Issue number: 2
State: Published - Feb 2014

Bibliographical note

Funding Information:
Acknowledgements: This work was supported by the DARPA Deep Learning Program, NSERC, CIFAR, the Canada Research Chairs, Compute Canada, and by the French ANR Project ASAP ANR-09-EMER-001. Code for the experiments was implemented using both the Torch (2011) and Theano (2010) machine learning libraries. Gal Chechik was supported by the Israeli Science Foundation Grant 1090/12 and by a Marie Curie reintegration grant PIRG06-GA-2009-256566.

Funding


Funders / Funder number
Israeli Science Foundation: 1090/12
Marie Curie Reintegration Grant: PIRG06-GA-2009-256566
Agence Nationale de la Recherche (ANR): ANR-09-EMER-001
Defense Advanced Research Projects Agency
Canadian Institute for Advanced Research
Compute Canada
Natural Sciences and Engineering Research Council of Canada
Canada Research Chairs

Keywords

• Embeddings
• Image retrieval
• Object and parts
• Ranking
