Context-aware captions from context-agnostic supervision

Ramakrishna Vedantam, Samy Bengio, Kevin Murphy, Devi Parikh, Gal Chechik

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

84 Scopus citations

Abstract

We introduce an inference technique to produce discriminative context-aware image captions (captions that describe differences between images or visual concepts) using only generic context-agnostic training data (captions that describe a concept or an image in isolation). For example, given images and captions of “siamese cat” and “tiger cat”, we generate language that describes the “siamese cat” in a way that distinguishes it from “tiger cat”. Our key novelty is that we show how to do joint inference over a language model that is context-agnostic and a listener which distinguishes closely-related concepts. We first apply our technique to a justification task, namely to describe why an image contains a particular fine-grained category as opposed to another closely-related category of the CUB-200-2011 dataset. We then study discriminative image captioning to generate language that uniquely refers to one of two semantically-similar images in the COCO dataset. Evaluations with discriminative ground truth for justification and human studies for discriminative image captioning reveal that our approach outperforms baseline generative and speaker-listener approaches for discrimination.

Original languageEnglish
Title of host publicationProceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1070-1079
Number of pages10
ISBN (Electronic)9781538604571
DOIs
StatePublished - 6 Nov 2017
Externally publishedYes
Event30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 - Honolulu, United States
Duration: 21 Jul 201726 Jul 2017

Publication series

NameProceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Volume2017-January

Conference

Conference30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Country/TerritoryUnited States
CityHonolulu
Period21/07/1726/07/17

Bibliographical note

Publisher Copyright:
© 2017 IEEE.

Fingerprint

Dive into the research topics of 'Context-aware captions from context-agnostic supervision'. Together they form a unique fingerprint.

Cite this