Abstract
The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 3665-3675 |
| Number of pages | 11 |
| Journal | Advances in Neural Information Processing Systems |
| Volume | 2017-December |
| State | Published - 2017 |
| Externally published | Yes |
| Event | 31st Annual Conference on Neural Information Processing Systems, NIPS 2017 - Long Beach, United States. Duration: 4 Dec 2017 → 9 Dec 2017 |
Bibliographical note
Publisher Copyright: © 2017 Neural Information Processing Systems Foundation. All rights reserved.