We create a new NLI test set that exposes the deficiency of state-of-the-art models on inferences requiring lexical and world knowledge. The new examples are simpler than the SNLI test set, containing sentences that differ by at most one word from sentences in the training set. Yet performance on the new test set is substantially worse across systems trained on SNLI, demonstrating that these systems are limited in their generalization ability and fail to capture many simple inferences.
|Title of host publication||ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Short Papers)|
|Publisher||Association for Computational Linguistics (ACL)|
|Number of pages||6|
|State||Published - 2018|
|Event||56th Annual Meeting of the Association for Computational Linguistics, ACL 2018 - Melbourne, Australia|
Duration: 15 Jul 2018 → 20 Jul 2018
|Name||ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Short Papers)|
|Conference||56th Annual Meeting of the Association for Computational Linguistics, ACL 2018|
|Period||15/07/18 → 20/07/18|
Bibliographical note
Funding Information:
We would like to thank Qian Chen for evaluating KIM on our test set. This work was supported in part by the German Research Foundation through the German-Israeli Project Cooperation (DIP, grant DA 1600/1-1), an Intel ICRI-CI grant, Theo Hoffenberg, and the Israel Science Foundation grants 1951/17 and 1555/15. Vered is also supported by the Clore Scholars Programme (2017), and the AI2 Key Scientific Challenges Program (2017).
© 2018 Association for Computational Linguistics