Abstract
We create a new NLI test set that shows the deficiency of state-of-the-art models in inferences that require lexical and world knowledge. The new examples are simpler than those in the SNLI test set: each contains sentences that differ by at most one word from sentences in the training set. Yet performance on the new test set is substantially worse across systems trained on SNLI. This demonstrates that these systems are limited in their generalization ability and fail to capture many simple inferences.
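The "differ by at most one word" property described in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and treats "differ by one word" as a single same-position substitution. The helper names `differs_by_at_most_one_word` and `one_word_variants` and the example sentence are hypothetical; they are not taken from the authors' released code or data.

```python
from typing import List


def differs_by_at_most_one_word(a: str, b: str) -> bool:
    """Return True if b equals a or replaces exactly one whitespace token of a.

    Hypothetical helper. Assumes whitespace tokenization and same-position
    substitution only; insertions, deletions and multi-word phrases are
    not handled.
    """
    ta, tb = a.split(), b.split()
    if len(ta) != len(tb):
        return False
    # Count positions where the two token sequences disagree.
    mismatches = sum(1 for x, y in zip(ta, tb) if x != y)
    return mismatches <= 1


def one_word_variants(sentence: str, index: int, replacements: List[str]) -> List[str]:
    """Build candidate sentences by swapping the token at `index` (hypothetical helper)."""
    tokens = sentence.split()
    variants = []
    for word in replacements:
        new_tokens = tokens.copy()
        new_tokens[index] = word  # replace exactly one token
        variants.append(" ".join(new_tokens))
    return variants


if __name__ == "__main__":
    premise = "A man is playing a guitar"
    for hypothesis in one_word_variants(premise, 5, ["violin", "saxophone"]):
        print(hypothesis, differs_by_at_most_one_word(premise, hypothesis))
```

Every generated variant passes the check by construction; deciding the gold label (for example, contradiction for "guitar" versus "violin") is a separate step that this sketch does not attempt.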
Original language | English |
---|---|
Title of host publication | ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Short Papers) |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 650-655 |
Number of pages | 6 |
ISBN (Electronic) | 9781948087346 |
State | Published - 2018 |
Event | 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018 - Melbourne, Australia. Duration: 15 Jul 2018 → 20 Jul 2018 |
Publication series
Name | ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Short Papers) |
---|---|
Volume | 2 |
Conference
Conference | 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018 |
---|---|
Country/Territory | Australia |
City | Melbourne |
Period | 15/07/18 → 20/07/18 |
Bibliographical note
Funding Information: We would like to thank Qian Chen for evaluating KIM on our test set. This work was supported in part by the German Research Foundation through the German-Israeli Project Cooperation (DIP, grant DA 1600/1-1), an Intel ICRI-CI grant, Theo Hoffenberg, and the Israel Science Foundation grants 1951/17 and 1555/15. Vered is also supported by the Clore Scholars Programme (2017) and the AI2 Key Scientific Challenges Program (2017).
Publisher Copyright:
© 2018 Association for Computational Linguistics