Abstract
We propose a new commonsense reasoning benchmark to motivate commonsense reasoning progress from two perspectives: (1) Evaluating whether models can distinguish knowledge quality by predicting if the knowledge is enough to answer the question; (2) Evaluating whether models can develop commonsense inference capabilities that generalize across tasks. We first extract supporting knowledge for each question and ask humans to annotate whether the auto-extracted knowledge is enough to answer the question or not. After that, we convert different tasks into a unified question-answering format to evaluate the models’ generalization capabilities. We name the benchmark Commonsense Inference with Knowledge-in-the-loop Question Answering (CIKQA). Experiments show that with our learning paradigm, models demonstrate encouraging generalization capabilities. At the same time, we also notice that distinguishing knowledge quality remains challenging for current commonsense reasoning models.
Original language | English |
---|---|
Title of host publication | EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023 |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 114-124 |
Number of pages | 11 |
ISBN (Electronic) | 9781959429470 |
State | Published - 2023 |
Event | 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Findings of EACL 2023 - Dubrovnik, Croatia. Duration: 2 May 2023 → 6 May 2023 |
Publication series
Name | EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023 |
---|
Conference
Conference | 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023 - Findings of EACL 2023 |
---|---|
Country/Territory | Croatia |
City | Dubrovnik |
Period | 2/05/23 → 6/05/23 |
Bibliographical note
Publisher Copyright: © 2023 Association for Computational Linguistics.
Funding
The authors of this paper were supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via IARPA Contract No. 2019-19051600006 under the BETTER Program, and by contract FA8750-19-2-1004 with the US Defense Advanced Research Projects Agency (DARPA). The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government. This paper was also supported by the NSFC Fund (U20B2053) from the NSFC of China, the RIF (R6020-19 and R6021-20) and the GRF (16211520 and 16205322) from RGC of Hong Kong, the MHKJFS (MHP/001/19) from ITC of Hong Kong and the National Key R&D Program of China (2019YFE0198200) with special thanks to HKMAAC and CUSBLT, and the Jiangsu Province Science and Technology Collaboration Fund (BZ2021065). We also thank the UGC Research Matching Grants (RMGS20EG01-D, RMGS20CR11, RMGS20CR12, RMGS20EG19, RMGS20EG21, RMGS23CR05, RMGS23EG08). Yanai Elazar is grateful to be supported by the PBC fellowship for outstanding Ph.D. candidates in Data Science and the Google Ph.D. fellowship.
Funders | Funder number |
---|---|
Jiangsu Province Science and Technology Collaboration Fund | BZ2021065 |
NSFC Fund | U20B2053 |
RGC of Hong Kong | R6020-19, R6021-20, 16211520, 16205322 |
ITC of Hong Kong | MHP/001/19 |
U.S. Department of Defense | |
Defense Advanced Research Projects Agency | FA8750-19-2-1004 |
Office of the Director of National Intelligence | |
Intelligence Advanced Research Projects Activity | 2019-19051600006 |
University Grants Committee | RMGS20EG01-D, RMGS20CR11, RMGS20CR12, RMGS20EG19, RMGS20EG21, RMGS23CR05, RMGS23EG08 |
Planning and Budgeting Committee of the Council for Higher Education of Israel | |
National Key Research and Development Program of China | 2019YFE0198200 |