Abstract
Factual questions can typically be answered correctly at different levels of granularity. For example, both “August 4, 1961” and “1961” are correct answers to the question “When was Barack Obama born?”. Standard question answering (QA) evaluation protocols, however, do not take this into account explicitly and instead compare a predicted answer against reference answers of a single granularity level. In this work, we propose GRANOLA QA, a novel evaluation setting where a predicted answer is evaluated in terms of accuracy and informativeness against a set of multi-granularity answers. We present a simple methodology for enriching existing datasets with multi-granularity answers, and create GRANOLA-EQ, a multi-granularity version of the ENTITYQUESTIONS dataset. We evaluate models using a range of decoding methods on GRANOLA-EQ, including a new algorithm called Decoding with Response Aggregation (DRAG), that is geared towards aligning the answer granularity with the model's uncertainty. Our experiments show that large language models with standard decoding methods tend to generate specific answers, which are often incorrect. In contrast, when evaluated on multi-granularity answers, DRAG yields a nearly 20 point increase in accuracy on average, which further increases for rare entities, revealing that standard evaluation and decoding schemes may underestimate the knowledge encapsulated in language models.
Original language | English |
---|---|
Title of host publication | Long Papers |
Editors | Lun-Wei Ku, Andre F. T. Martins, Vivek Srikumar |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 6737-6751 |
Number of pages | 15 |
ISBN (Electronic) | 9798891760943 |
State | Published - 2024 |
Externally published | Yes |
Event | 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Bangkok, Thailand Duration: 11 Aug 2024 → 16 Aug 2024 |
Publication series
Name | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
---|---|
Volume | 1 |
ISSN (Print) | 0736-587X |
Conference
Conference | 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 |
---|---|
Country/Territory | Thailand |
City | Bangkok |
Period | 11/08/24 → 16/08/24 |
Bibliographical note
Publisher Copyright:© 2024 Association for Computational Linguistics.