Abstract
Recognizing entities in texts is a central need in many information-seeking scenarios, and indeed, Named Entity Recognition (NER) is arguably one of the most successful examples of a widely adopted NLP task and corresponding NLP technology. Recent advances in large language models (LLMs) appear to provide effective solutions also for NER tasks that were traditionally handled with dedicated models, often matching or surpassing the abilities of the dedicated models. Should NER be considered a solved problem? We argue to the contrary: the capabilities provided by LLMs are not the end of NER research, but rather an exciting beginning. They allow taking NER to the next level, tackling increasingly more useful, and increasingly more challenging, variants. We present three variants of the NER task, together with a dataset to support them. The first is a move towards more fine-grained, and intersectional, entity types. The second is a move towards zero-shot recognition and extraction of these fine-grained types based on entity-type labels. The third, and most challenging, is the move from the recognition setup to a novel retrieval setup, where the query is a zero-shot entity type, and the expected result is all the sentences from a large, pre-indexed corpus that contain entities of this type, and their corresponding spans. We show that all of these are far from being solved. We provide a large, silver-annotated corpus of 4 million paragraphs covering 500 entity types, to facilitate research towards all three of these goals.
Original language | English |
---|---|
Title of host publication | Findings of the Association for Computational Linguistics |
Subtitle of host publication | EMNLP 2023 |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 3340-3354 |
Number of pages | 15 |
ISBN (Electronic) | 9798891760615 |
State | Published - 2023 |
Event | 2023 Findings of the Association for Computational Linguistics: EMNLP 2023 - Singapore, Singapore; Duration: 6 Dec 2023 → 10 Dec 2023 |
Publication series
Name | Findings of the Association for Computational Linguistics: EMNLP 2023 |
---|
Conference
Conference | 2023 Findings of the Association for Computational Linguistics: EMNLP 2023 |
---|---|
Country/Territory | Singapore |
City | Singapore |
Period | 6/12/23 → 10/12/23 |
Bibliographical note
Publisher Copyright: © 2023 Association for Computational Linguistics.
Funding
We would like to thank Nicolas Heist of the CaLiGraph project for his assistance and feedback. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme, grant agreement No. 802774 (iEXTRACT).
Funders | Funder number |
---|---|
Horizon 2020 Framework Programme | |
European Commission | |
Horizon 2020 | 802774 |