Abstract
We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 19 datasets annotated with named entities in a cross-lingually consistent schema across 13 diverse languages. In this paper, we detail the dataset creation and composition of UNER; we also provide initial modeling baselines on both in-language and cross-lingual learning settings. We will release the data, code, and fitted models to the public.
Original language | English |
---|---|
Title of host publication | Long Papers |
Editors | Kevin Duh, Helena Gomez, Steven Bethard |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 4322-4337 |
Number of pages | 16 |
ISBN (Electronic) | 9798891761148 |
State | Published - 2024 |
Externally published | Yes |
Event | 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 - Hybrid, Mexico City, Mexico. Duration: 16 Jun 2024 → 21 Jun 2024 |
Publication series
Name | Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 |
---|---|
Volume | 1 |
Conference
Conference | 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 |
---|---|
Country/Territory | Mexico |
City | Hybrid, Mexico City |
Period | 16/06/24 → 21/06/24 |
Bibliographical note
Publisher Copyright: © 2024 Association for Computational Linguistics.