Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Stephen Mayhew, Terra Blevins, Shuheng Liu, Marek Šuppa, Hila Gonen, Joseph Marvin Imperial, Börje F. Karlsson, Peiqin Lin, Nikola Ljubešić, L. J. Miranda, Barbara Plank, Arij Riabi, Yuval Pinter

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 19 datasets annotated with named entities in a cross-lingual consistent schema across 13 diverse languages. In this paper, we detail the dataset creation and composition of UNER; we also provide initial modeling baselines on both in-language and cross-lingual learning settings. We will release the data, code, and fitted models to the public.

Original languageEnglish
Title of host publicationLong Papers
EditorsKevin Duh, Helena Gomez, Steven Bethard
PublisherAssociation for Computational Linguistics (ACL)
Pages4322-4337
Number of pages16
ISBN (Electronic)9798891761148
StatePublished - 2024
Externally publishedYes
Event2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 - Hybrid, Mexico City, Mexico
Duration: 16 Jun 202421 Jun 2024

Publication series

NameProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024
Volume1

Conference

Conference2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024
Country/TerritoryMexico
CityHybrid, Mexico City
Period16/06/2421/06/24

Bibliographical note

Publisher Copyright:
©2024 Association for Computational Linguistics.

Fingerprint

Dive into the research topics of 'Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark'. Together they form a unique fingerprint.

Cite this