Findings of the 1st Shared Task on Multi-lingual Multi-task Information Retrieval at MRL 2023

Francesco Tinner, David Ifeoluwa Adelani, Chris Emezue, Mammad Hajili, Omer Goldman, Muhammad Farid Adilazuarda, Muhammad Dehan Al Kautsar, Aziza Mirsaidova, Müge Kural, Dylan Massey, Chiamaka Chukwuneke, Chinedu Mbonu, Damilola Oluwaseun Oloyede, Kayode Olaleye, Jonathan Atala, Benjamin A. Ajibade, Saksham Bassi, Rahul Aralikatte, Najoung Kim, Duygu Ataman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Large language models (LLMs) excel at language understanding and generation, especially in English, which has ample public benchmarks for a wide range of natural language processing (NLP) tasks. Nevertheless, their reliability across different languages and domains remains uncertain. Our new shared task introduces a benchmark for assessing the ability of multilingual LLMs to comprehend and produce language in data-sparse settings, particularly for under-resourced languages, with an emphasis on capturing logical, factual, or causal relationships within lengthy text contexts. The shared task consists of two subtasks central to information retrieval: Named Entity Recognition (NER) and Reading Comprehension (RC), in seven data-scarce languages: Azerbaijani, Igbo, Indonesian, Swiss German, Turkish, Uzbek, and Yorùbá, which previously lacked annotated resources for information retrieval tasks. Our evaluation of leading LLMs reveals that, despite their competitive performance, they still exhibit notable weaknesses, such as producing output in a non-target language or providing counterfactual information that cannot be inferred from the context. As more advanced models emerge, the benchmark will remain essential for supporting fairness and applicability in information retrieval systems.

Original language: English
Title of host publication: MRL 2023 - 3rd Workshop on Multi-Lingual Representation Learning, Proceedings of the Workshop
Editors: Duygu Ataman
Publisher: Association for Computational Linguistics (ACL)
Pages: 106-117
Number of pages: 12
ISBN (Electronic): 9798891760561
State: Published - 2023
Event: 3rd Workshop on Multi-lingual Representation Learning, MRL 2023 - Singapore, Singapore
Duration: 7 Dec 2023 → …

Publication series

Name: MRL 2023 - 3rd Workshop on Multi-Lingual Representation Learning, Proceedings of the Workshop

Conference

Conference: 3rd Workshop on Multi-lingual Representation Learning, MRL 2023
Country/Territory: Singapore
City: Singapore
Period: 7/12/23 → …

Bibliographical note

Publisher Copyright:
© 2023 Association for Computational Linguistics.

Funding

We thank our sponsors Google DeepMind and Bloomberg for making this shared task possible. We also thank HumanSignal for providing us access to Label Studio's Enterprise version, which allowed us to carry out the large-scale collaboration on human annotation across multiple tasks.
