Text-based NP Enrichment

Yanai Elazar, Victoria Basmov, Yoav Goldberg, Reut Tsarfaty

Research output: Contribution to journal, Article, peer-review

3 Scopus citations

Abstract

Understanding the relations between entities denoted by NPs in a text is a critical part of human-like natural language understanding. However, only a fraction of such relations is covered by current standard NLP tasks and benchmarks. In this work, we propose a novel task termed text-based NP enrichment (TNE), in which we aim to enrich each NP in a text with all the preposition-mediated relations, either explicit or implicit, that hold between it and other NPs in the text. The relations are represented as triplets, each denoted by two NPs related via a preposition. Humans recover such relations seamlessly, while current state-of-the-art models struggle with them due to the implicit nature of the problem. We build the first large-scale dataset for the problem, provide the formal framing and scope of annotation, analyze the data, and report the results of fine-tuned language models on the task, demonstrating the challenge it poses to current technology. To foster further research into this challenging problem, a webpage with a data-exploration UI, a demo, and links to the code, models, and leaderboard is available at: yanaiela.github.io/TNE/.
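
The triplet representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code or the dataset's actual format: the class name NPRelation, the field names anchor, preposition, and complement, the helper enrich, and the example NPs are all assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NPRelation:
    """One preposition-mediated relation between two NPs (hypothetical schema)."""
    anchor: str        # the NP being enriched
    preposition: str   # the preposition linking the two NPs
    complement: str    # the NP the anchor is related to


def enrich(nps, relations):
    """Group relations by anchor so each NP maps to its (preposition, complement) links."""
    enriched = {np: [] for np in nps}
    for rel in relations:
        enriched[rel.anchor].append((rel.preposition, rel.complement))
    return enriched


# Invented example NPs and relations, not taken from the TNE dataset.
nps = ["the window", "the house", "the owner"]
relations = [
    NPRelation("the window", "of", "the house"),
    NPRelation("the owner", "of", "the house"),
]
enriched = enrich(nps, relations)
```

In this sketch an NP with no outgoing relations simply maps to an empty list, so every NP in the text appears in the output whether or not it was enriched.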

Original language: English
Pages (from-to): 764-784
Number of pages: 21
Journal: Transactions of the Association for Computational Linguistics
Volume: 10
State: Published - 27 Jul 2022

Bibliographical note

Publisher Copyright:
© MIT Press Journals. All rights reserved.

Funding

We would like to thank the NLP-BIU lab, Nathan Schneider, and Yufang Hou for helpful discussions and comments on this paper. We also thank the anonymous reviewers and the action editors, Marie-Catherine de Marneffe and Mark Steedman, for their valuable suggestions. Yanai Elazar is grateful to be supported by the PBC fellowship for outstanding PhD candidates in Data Science and the Google PhD fellowship. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme, grant agreement no. 802774 (iEXTRACT) and grant agreement no. 677352 (NLPRO).

Funders (with funder numbers where given):
Google
Horizon 2020 Framework Programme: 802774, 677352
European Research Council
Planning and Budgeting Committee of the Council for Higher Education of Israel
