Lexical Generalization Improves with Larger Models and Longer Training

Elron Bandel, Yoav Goldberg, Yanai Elazar

Research output: Contribution to conference › Paper › peer-review


Abstract

While fine-tuned language models perform well on many tasks, they have also been shown to rely on superficial surface features such as lexical overlap. Excessive reliance on such heuristics can lead to failure on challenging inputs. We analyze the use of lexical overlap heuristics in natural language inference, paraphrase detection, and reading comprehension (using a novel contrastive dataset), and find that larger models are much less susceptible to adopting lexical overlap heuristics. We also find that longer training leads models to abandon lexical overlap heuristics. Finally, we provide evidence that the disparity between model sizes originates in the pre-trained model.
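To make the notion of a lexical overlap heuristic concrete, here is a minimal illustrative sketch (not the paper's implementation, and the function name is hypothetical): a model relying on this heuristic effectively scores how many hypothesis words also appear in the premise, and so fails on word-reordered pairs where high overlap does not imply entailment.

```python
# Illustrative only: a minimal lexical-overlap heuristic for NLI-style pairs.
# A model that over-relies on this surface feature tends to predict
# "entailment" whenever overlap is high, regardless of word order.

def lexical_overlap(premise: str, hypothesis: str) -> float:
    """Fraction of hypothesis tokens that also occur in the premise."""
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    shared = sum(1 for tok in hypothesis_tokens if tok in premise_tokens)
    return shared / len(hypothesis_tokens)

# Overlap is 1.0 here, yet the correct NLI label is "not entailment",
# since the roles of doctor and actor are swapped:
print(lexical_overlap("The doctor paid the actor",
                      "The actor paid the doctor"))  # -> 1.0
```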

Original language: English
Pages: 4427-4439
Number of pages: 13
State: Published - 2022
Event: 2022 Findings of the Association for Computational Linguistics: EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration: 7 Dec 2022 - 11 Dec 2022

Conference

Conference: 2022 Findings of the Association for Computational Linguistics: EMNLP 2022
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 7/12/22 - 11/12/22

Bibliographical note

Publisher Copyright:
© 2022 Association for Computational Linguistics.

Funding

This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme, grant agreement No. 802774 (iEXTRACT). Yanai Elazar is grateful to have been supported by the PBC fellowship for outstanding PhD candidates in Data Science and the Google PhD fellowship for his PhD, during which he spent most of his time on this project.

Funders (funder number):
Horizon 2020 Framework Programme
European Commission
Horizon 2020 (No. 802774)
Planning and Budgeting Committee of the Council for Higher Education of Israel
